
TECHNOLOGIES, TRENDS

19.08.2020 - Read in 14 min.

The 5 Pillars of the AWS Well-Architected Framework: II – Security


We encourage you to read the second article in “The 5 Pillars of the AWS Well-Architected Framework” series. It is based on our cooperation with clients and on AWS best practices. This time we take a closer look at Security, one of the foundations of the infrastructure our solutions are built upon.


The first part of The 5 Pillars of the AWS Well-Architected Framework focused on the first pillar, i.e. Operational Excellence. Today, it’s time to take a closer look at the second pillar, i.e. Security.

 

According to the AWS documentation, the list of pillars the architecture of your projects can be based on looks like this:

  • Operational Excellence
  • Security
  • Reliability
  • Performance Efficiency
  • Cost Optimization

This list can be used as a sort of foundation for the infrastructure you will build your services upon. Following these guidelines (but treating them as suggestions, of course) will give you a form that’s stable, secure, and efficient both functionally and financially.

 

This chapter is about: Security

Here is the official description of this pillar:

“The security pillar encompasses the ability to protect data, systems, and assets to take advantage of cloud technologies to improve your security.”

 

In short, you need to ensure the highest possible security for the system and services connected with it, while taking the best possible advantage of the Cloud.

 

Design Principles

Here are the main principles of this pillar:

  • Implement a strong identity foundation

    One of the basic security rules is least privilege, i.e. give users, services, etc. access rights only to those components or functionalities of the system that they actually need, and nothing more. I believe you should avoid granting access with “*”, as this usually shows that you don’t fully understand how to control your resources. An exception to this rule is when you want to see which access requests a given service actually makes. If you are not sure what it may need access to, give it a broad area for manoeuvre, then monitor it in AWS CloudTrail, which will let you pick out and keep only the permissions you consider justified.

  • Enable traceability

    By running monitoring, alerting or auditing services, you gain a lot of information on what actually happens in your system. It is a very good practice to have a separate AWS account for logs and audits. You can then use AWS Organizations to get another level of security by separating the information from the access rights. It’s easier to manage a single account with access limited to particular functions, like AWS CloudWatch Logs, than to let the auditor use the main production account to see “what’s what”. This mitigates the risk of leaking important data while still giving the auditor the access they need.

  • Apply security at all layers

    An obvious practice is to ensure security at every stage of system building and in as many of its parts as possible. You should never stop thinking about security and you should never assume that this can be done at the end of the project. Don’t count on some “Security Officer” to verify your system and give recommendations on what needs to be corrected, as it may already be too late for that.
    Your project keeps evolving, new components are added, existing ones are modified, and sometimes even a Beta version is made available to the end client or to bystanders at your company. Waiting for “the right time to secure the effects of your work” may be quite destructive, as it may require a lot of work. What’s even worse, some data may already be compromised due to excessively easy access. At this stage, safeguards tend to be treated as something non-essential or as an element of the project that only generates costs, since the priority is to build new features. This may lead to a situation where security is yet again put on the back burner, and you only return to it after an attack, breach, data theft, etc. Nobody wants this kind of embarrassment. If you think about security continuously, however, ensuring it while working on a project does not take that much effort.
    Remember to give reasonable access rights to users and services, and to use encryption as often as possible (at Rest and in Transit).

  • Automate security best practices

    Similarly to the individual components of your system, safeguards should be implemented using code or automated processes. This gives you stability and repeatability of implementation across components. An important part is to test your safeguards, e.g. with the IAM Policy Simulator, which shows you how an IAM Policy actually behaves (a minimal CLI sketch appears after this list). Alternatively, you can run Penetration Tests against selected AWS services without submitting a request to AWS Support. Supported services include e.g. EC2, RDS, API Gateway or Lambda. However, you cannot perform Port Flooding or Request Flooding attacks. DDoS attacks or Stress Tests can only be performed after obtaining approval from AWS.

  • Protect data in transit and at rest

    As I have mentioned before, following the “apply security at all layers” rule, data should be protected in transit between locations and at rest in the target location. Encrypting data in transit means more work for developers, who need to implement the appropriate mechanisms; however, in systems where you cannot allow any data leakage, it may simply be required. The target location can be protected by using AWS KMS for e.g. S3, or by enabling AWS EBS disk encryption. Some AWS components, like DynamoDB or RDS, offer this as standard, and you can choose the encryption type: “managed by AWS” (AWS owned CMK), “managed by AWS on your behalf” (AWS managed CMK) or “with your own key” (customer managed CMK).

  • Prepare for security events

    An important part of the implementation of safeguards is to conduct live tests. Hackathon or Chaos Monkey Tests are perfect events for this, where groups of people or automated solutions cause failures of certain components or even entire environments. During such events, not only the implemented safeguards, but also the implementation of monitoring or event logging systems can be tested. This will not only let you see where security vulnerabilities are, but you will also know whether you can quickly detect and eliminate them.
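As mentioned under “Automate security best practices”, the IAM Policy Simulator can also be driven from the AWS CLI. Below is a minimal sketch under example assumptions (the policy document, bucket name and action list are only illustrative) that checks how a candidate policy evaluates before it is attached to anything:

# Simulate a custom (not yet attached) policy against specific actions and a resource.
# The bucket name and policy content are hypothetical examples.
aws iam simulate-custom-policy \
    --policy-input-list '{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-app-assets/*"
      }]
    }' \
    --action-names "s3:GetObject" "s3:PutObject" \
    --resource-arns "arn:aws:s3:::example-app-assets/reports/latest.csv"

# The response contains one EvaluationResults entry per action,
# with EvalDecision set to "allowed", "implicitDeny" or "explicitDeny".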

 


Best Practices

AWS defines a series of good practices to be implemented during the deployment of security mechanisms in your system. They are:

  • Security
  • Identity and Access Management
  • Detection
  • Infrastructure Protection
  • Data Protection
  • Incident Response

 

Security

In order to ensure the security of your system, you first need to understand security itself and be aware of possible threats. When you begin building a new system and apply the practices listed in the previous article on Operational Excellence, you need to be prepared for threats right from the start. You can begin your journey by following the recommendations from AWS or OWASP.

 

Identity and Access Management

AWS Identity and Access Management (AWS IAM) is the basic access-control mechanism in the AWS Cloud. It lets you create IAM Policies containing access definitions for particular resources; thanks to that, you define who can or cannot use a given AWS component. IAM Policies are then attached to IAM Roles, allowing you to combine specific policies into one larger unit. IAM Roles can ultimately be used by IAM Groups, IAM Users or even services. Obviously, you have to remember the least-privilege rule and grant access only when necessary.
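To make the least-privilege rule concrete, here is a minimal CLI sketch of creating a narrowly scoped policy and attaching it to a group (the policy name, group name, bucket and account ID are placeholders):

# Create a managed policy that only allows reading objects from one bucket.
aws iam create-policy \
    --policy-name developers-read-app-logs \
    --policy-document '{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
          "arn:aws:s3:::example-app-logs",
          "arn:aws:s3:::example-app-logs/*"
        ]
      }]
    }'

# Attach it to the Developers group instead of granting "*" access.
aws iam attach-group-policy \
    --group-name Developers \
    --policy-arn arn:aws:iam::111111111111:policy/developers-read-app-logs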

At the beginning of a project for one of our customers, the initial step when it comes to AWS IAM was to create groups for users and services. These were standard groups, i.e. Admins, Developers, Testers, Managers, etc. Each group had an appropriate IAM Policy to give access to particular resources. The approach to access was rather unique, as it was based on a definition of an IAM Role that could be assumed by a given group. This was the result of the following access model:

  • each environment (TEST, STG, PROD, etc.) is placed on a separate AWS account
  • the accounts are connected under a single AWS Organization
  • the account with user groups and service groups is kept on a dedicated AWS Management (MGMT) account
  • connecting a user or a service with a resource e.g. in STG environment is realised with the use of AWS Assume Role and AWS Security Token Service

 

As you can see, this is a rather simple model; however, to access e.g. AWS EC2 or RDS, a few additional steps have to be performed:

  • using the standard access_key and secret_access_key to obtain the so-called SessionToken
  • using the new temporary credentials to connect to the remote account
        
aws sts assume-role \
    --role-arn "arn:aws:iam::543673423124:role/OrganizationAccountAccessRole" \
    --role-session-name "foo_access"

{
    "Credentials": {
        "AccessKeyId": "SECRET_DATA",
        "SecretAccessKey": "SECRET_DATA",
        "SessionToken": "SECRET_DATA",
        "Expiration": "2020-08-03T07:49:22+00:00"
    },
    "AssumedRoleUser": {
        "AssumedRoleId": "SECRET_DATA:foo_access",
        "Arn": "arn:aws:sts::543673423124:assumed-role/OrganizationAccountAccessRole/foo_access"
    }
}
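The temporary credentials returned above can then be exported as environment variables (or wired into a named profile) and used like regular keys until they expire. A minimal sketch with placeholder values:

# Export the values returned by "aws sts assume-role".
export AWS_ACCESS_KEY_ID="SECRET_DATA"
export AWS_SECRET_ACCESS_KEY="SECRET_DATA"
export AWS_SESSION_TOKEN="SECRET_DATA"

# Subsequent calls now run in the context of the assumed role on the remote account.
aws ec2 describe-instances --region eu-west-1

# Alternatively, let the CLI perform the assume-role step itself via ~/.aws/config:
# [profile stg]
# role_arn       = arn:aws:iam::543673423124:role/OrganizationAccountAccessRole
# source_profile = mgmt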

This means you can control the users from a single place, whereas access rights are controlled individually for each AWS account. Later, you can monitor all of that using AWS CloudTrail. Obviously, each new user account has basic safeguards: long passwords and Multi-Factor Authentication (MFA) for signing in to an AWS account. MFA should also be used for accessing repositories such as GitLab, and the ability to delete branches or merge code should be limited to a selected group of repository users.

 

Detection

All types of activity related to access should be monitored. This can be achieved with AWS CloudTrail.
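For ad-hoc checks, CloudTrail can also be queried from the CLI. A minimal sketch that looks for manual Security Group changes (the event name is the API call behind adding an ingress rule; the time window and result count are examples):

# List recent CloudTrail events for manual ingress-rule changes.
aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventName,AttributeValue=AuthorizeSecurityGroupIngress \
    --start-time 2020-08-01T00:00:00Z \
    --max-results 20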


Manual Security Group modification

 

The example above shows a user’s manual modification of an AWS Security Group. In my opinion, this search method is not convenient enough; a better solution is, for example, using SQL. For this purpose, you can use AWS Athena, a service that reads data directly from AWS S3.
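Assuming a CloudTrail table has already been created in Athena over the S3 log bucket (the database, table and result-bucket names below are hypothetical), such a search can be expressed in SQL and started from the CLI:

# Run an SQL query over CloudTrail logs stored in S3.
aws athena start-query-execution \
    --query-string "SELECT useridentity.arn, eventname, eventtime FROM cloudtrail_logs WHERE eventname = 'AuthorizeSecurityGroupIngress' ORDER BY eventtime DESC LIMIT 20" \
    --query-execution-context Database=security_audit \
    --result-configuration OutputLocation=s3://example-athena-results/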


AWS Athena pipeline

 

The next step, as previously mentioned, is gathering data from all accounts into a single, dedicated account that an auditor can later use, should they decide to look at your logs. The image below illustrates this idea. The configuration is rather simple (a minimal CLI sketch follows the list). Its main steps are:

  • creating an AWS S3 resource on a LOGS account
  • enabling logging from another AWS account with the right entry in Bucket Access Policy
  • configuring the AWS CloudTrail service on DEV or PROD accounts so that it writes to the aforementioned S3
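Here is that sketch, with placeholder account IDs and bucket names; the bucket policy is trimmed down to the two statements CloudTrail needs in order to write cross-account:

# On the LOGS account: allow CloudTrail from the DEV account (222222222222) to write into the central bucket.
aws s3api put-bucket-policy --bucket example-central-trail-logs --policy '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AWSCloudTrailAclCheck",
      "Effect": "Allow",
      "Principal": {"Service": "cloudtrail.amazonaws.com"},
      "Action": "s3:GetBucketAcl",
      "Resource": "arn:aws:s3:::example-central-trail-logs"
    },
    {
      "Sid": "AWSCloudTrailWrite",
      "Effect": "Allow",
      "Principal": {"Service": "cloudtrail.amazonaws.com"},
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::example-central-trail-logs/AWSLogs/222222222222/*",
      "Condition": {"StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}}
    }
  ]
}'

# On the DEV account: point a trail at the central bucket and start logging.
aws cloudtrail create-trail --name dev-to-central --s3-bucket-name example-central-trail-logs --is-multi-region-trail
aws cloudtrail start-logging --name dev-to-central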

 


AWS Multi-Account logs

If you have connected accounts in AWS Organization, you don’t even need to create a Bucket Access Policy—the Organization service handles all that.

Changes in the infrastructure can also be made more visible with AWS Config. This tool provides information about the configurations of AWS components and their relations. Moreover, it lets you analyse potential security vulnerabilities (see the sketch after this list), for example by:

  • checking if a database is encrypted
  • checking if S3 Buckets are private
  • checking if VPC Flow Logs are enabled
  • checking if all accounts comply with the same security rules
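Most of these checks map onto AWS managed Config rules. A minimal sketch of enabling two of them from the CLI, assuming the Config recorder is already set up (the rule names are arbitrary; the source identifiers are AWS managed rules):

# Flag S3 buckets that allow public read access.
aws configservice put-config-rule --config-rule '{
  "ConfigRuleName": "s3-bucket-public-read-prohibited",
  "Source": {"Owner": "AWS", "SourceIdentifier": "S3_BUCKET_PUBLIC_READ_PROHIBITED"}
}'

# Flag RDS instances whose storage is not encrypted.
aws configservice put-config-rule --config-rule '{
  "ConfigRuleName": "rds-storage-encrypted",
  "Source": {"Owner": "AWS", "SourceIdentifier": "RDS_STORAGE_ENCRYPTED"}
}'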

However, where infrastructure and its structure are concerned, describing it as code and pushing changes through a pipeline in the spirit of the GitOps methodology gives you even greater insight into its inner workings and allows you to check for security concerns such as missing encryption on an RDS PostgreSQL database. Atlantis, a tool that works with Terraform, is a great choice here. It lets you add automated checks that oversee the process of implementing changes in the environment. Tests based on Open Policy Agent combined with Terraform also provide a lot of information about attempts to introduce incorrect changes in AWS IAM.

 


Detected vulnerabilities

 

Another interesting feature, introduced in AWS Elastic Container Registry (ECR), is scanning the content of Docker images. This gives you information about detected vulnerabilities. However, some images are not supported by the scanning engine; in those cases, you don’t get information about known vulnerabilities.
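A minimal sketch of turning the scan on and reading the results for a hypothetical repository:

# Scan every image automatically when it is pushed to the repository.
aws ecr put-image-scanning-configuration \
    --repository-name example-app \
    --image-scanning-configuration scanOnPush=true

# Trigger a scan manually and read the findings for a given tag.
aws ecr start-image-scan --repository-name example-app --image-id imageTag=latest
aws ecr describe-image-scan-findings --repository-name example-app --image-id imageTag=latest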

 


Unsupported Image

 

The Event-Driven Security approach is great for detecting all events related to security in the broad sense. One example is receiving information about events from AWS CloudWatch and sending them to Lambda or AWS Simple Notification Service (SNS) to deliver email notifications.


AWS CloudWatch Events

 

The above method works well for detecting and reverting access changes in an S3 Bucket. When an S3 Access Policy change is recorded in AWS CloudTrail, an AWS CloudWatch Events rule detects it and triggers a Lambda function, which then restores the access rights as required. In the meantime, an email notification about the event is sent to the Administrator.
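A minimal sketch of the detection half of that flow: a CloudWatch Events rule that matches S3 bucket policy changes recorded by CloudTrail and forwards them to an SNS topic (the target could just as well be the remediation Lambda); the rule name, account ID and topic ARN are placeholders:

# Match S3 bucket policy / ACL changes coming from CloudTrail.
aws events put-rule --name s3-policy-change --event-pattern '{
  "source": ["aws.s3"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["s3.amazonaws.com"],
    "eventName": ["PutBucketPolicy", "PutBucketAcl", "DeleteBucketPolicy"]
  }
}'

# Send matching events to an SNS topic with an email subscription.
aws events put-targets --rule s3-policy-change \
    --targets "Id"="1","Arn"="arn:aws:sns:eu-west-1:111111111111:security-alerts"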

Not every anomaly can be detected with AWS CloudTrail or CloudWatch. If a user enters an incorrect password during a login attempt, it’s considered normal. However, if that user enters an incorrect password hundreds of times, it raises a red flag: the attempts could be a Brute Force attack. AWS GuardDuty is an excellent tool for monitoring such behaviour. It monitors all kinds of logs, including Access Logs for VPC, NAT Gateway, Load Balancer, or S3. It can also work with AWS CloudWatch, which registers events and sends out email notifications.
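Enabling GuardDuty is a single call per account and region; a minimal sketch (the detector ID in the second call is a placeholder, and findings can also be routed to CloudWatch Events as shown earlier):

# Turn the detector on for the current account and region.
aws guardduty create-detector --enable --finding-publishing-frequency FIFTEEN_MINUTES

# List current findings for the detector created above.
aws guardduty list-findings --detector-id 12abc34d567e8fa901bc2d34e56789f0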

 


AWS GuardDuty

Infrastructure Protection

Infrastructure protection involves implementing additional safeguards at the network and host level. In AWS, this means using AWS VPC, where we can define Network Access Control Lists and Security Groups, and connecting to On-Premise environments over a Site-to-Site VPN. Infrastructure security is framed by the Shared Responsibility Model, which shows which parts of the system are managed by the Amazon Cloud and which are managed by us. A Cloud Engineer doesn’t have to worry about issues in the Xen hypervisor, because those are handled by AWS; however, they should know how to correctly configure an EC2 Linux instance, ensure its security and monitor relevant Common Vulnerabilities and Exposures (CVEs).
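As a small illustration of the same least-privilege idea at the network layer, a Security Group should open only the ports and source ranges that are actually needed; a minimal sketch with placeholder group ID and CIDR:

# Allow HTTPS only from the load balancer subnet, nothing else.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 443 \
    --cidr 10.0.1.0/24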

 


AWS Shared Responsibility Model

AWS also has a list of potential vulnerabilities in AWS AMI Linux (v1 and v2), available in the Amazon Linux Security Center.

One of the jobs I recently worked on was the development of a mechanism for creating up-to-date AMI images, so-called Golden Images. The main goal was to speed up the start-up of Linux on an EC2 instance configured with Puppet CM. It could still be expanded with the capability to continuously update the kernel, packages or configuration, so that there would always be a current, stable operating system version with its components in the repository. The AMI build process was automated: required changes and specific component updates, like Java or MySQL, were detected; an image creation process was then launched in a virtual machine with Packer; and after a successful update, the artifact was sent to AWS S3 and a new AWS Linux AMI was created.

You need to pay attention to network security within your Availability Zones. The networks should be split into at least two groups: Public and Private. An example of this is the Two-Tier Model, where HTTP servers can be located in a Public network with direct access from the Internet (although I would not recommend this, since requests should at least go through a Load Balancer first), and database servers usually sit in a Private network with optional outbound Internet access via a NAT Gateway.
Connecting numerous VPCs, especially when you have several AWS accounts (as in our project), should be done via AWS Transit Gateway rather than VPC Peering. Transit Gateway acts as a central connection hub that, to a degree, shields the networks inside your VPCs from direct exposure to outside traffic. As previously mentioned, the project realised for one of our clients used this component; it is also used to connect with On-Premise via VPN. Routing was configured so that traffic could not move from the TEST environment to PROD or from PROD to TEST; access separation was thus ensured within the TCP/IP network, not just via AWS IAM.


AWS Transit Gateway connection

 

The example above is logical, but it has one minor flaw: it’s hard to configure using Terraform. The Transit Gateway in the example above must be made available from an MGMT account to other accounts via AWS Resource Access Manager (RAM) and accepted there. Switching between AWS accounts in Terraform is a bit of a hassle. We managed to automate everything properly, but it required a considerable amount of work and time.
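For reference, the sharing step itself is only a couple of calls when done by hand; the difficulty lies in orchestrating them (and the acceptance on the receiving account) across providers in Terraform. A minimal CLI sketch with placeholder ARNs and account IDs:

# On the MGMT account: share the Transit Gateway with another account via RAM.
aws ram create-resource-share \
    --name shared-transit-gateway \
    --resource-arns arn:aws:ec2:eu-west-1:111111111111:transit-gateway/tgw-0123456789abcdef0 \
    --principals 222222222222

# On the receiving account: list and accept the pending invitation
# (not needed if both accounts sit in the same AWS Organization with sharing enabled).
aws ram get-resource-share-invitations
aws ram accept-resource-share-invitation \
    --resource-share-invitation-arn arn:aws:ram:eu-west-1:111111111111:resource-share-invitation/EXAMPLE-1234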

Another infrastructure protection method is using AWS CloudFront in front of AWS S3 resources. An Origin Access Identity (OAI) enables safe access to objects that must be publicly available when you don’t want to grant direct access to the S3 bucket. The OAI acts like a “user” who is listed in the Bucket Access Policy and can read objects in the bucket.
This approach worked great in one of our projects, where an app’s frontend had to serve S3 resources publicly, but granting unrestricted access to those resources was out of the question. The OAI met both our project and security requirements.
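A minimal sketch of that setup: create the OAI and then reference it as the principal in the bucket policy (the OAI ID, bucket name and caller reference are placeholders):

# Create the Origin Access Identity used by the CloudFront distribution.
aws cloudfront create-cloud-front-origin-access-identity \
    --cloud-front-origin-access-identity-config CallerReference=frontend-oai-2020,Comment=frontend-oai

# Allow only that identity to read objects from the bucket.
aws s3api put-bucket-policy --bucket example-frontend-assets --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity E1EXAMPLE2345"},
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::example-frontend-assets/*"
  }]
}'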

Data Protection

When it comes to data protection, you should begin with data classification, so that you know which data has to be protected and which can be considered less significant. You can start classification by tagging the resources where data will be stored. Access to those resources can then be restricted with Attribute Based Access Control (ABAC), where the aforementioned tags play a major role.
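A minimal sketch of an ABAC-style statement, following the common AWS pattern of matching a tag on the principal against a tag on the resource (the tag key, action set and account ID are illustrative):

# Policy fragment: users may stop/start only EC2 instances tagged with their own project.
aws iam create-policy --policy-name abac-project-instances --policy-document '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["ec2:StartInstances", "ec2:StopInstances"],
    "Resource": "arn:aws:ec2:*:111111111111:instance/*",
    "Condition": {
      "StringEquals": {"aws:ResourceTag/project": "${aws:PrincipalTag/project}"}
    }
  }]
}'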

The next step is defining how to protect data in transit and at rest in specific resources like S3, EBS or RDS. Enforcing encryption of data in transit requires some planning, as it means extra implementation work in the application code; nonetheless, it is worth the time. To encrypt data at rest, use strong AWS KMS keys, managed either by AWS or by yourself.
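A minimal sketch of both at-rest settings from the CLI, using a hypothetical bucket and KMS key alias:

# Encrypt every new object in the bucket with a customer managed KMS key by default.
aws s3api put-bucket-encryption \
    --bucket example-data-bucket \
    --server-side-encryption-configuration '{
      "Rules": [{
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "aws:kms",
          "KMSMasterKeyID": "alias/example-data-key"
        }
      }]
    }'

# Encrypt all newly created EBS volumes in this region by default.
aws ec2 enable-ebs-encryption-by-default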

 

Incident Response

Attacks can happen even in a perfectly configured environment with the highest security levels. In such situations, you have to remain calm and composed. The attack has already happened, so the damage is done. Now, you need to react quickly to mitigate the impact and possibly block the attacker (the latter is not recommended in certain situations). It’s good to collect all the information on the problem and start writing it down so that it can be accessed by other people. It is crucial to keep the events in chronological order, i.e. which actions were taken by the response team at a given moment or what unpredicted events happened at a given time. You can deepen your knowledge on the issue by analysing logs in AWS CloudWatch or AWS Elasticsearch / Kibana. Communication between team members should be fluent and focused on the event. Excessive communication noise slows down the response. Carefully select the people who will have access to information about the attack. Sometimes, informing too many people about an attack can negatively impact how you defend against it. This is related to the fact that many attackers actually come from inside the targeted companies. As a result, including too many people in the loop—especially the upper management—can negatively impact your decisions.

Every attack ends eventually, and you need to treat it as a great learning opportunity. Post Mortem is something that every company should do. The process should involve people who participated in repelling the attack as well as those closer to the business, so that information reaches the business as well. During the Post Mortem meeting, everyone should take another look at the documentation created during the defence against the attack and try to identify parts of the system that should be better secured or otherwise improved. The improvements should be prioritized accordingly: implement the most important changes first.

After you’ve learned from the whole experience, don’t just rest on your laurels. Instead, start attacking your own system. Tests like Chaos Monkey and Game Days are the best way to uncover vulnerabilities of your system. During those tests, you will learn a lot about the infrastructure itself as well as the level of security. Tests begin in a TEST environment where you can identify the majority of issues. Their fixes are of course implemented in other environments like Stage (STG) and Production (PROD). Ideally, you want to have a system where the above-mentioned tests can be performed in the environment that is the closest to the End Client — Production. However, in this case you need a high level of automation, many High Availability (HA) components, and properly created Disaster Recovery (DR) mechanisms.

 


 

Summary

This text attempts to describe, more or less, what a Security Pillar compliance approach should look like. You won’t always be able to implement all of the solutions suggested by AWS, but coming closer to those guidelines is indicative of broad understanding of the issue of security as well as better system stability. Remember that “security is not a goal—it’s a journey”. You should implement security at every step of the system building process, not just at the very end of it.

 

We encourage you to read part 3:

“The 5 Pillars of the AWS Well-Architected Framework: III – Reliability”.



RST Software Masters

Kamil Herbik

AWS Cloud Engineer

Experienced Cloud Engineer who loves DevOps, new technologies and building new things. When involved in a project, he is not afraid to use cutting-edge technologies and open new paths for its development. A few years ago, he was one of the people who initiated the DevOps transformation at RST, which has had a significant influence on the style of work at the company. He does not believe in the sentence “it cannot be done”, and he always tries to convince himself and others that everything is possible. In his free time, he goes bouldering and spends time with his family.

