First, keep people away whenever it is feasible. End users should consume the data as much as possible through an interface of some sort, such as a custom user interface (UI), a custom API, or another AWS service UI or API. Allowing access to the data directly where it is stored should be the exception, not the rule.
Second, make sure that only authorized people access your data. Based on the data classification you established earlier, on a need-to-view or need-to-edit basis, you should establish fine-grained access control to your data following the least-privilege principle. This means, on the one hand, managing, authenticating, and authorizing end users and, on the other hand, filtering access to the data itself, for instance with row-level or column-level access filtering.
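To make the least-privilege idea concrete, here is a minimal sketch of an IAM identity policy, built as a Python dictionary, that grants read-only access to a single prefix of one bucket instead of broad `s3:*` permissions. The bucket name (`example-data-lake`) and prefix (`reports/`) are placeholders for illustration.

```python
import json

# Hypothetical least-privilege policy: read-only access to one prefix
# of one bucket, nothing else. Names are placeholders.
read_only_reports_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadReportsObjectsOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-data-lake/reports/*",
        },
        {
            "Sid": "ListReportsPrefixOnly",
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::example-data-lake",
            # Restrict listing to the same prefix the user may read.
            "Condition": {"StringLike": {"s3:prefix": ["reports/*"]}},
        },
    ],
}

print(json.dumps(read_only_reports_policy, indent=2))
```

Scoping both the object-level and the bucket-listing permissions to the same prefix keeps the policy aligned with the need-to-view classification of that data.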
Third, you need to take particular care of any sensitive, confidential, and PII data. For such data, you may want to put additional measures in place on top of encryption. In many cases, even end users who are authorized to access the data do not need to view the actual values; access to fictitious data may be good enough for the tasks they need to perform. Think of PII data. For such sensitive information, you can leverage extra protection such as data tokenization, which masks the actual data by replacing it with similar but fictitious data. The tokenization process can take place upfront (tokenized data is preprocessed and stored along with the rest of the data) or on the fly (for instance, leveraging Amazon S3 Object Lambda when the data is stored on S3). Which tokenization mechanism to use depends mostly on your use case: whether you need frequent access to the same data (in which case, on-the-fly tokenization may not be the most efficient approach) and whether you need the process to be reversible so you can retrieve the original data from the tokenized data (in which case, a simple hash function won't work).
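The reversibility trade-off can be sketched in a few lines. This is a hypothetical in-memory example, not a production tokenization service: reversible tokenization keeps a mapping from token to original value (a real system would use a secured token vault), while a salted hash produces a stable pseudonym that cannot be reversed.

```python
import hashlib
import secrets

# Hypothetical in-memory token vault. A real deployment would store this
# mapping in a secured, access-controlled service, not in process memory.
_vault = {}

def tokenize_reversible(value: str) -> str:
    # Replace the value with a random token and remember the mapping,
    # so the original can be retrieved later by authorized callers.
    token = "tok_" + secrets.token_hex(8)
    _vault[token] = value
    return token

def detokenize(token: str) -> str:
    return _vault[token]

def tokenize_irreversible(value: str) -> str:
    # A salted hash is deterministic (same input, same pseudonym)
    # but one-way: the original value cannot be recovered from it.
    return hashlib.sha256(b"static-salt:" + value.encode()).hexdigest()

email = "jane.doe@example.com"

token = tokenize_reversible(email)
assert detokenize(token) == email        # original is recoverable

h1 = tokenize_irreversible(email)
h2 = tokenize_irreversible(email)
assert h1 == h2                          # deterministic pseudonym
assert h1 != email                       # but not the original value
```

If downstream tasks never need the original values back, the hash-based approach is simpler and removes the vault as an attack surface; if they do, only the vault-backed approach works.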
Finally, you also want to regularly audit your data access logs, from AWS CloudTrail or other logs, such as S3 server access logs. We will cover this in more detail in the Detecting incidents section of this chapter.
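As a simple illustration of what such an audit can look for, here is a minimal sketch (not a full audit pipeline) that scans CloudTrail-style records for S3 data-access events performed by principals outside an allow-list. The allow-list and the sample records are placeholders.

```python
# Hypothetical allow-list of principals expected to access the data.
ALLOWED_ARNS = {"arn:aws:iam::123456789012:role/DataPipelineRole"}

def suspicious_data_access(records):
    # Flag S3 object reads/writes made by principals not on the allow-list.
    hits = []
    for rec in records:
        if rec.get("eventSource") == "s3.amazonaws.com" \
                and rec.get("eventName") in {"GetObject", "PutObject"}:
            arn = rec.get("userIdentity", {}).get("arn", "")
            if arn not in ALLOWED_ARNS:
                hits.append((arn, rec.get("eventName")))
    return hits

# Placeholder records in the shape of CloudTrail events.
sample = [
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/intern"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:role/DataPipelineRole"}},
]

print(suspicious_data_access(sample))
# → [('arn:aws:iam::123456789012:user/intern', 'GetObject')]
```

In practice you would run such checks continuously against delivered log files, or use managed services for the heavy lifting, rather than ad hoc scripts.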
As you can now imagine, protecting your data at rest is not enough—you also need to ensure its protection when it is in transit. When data needs to be provided to end users or exchanged between applications or services in the context of your solution, you are responsible for protecting its integrity and confidentiality.
First, use secure protocols, such as TLS, that enforce end-to-end (E2E) encryption whenever your data’s integrity and confidentiality are at stake while it is being transported from one system to another. AWS services provide HTTPS endpoints using TLS to encrypt all communications.
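When your own applications talk to each other (or to AWS HTTPS endpoints), you can enforce this on the client side. The following minimal Python sketch builds a TLS context that refuses anything older than TLS 1.2 and verifies server certificates; it shows the standard-library mechanism, not a specific AWS SDK setting.

```python
import ssl

# Client-side TLS context: certificate verification and hostname checking
# are on by default with create_default_context(); we additionally refuse
# protocol versions older than TLS 1.2.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

# Sanity checks: server certificates must verify against trusted CAs,
# and the certificate's hostname must match the peer we connect to.
assert context.verify_mode == ssl.CERT_REQUIRED
assert context.check_hostname is True
```

A context like this would then be passed to your HTTP client or socket layer, so every connection in your solution inherits the same floor on protocol version.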
Second, manage the life cycle of your TLS certificates and limit access to them to the bare minimum. Prefer a managed service such as AWS Certificate Manager (ACM) to automate and delegate most maintenance tasks. ACM can be used to maintain both your public and private certificates, and it integrates natively with AWS services such as Elastic Load Balancing (ELB), Amazon CloudFront, and Amazon API Gateway, handling automatic certificate renewal for the resources they protect.
Third, enforce encryption in transit by blocking unsafe protocols, such as HTTP, using, for instance, security groups to protect your resources. You can also force a redirect from HTTP to HTTPS with Amazon CloudFront or an Application Load Balancer (ALB) sitting in front of your application.
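For data stored on S3, a common way to enforce encryption in transit is a bucket policy that denies any request not made over HTTPS, using the `aws:SecureTransport` condition key. The sketch below builds such a policy as a Python dictionary; the bucket name is a placeholder.

```python
import json

# Placeholder bucket name for illustration.
bucket = "example-bucket"

# Deny every S3 action on the bucket and its objects when the request
# does not use TLS (aws:SecureTransport is "false" for plain HTTP).
deny_insecure_transport = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}

print(json.dumps(deny_insecure_transport, indent=2))
```

Because an explicit deny overrides any allow, this policy guarantees that even otherwise-authorized principals cannot reach the data over plain HTTP.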
Last but not least, you also want to regularly audit your data access logs from AWS CloudTrail or other logs such as VPC flow logs. We will cover this in more detail in the Detecting incidents section of this chapter.
We have now covered different approaches and various AWS services that you can use to protect your infrastructure and your data. No matter how secure you think your solution is, you should nevertheless prepare for the worst-case scenario and be able to answer this question: How will the solution behave in case of a security incident? The next section discusses this.