Summary – Determining Security Requirements and Controls – SAP-C02 Study Guide

Summary

This chapter has covered quite a lot of ground in terms of designing secure solutions on AWS.

You learned how to leverage IAM and identity federation in your solution to provide granular access control. You then looked at the best practices to protect your infrastructure resources—using tools such as AWS WAF, AWS Shield, and AWS Firewall Manager—and your data using encryption at rest with AWS KMS and enforcing encryption in transit. The chapter then concluded with a discussion on incident detection and response to prepare for worst-case scenarios, leveraging tools such as AWS CloudTrail, AWS Config, Amazon GuardDuty, AWS Security Hub, and Amazon EventBridge.

In Chapter 6, Meeting Reliability Requirements, we will dive into the best practices for designing and implementing reliable solutions on AWS.

Further Reading

6

Meeting Reliability Requirements

This chapter will focus on determining a solution design and implementation strategy to meet reliability requirements. You will explore several architecture patterns and architectural best practices for designing and implementing reliable workloads on AWS.

Designing and implementing solutions with resilient architecture is essential to recover easily and successfully in case of failure. You will look at the following topics:

  • Reliability design principles
  • Foundational requirements
  • Designing for failure
  • Change management
  • Failure management

Reliability Design Principles

Reliability refers to the ability of a system to function repeatedly and consistently as expected. As you can imagine from that definition, it can mean totally different things depending on the system at hand. Ensuring the reliability of a nightly batch application running on weekdays will be something very different from ensuring the reliability of an application serving requests 24/7.

The reliability pillar of the AWS Well-Architected Framework comprises five design principles to keep in mind when designing a workload for reliability in the cloud.

Principle 1 – Automatically Recover from Failure

“Everything will eventually fail over time,” said Werner Vogels, the CTO of Amazon. You can’t expect humans to constantly watch the vital signs, also known as key performance indicators (KPIs), of each workload you deploy in the cloud and take action whenever something goes wrong. Although you may need to rely on human assistance in some very specific cases, manual intervention is neither scalable nor sustainable. Here, automation is key.

The idea is to monitor the KPIs of your workloads and trigger any necessary processing when one or more thresholds are breached. You may wonder which KPIs you should monitor then. Well, it depends. What is important, however, is to make sure the monitored KPIs reflect the business value of the workload and not technical operational aspects; thus, depending on what is important for your business, you will be watching different things. For instance, if speed is essential, you might be monitoring the number of tasks or requests processed over time, whereas if quality is paramount, it might be more important to monitor the number of requests returning errors or timing out.

Once you have defined the relevant KPIs for your case, it’s a matter of taking action when thresholds are breached: sending notifications, tracking failures, and triggering recovery processes to work around or repair the failure(s). The more sophisticated the automation, the better prepared you are to anticipate failures before they occur.
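The monitor-and-react loop described above can be sketched in a few lines of Python. This is a conceptual illustration only: the KPI name, threshold, and recovery action below are hypothetical placeholders, not AWS APIs (in practice, this role is typically filled by services such as Amazon CloudWatch alarms).

```python
# Minimal sketch of automated recovery: observe a KPI and, when its
# threshold is breached, record the event and trigger a recovery hook.
# All names (KPI, threshold, handler) are illustrative placeholders.

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class KpiMonitor:
    name: str
    threshold: float
    on_breach: Callable[[float], None]   # recovery hook, e.g. replace a node
    events: List[str] = field(default_factory=list)

    def observe(self, value: float) -> bool:
        """Record a KPI sample; fire the recovery hook if the threshold is breached."""
        if value > self.threshold:
            self.events.append(f"{self.name} breached: {value} > {self.threshold}")
            self.on_breach(value)
            return True
        return False

# Usage: watch an error-rate KPI; "recovery" here simply records the action taken.
actions = []
monitor = KpiMonitor(
    name="error_rate_percent",
    threshold=5.0,
    on_breach=lambda v: actions.append("restart_unhealthy_instances"),
)

monitor.observe(2.1)   # healthy sample, no action
monitor.observe(7.8)   # breach: event logged, recovery triggered
```

The design choice worth noting is that the monitor knows nothing about *how* recovery happens; it only invokes a hook. That separation is what lets you swap in progressively more sophisticated automation without changing the monitoring side.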