Using Fault Isolation to Protect Your Data – Determining Security Requirements and Controls – SAP-C02 Study Guide

Using Fault Isolation to Protect Your Data

Fault isolation confines the impact of a failure to a limited set of components within a workload. The effect is similar to the blast radius limitation that you aim for in security. The idea is the same in both cases: components located outside the boundary remain unaffected by the failure. It is therefore good practice to create multiple fault-isolated boundaries to reduce the impact of a failure on your workload.

Deploying the Workload to Multiple Locations

Consider a brief recap of how the AWS infrastructure is structured.

At the top, you find the AWS Regions. Regions are geographical locations where data centers are clustered (for instance, Dublin, Ireland, or Sydney, Australia). Each Region is composed of at least three Availability Zones (AZs). Each AZ consists of one or more data centers and has redundant power and connectivity within its Region. AZs are located several kilometers apart from each other, but within 100 kilometers. They are interconnected via high-throughput, low-latency networking over redundant fiber links.

Next, you have Local Zones, which are similar to AZs: they can be used as a zonal placement for zonal AWS resources, such as subnets or EC2 instances. However, they are not located in the associated AWS Region itself, but near large industry or IT centers where no AWS Region is present (for instance, Los Angeles, CA, USA). They still provide high-bandwidth, secure connectivity between resources in the Local Zone and resources running in the AWS Region. Local Zones are useful for running workloads closer to your users when you have very low-latency requirements.

Last but not least, you find the Amazon Global Edge Network, which consists of edge locations in multiple cities around the world. With over 300 edge locations across the globe, the purpose of this network is to provide access to AWS resources and the AWS network as close as possible to the end user's location. Amazon CloudFront, for instance, uses this network to deliver content to end users with lower latency. Several other AWS services and features, such as AWS Global Accelerator or Amazon S3 Transfer Acceleration, also leverage the edge network.

For more details, please consult the AWS infrastructure web page at https://packt.link/hROPk.

A major strength of the AWS infrastructure is that it is built with redundancy at every layer, which avoids single points of failure. Naturally, you also want to avoid single points of failure in your own workload. So, the first recommendation is to distribute your workload resources across multiple AZs at a minimum.

Given the properties of the AWS infrastructure, resources that you distribute across multiple AZs automatically benefit from strong protection against power outages and disasters such as fires, lightning strikes, floods, or earthquakes.
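
As a rough illustration of this recommendation, the following Python sketch spreads a number of instances round-robin across a list of AZs and computes the share of capacity that survives when one AZ fails. The function names, the instance count, and the AZ names are hypothetical and only serve the example; this is a simplified model, not how any AWS service places resources.

```python
from itertools import cycle


def spread_across_azs(instance_count, azs):
    """Assign instances to AZs round-robin and return the per-AZ counts."""
    if not azs:
        raise ValueError("at least one AZ is required")
    placement = {az: 0 for az in azs}
    # zip stops after instance_count items, cycle keeps rotating through the AZs.
    for _, az in zip(range(instance_count), cycle(azs)):
        placement[az] += 1
    return placement


def surviving_capacity(placement, failed_az):
    """Return the fraction of instances left when every instance in failed_az is lost."""
    total = sum(placement.values())
    if total == 0:
        raise ValueError("placement contains no instances")
    return (total - placement.get(failed_az, 0)) / total


# Hypothetical example: six instances over three AZs versus a single AZ.
multi_az = spread_across_azs(6, ["eu-west-1a", "eu-west-1b", "eu-west-1c"])
single_az = spread_across_azs(6, ["eu-west-1a"])

print(multi_az, surviving_capacity(multi_az, "eu-west-1a"))
print(single_az, surviving_capacity(single_az, "eu-west-1a"))
```

In this model, losing one of three AZs leaves two thirds of the instances running, whereas the single-AZ placement loses everything.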

Some AWS services, for instance, Amazon EC2, are strictly zonal, and when using such a service, your resources share the fate of the AZ they are in. However, resources of the same service running in a different AZ within the same Region will not be affected by a failure impacting only the first AZ. On the other hand, some AWS services, such as Amazon DynamoDB, are regional and use multiple AZs in an active/active configuration out of the box. This lets you achieve your availability design goals without having to define the multi-AZ configuration yourself. It is essential to know whether a given service is regional or zonal, since this can strongly influence how you design your workload to meet its reliability requirements.

Note that some services offer APIs that allow you to specify the regional or zonal scope of the request. When you can narrow the scope of a request to a single AZ, the request is processed only in that AZ. This both reduces the risk of the request disrupting resources in other AZs and keeps the request from being disrupted by an event in another AZ. The following AWS CLI example illustrates how to retrieve information about Amazon EC2 instances in the eu-west-1a AZ only:

aws ec2 describe-instances --filters Name=availability-zone,Values=eu-west-1a
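
The same AZ-scoped request can be sketched in Python with boto3, assuming boto3 is installed and AWS credentials are configured. The helper names az_filter and instance_ids_in_az are hypothetical; the filter name availability-zone is the one used in the CLI command above.

```python
def az_filter(az_name):
    """Build the Filters argument that restricts DescribeInstances to one AZ."""
    return [{"Name": "availability-zone", "Values": [az_name]}]


def instance_ids_in_az(az_name, region_name):
    """Return the IDs of the EC2 instances located in the given AZ.

    Assumption: boto3 is installed and credentials are available. It is
    imported here so that az_filter stays usable without it.
    """
    import boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    # DescribeInstances is paginated, so walk through every page of results.
    paginator = ec2.get_paginator("describe_instances")
    instance_ids = []
    for page in paginator.paginate(Filters=az_filter(az_name)):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                instance_ids.append(instance["InstanceId"])
    return instance_ids
```

For instance, instance_ids_in_az("eu-west-1a", "eu-west-1") would list the instances of that AZ for the account behind the configured credentials.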

Now, you may wonder whether it is necessary to go a step further and distribute your resources across multiple Regions to increase the reliability of your workload. Well, it depends. The following is a quote from the Reliability Pillar of the AWS Well-Architected Framework:

“Availability goals for most workloads can be satisfied using a Multi-AZ strategy within a single AWS Region. Consider multi-Region architectures only when workloads have extreme availability requirements, or other business goals, that require a multi-region architecture.”
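
To get a feel for why a Multi-AZ strategy is often sufficient, the following Python sketch applies the standard formula for redundant components: if each of n independent components is available with probability a, at least one of them is available with probability 1 - (1 - a)^n. The sketch rests on two loud assumptions: AZ failures are independent, and the workload keeps running as long as a single AZ is up. The 99.5% per-AZ figure is purely illustrative and is not an AWS commitment.

```python
def combined_availability(per_az_availability, az_count):
    """Availability of a workload that stays up while at least one AZ is up.

    Assumes independent AZ failures, which is a simplification.
    """
    if not 0.0 <= per_az_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if az_count < 1:
        raise ValueError("at least one AZ is required")
    return 1.0 - (1.0 - per_az_availability) ** az_count


def downtime_minutes_per_year(availability):
    """Convert an availability figure into expected downtime per 365-day year."""
    return (1.0 - availability) * 365 * 24 * 60


# Illustrative per-AZ availability, not an AWS figure.
for az_count in (1, 2, 3):
    availability = combined_availability(0.995, az_count)
    print(az_count, availability, downtime_minutes_per_year(availability))
```

Under these assumptions, each additional AZ reduces the expected downtime by a large factor, which is why a second Region only pays off for extreme availability requirements.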

Taking a multi-Region approach seems natural for a disaster recovery strategy, since you may need to protect your workload against a large-scale event to meet your recovery objectives. Such a large-scale event could be, for instance, an AWS service becoming unavailable across all AZs of a Region, or even worse, more than one AWS service becoming unavailable within a given Region.

In such a case, your recovery time objectives (RTOs) and recovery point objectives (RPOs), together with the budget at your disposal to implement this disaster recovery protection, will largely influence your solution design. AWS provides multiple capabilities to operate services across Regions. For example, Amazon S3 provides continuous, asynchronous replication of stored data through its Cross-Region Replication feature. Amazon RDS Read Replicas and Amazon DynamoDB global tables also support multi-Region setups. With continuous replication in place, your data becomes available across multiple Regions. AWS CloudFormation, which you can use to implement an infrastructure as code approach, also helps you define your infrastructure and deploy it consistently across AWS accounts and multiple AWS Regions. Last but not least, Amazon Route 53 and AWS Global Accelerator let you route traffic between multiple Regions. For instance, you may want to always split the traffic in specific proportions between Regions, or prefer to route requests based on geo-proximity or on latency.
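
Splitting traffic in specific proportions can be pictured with a small Python simulation. It follows the principle of weighted routing, where each Region receives a share of requests equal to its weight divided by the sum of all weights. The function names, Region names, and weights are hypothetical, and the code only simulates the selection locally; it does not call Amazon Route 53 or AWS Global Accelerator.

```python
import random


def traffic_shares(weights):
    """Return the expected fraction of traffic per Region for the given weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("the sum of the weights must be positive")
    return {region: weight / total for region, weight in weights.items()}


def pick_region(weights, rng=random):
    """Pick one Region at random, proportionally to its weight."""
    regions = list(weights)
    return rng.choices(regions, weights=[weights[r] for r in regions], k=1)[0]


# Hypothetical 75/25 split between two Regions.
weights = {"eu-west-1": 3, "us-east-1": 1}
print(traffic_shares(weights))

# Simulate 10,000 requests with a seeded generator and count them per Region.
rng = random.Random(42)
counts = {region: 0 for region in weights}
for _ in range(10_000):
    counts[pick_region(weights, rng)] += 1
print(counts)
```

Setting a Region's weight to 0 in this model sends it no traffic at all, which is one simple way to picture shifting requests away from an impaired Region.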

That said, operating your workload across multiple Regions will considerably raise the complexity and costs of your solution design. So, make sure to use such a setup only when you really need it.