The warm standby approach goes one step further than the pilot light approach. It extends the same concept but also keeps a running, albeit scaled-down, copy of your workload. Your service is therefore already up and running, and the only thing left to do is to scale up the compute resources your workload requires. This approach is illustrated in the following diagram:
Figure 7.4: Warm standby approach
Understandably, this approach targets situations where your RTO is too short for either the backup and recovery or the pilot light approach, but still long enough to let you scale up the environment before it has to handle the full production load in the new region. It is typically a good fit when your RTO is in the range of minutes.
Compared to pilot light, this approach makes it even easier to test and validate that your DR plan is fully functional, because scaling up is the only action needed to become fully operational.
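To make the scale-up step concrete, here is a minimal sketch of what a failover (or a DR drill) could do, assuming the scaled-down copy of the workload runs in an EC2 Auto Scaling group in the DR region. It calls the EC2 Auto Scaling `update_auto_scaling_group` operation through a boto3-style client. The group name `web-dr`, the `Capacity` helper, and the capacity figures are hypothetical, and the client is passed in so the function can be exercised without AWS credentials.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Capacity:
    """Hypothetical production sizing for one Auto Scaling group."""
    min_size: int
    max_size: int
    desired: int


def scale_up_warm_standby(client: Any, asg_name: str, production: Capacity) -> dict:
    """Raise a scaled-down Auto Scaling group to its production size.

    `client` is expected to behave like boto3.client("autoscaling", region_name=...)
    for the DR region; it is injected so the function can be tested with a stub.
    Returns the parameters that were sent, which is handy for logging a DR drill.
    """
    if not production.min_size <= production.desired <= production.max_size:
        raise ValueError("desired capacity must lie between min and max size")
    params = {
        "AutoScalingGroupName": asg_name,
        # Raising MinSize (not only DesiredCapacity) stops scale-in policies
        # from shrinking the group back toward its standby size.
        "MinSize": production.min_size,
        "MaxSize": production.max_size,
        "DesiredCapacity": production.desired,
    }
    client.update_auto_scaling_group(**params)
    return params
```

In a real run you would pass a boto3 Auto Scaling client created for the DR region, for example `scale_up_warm_standby(client, "web-dr", Capacity(4, 12, 8))`. Because the same call is all a failover requires, it can also be rehearsed regularly as part of DR testing.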
AWS Services for a Warm Standby Approach
In the warm standby approach, you also use the AWS services already mentioned for the previous two approaches, but on top of that, you need to ensure that your workload can rapidly scale up its compute resources to sustain the full production load in the new region. Here, you rely on AWS Auto Scaling to monitor the performance of your compute resources and adjust their capacity as needed. AWS Auto Scaling works with other AWS services such as EC2, ECS, DynamoDB, and Aurora. EKS instead relies on Kubernetes-specific autoscaling mechanisms: the Kubernetes Cluster Autoscaler or the recently announced Karpenter to scale cluster resources (such as EC2 nodes), and the Vertical Pod Autoscaler and Horizontal Pod Autoscaler to scale Pods. You would then need to leverage these mechanisms to ensure that your EKS clusters and Pods are scaled up to the desired capacity.
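The Horizontal Pod Autoscaler mentioned above sizes a workload with a documented proportional rule: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipping the change when that ratio is within a tolerance (0.1 by default) of 1.0, and keeping the result within the configured minimum and maximum replicas. The sketch below reproduces only that core rule; the real controller also accounts for unready Pods, missing metrics, and stabilization windows, which are deliberately left out here. The function name and defaults are illustrative.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified Horizontal Pod Autoscaler replica calculation."""
    if current_replicas < 1 or target_metric <= 0:
        raise ValueError("need at least one replica and a positive target")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the HPA leaves the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Respect the minReplicas/maxReplicas bounds of the HPA spec.
    return max(min_replicas, min(max_replicas, desired))
```

For example, a standby Deployment running 2 replicas at 90% average CPU against a 50% target would be asked to grow to 4 replicas. The Cluster Autoscaler or Karpenter then has to provide enough nodes for those additional Pods to be scheduled, which is why both layers of autoscaling matter for a warm standby EKS cluster.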
The multi-region active-active approach is the ultimate DR approach, reserved for the most business-critical workloads for which none of the previous three approaches can satisfy your RTO and RPO. With this approach, your workload runs concurrently in at least two separate regions. This is illustrated in the following diagram:
Figure 7.5: Active-active approach
This approach addresses scenarios where you need an RTO of zero (no downtime) and an RPO as close to zero as possible. This, however, comes at a cost, since you maintain a fully functional, fully scaled-up environment to support your workload in multiple regions (at least two).
Compared to warm standby, it is even easier to test and validate that your DR plan is fully functional, because you are already fully operational in multiple regions and don't need to take any action at all.
AWS Services for an Active-Active Approach
In the multi-region active-active approach, the AWS services mentioned for the previous three approaches remain useful, although some of them are used slightly differently.
For instance, Route 53 or Global Accelerator would be configured to distribute traffic across both active regions, and only in the event of a failover would they redirect all traffic to the remaining healthy region.
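As an illustration of the Route 53 side, the sketch below builds a change batch containing one equally weighted record per active region, each tied to a health check. Route 53 splits traffic in proportion to the record weights and stops returning a record whose health check fails, so traffic shifts to the healthy region. The record name, endpoints, and health check IDs are hypothetical, and weighted CNAME records are only one option; latency-based routing or alias records pointing at load balancers are common alternatives.

```python
def weighted_active_active_batch(
    record_name: str,
    regional_endpoints: dict[str, tuple[str, str]],
    weight: int = 50,
    ttl: int = 60,
) -> dict:
    """Build a Route 53 ChangeBatch with one weighted, health-checked CNAME per region.

    regional_endpoints maps a region name (used as the SetIdentifier) to a tuple
    of (regional endpoint DNS name, Route 53 health check ID). Equal weights
    split traffic evenly between the active regions.
    """
    if not 0 <= weight <= 255:
        raise ValueError("Route 53 weights must be between 0 and 255")
    changes = []
    for region, (endpoint, health_check_id) in sorted(regional_endpoints.items()):
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,  # a CNAME cannot sit at the zone apex
                "Type": "CNAME",
                "SetIdentifier": region,
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": endpoint}],
                # An unhealthy record is excluded from DNS answers.
                "HealthCheckId": health_check_id,
            },
        })
    return {"Comment": "Active-active weighted routing", "Changes": changes}
```

The resulting dictionary can be passed as the `ChangeBatch` argument of the boto3 Route 53 client's `change_resource_record_sets` call, together with your hosted zone ID. A short TTL keeps clients from caching the unhealthy region's answer for long after a failover.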
Regarding data, all the solutions discussed so far also remain valid options, and your choice will depend on the RTO and RPO you need to achieve. Reads are not really an issue, since you can either redirect them to a read replica (as with RDS or Aurora) or rely on the service to handle increased concurrency automatically (as with DynamoDB or S3). Writes, on the other hand, are often a thorn in your side, but you have a number of options for dealing with them, as given below:
Now that you have explored the approaches you can take, you are ready to learn how to make sure that your DR strategy actually works.