It may sound obvious, but every measure you take to protect your workload against a disaster should be carefully considered and planned. First, these measures will have a significant impact on your solution design. Second, the cost of your solution will increase with the degree of protection against a disaster. You want to keep both aspects under control.
Eventually, you will create a DR plan for your workload. That document will become part of your organization’s business continuity plan to make sure the organization can keep operating its business in case of a disaster.
Now, as always, you want to start with your requirements before creating that DR plan. What are your actual business needs in terms of DR protection? The last thing you want is to spend a huge effort and a lot of money on something that will not be useful.
So, when building that DR plan, ensure that every protection you put in place serves a purpose. As an example, suppose that you need to design your workload to survive a major disaster in the AWS Region where it is deployed. You design the solution so that you can recover your workload in a second Region within a reasonable amount of time and start operating again from that Region. Imagine now that a natural disaster, such as a major earthquake across the entire Region, impairs your workload. As expected, thanks to your design you are able to start operating again in another AWS Region. But imagine that the same disaster that impaired the AWS Region also impaired the rest of your organization’s business operations. Depending on the business function supported by your workload, it might be useful to have it survive such a large-scale event, but it might also be useless if the rest of your business is down.
The bottom line is, before putting in place a sophisticated DR plan, make sure it is aligned with your organization’s business continuity plan, and in particular, make sure to consider the DR plans of other business functions that your workload depends on.
The first thing to do is to conduct a risk assessment. This will help you determine the risk associated with several types of disaster; that is, the impact of a failure of a single AZ, multiple AZs, a single Region, or multiple Regions. Also, remember that AZs are physically separated by many kilometers, so deploying your workload across multiple AZs already provides a fair level of protection against some forms of disaster (for example, local flooding, an earthquake, a power outage, or a lightning strike). Depending on the criticality of your workload, you will examine the options at your disposal, the measures each one involves, and the associated costs. Then, you will compare the options, their associated risks, and the cost of each variant of the solution design to decide which option is the best fit.
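To make that comparison concrete, here is a minimal sketch of how such a risk assessment could be tabulated. Everything in it is an assumption made for illustration: the scenario names, probabilities, outage costs, and option costs are invented figures, not AWS data, and the model deliberately simplifies by treating each design option as either fully protecting against a scenario or not at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisasterScenario:
    name: str
    annual_probability: float  # hypothetical likelihood of occurring in a year
    outage_cost: float         # hypothetical business impact if unprotected


@dataclass(frozen=True)
class DesignOption:
    name: str
    annual_cost: float                      # hypothetical yearly cost of the design
    protects_against: frozenset = frozenset()  # scenario names this design survives


def expected_annual_loss(option, scenarios):
    """Sum probability x impact over the scenarios the option does NOT cover."""
    return sum(
        s.annual_probability * s.outage_cost
        for s in scenarios
        if s.name not in option.protects_against
    )


def total_annual_exposure(option, scenarios):
    """Cost of the protection plus the residual risk it leaves behind."""
    return option.annual_cost + expected_annual_loss(option, scenarios)


def best_fit(options, scenarios):
    """Pick the option with the lowest combined cost and residual risk."""
    return min(options, key=lambda o: total_annual_exposure(o, scenarios))


# Invented figures, for illustration only.
SCENARIOS = [
    DisasterScenario("single AZ failure", 0.05, 200_000),
    DisasterScenario("single Region failure", 0.005, 1_000_000),
]

OPTIONS = [
    DesignOption("single-AZ", 10_000),
    DesignOption("multi-AZ", 15_000, frozenset({"single AZ failure"})),
    DesignOption(
        "multi-Region",
        40_000,
        frozenset({"single AZ failure", "single Region failure"}),
    ),
]

if __name__ == "__main__":
    for option in OPTIONS:
        exposure = total_annual_exposure(option, SCENARIOS)
        print(f"{option.name}: {exposure:,.0f} per year")
    print("Best fit:", best_fit(OPTIONS, SCENARIOS).name)
```

With these made-up numbers, the multi-AZ design comes out with the lowest combined figure, because the extra cost of a second Region outweighs the residual risk it removes. A more critical workload, with a higher outage cost, could tip the balance the other way, which is exactly why the assessment has to start from your own business requirements.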
Now let's take a look at the possible protection measures you can take on AWS.