This chapter covers the following official AWS Certified SysOps Administrator – Associate (SOA-C02) exam domains:
Domain 2: Reliability and Business Continuity
Domain 4: Security and Compliance
(For more information on the official AWS Certified SysOps Administrator – Associate [SOA-C02] exam topics, see the Introduction.)
At this point you should be familiar with how to make an application scalable, elastic, highly available, and resilient to failures. So you might be asking yourself: “If my application has all of the aforementioned features, why even consider backing up?” Well, the simple answer is that backups are for times when things go very wrong.
A backup is a point-in-time copy of your data and application state that you can return to. Hardware failures and software errors are not the only things that take applications down. More often the cause is a human-initiated action, whether a mistake or a malicious act. Either can take the application down, corrupt the data, destroy the database, and so on. Regardless of whether the issue is caused by hardware, software, or a human, a backup (or, better, multiple backups) ensures you can recover to a point in time when everything was working.
In this chapter, we discuss how to implement a good backup strategy in AWS and which services to use to ensure all your data is backed up.
This section covers the following official AWS Certified SysOps Administrator – Associate (SOA-C02) exam objectives for Domain 2: Reliability and Business Continuity and Domain 4: Security and Compliance:
Domain 2.3: Implement backup and restore strategies
Domain 4.2: Implement data and infrastructure protection strategies
If you can correctly answer these questions before going through this section, save time by skimming the Exam Alerts in this section and then completing the Cram Quiz at the end of the section.
1. Which components need to be considered as dynamically changing over time when backing up your application?
2. How often must a backup be taken, and how quickly must the application be recovered if it has a recovery-point objective (RPO) of 30 minutes and a recovery-time objective (RTO) of 1 hour?
3. Is it possible to retain a backup of an RDS database for an indefinite amount of time?
1. Answer: The state of the application and the data. All stateful services should be backed up regularly. If a service is stateless, no backup is required because the objects in that service can be re-created with either an original image or an orchestration template.
2. Answer: The application needs to be backed up at least once every 30 minutes (more frequently is recommended if financially viable) to meet the RPO. The application needs to be recovered and operational within 1 hour to meet the RTO.
3. Answer: Yes. Manual snapshots of an RDS database are kept until you delete them, so they can be retained indefinitely. Automated RDS backups, by contrast, have a maximum retention period of 35 days.
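The RPO and RTO arithmetic from question 2 can be checked with a short Python sketch. The function names `meets_rpo` and `meets_rto` and the sample durations are hypothetical and chosen only for illustration. The sketch assumes that the worst-case data loss equals the interval between two consecutive backups.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss is one full backup interval, so it must not exceed the RPO."""
    return backup_interval <= rpo


def meets_rto(restore_duration: timedelta, rto: timedelta) -> bool:
    """The application must be operational again within the RTO."""
    return restore_duration <= rto


# Objectives from question 2 (the candidate durations below are assumptions).
rpo = timedelta(minutes=30)
rto = timedelta(hours=1)

print(meets_rpo(timedelta(minutes=15), rpo))  # True: backups every 15 minutes
print(meets_rpo(timedelta(hours=1), rpo))     # False: hourly backups risk 60 minutes of loss
print(meets_rto(timedelta(minutes=45), rto))  # True: a 45-minute restore fits within 1 hour
```

A backup interval shorter than the RPO leaves a safety margin, which is why backing up more often than every 30 minutes is recommended when the cost allows it.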
In the cloud you can consider all backups as point-in-time records of the state of an application. This means that any stateful environment requires backing up, whereas any stateless platform can simply be re-created.
One of the core best practices in AWS is to keep as many components of an application as stateless as possible. This also reduces the scope of backups because you really need to focus only on the components that dynamically change over time.
For example, a well-architected multitier application has a load balancer, a web front end, one or more application back ends, and one or more datastores. The datastores could consist of databases, object storage, file storage, and other block devices. In AWS, each of these components is provided by a service. Some services hold stateful data, and others do not. If a service is stateless, the objects within it can simply be re-created after a failure. Most infrastructure services are completely stateless and can be redeployed using orchestration.
For example, you can use a CloudFormation template to deploy all network components. If a failure or human error occurs during an update, you can simply roll back any changes or even redeploy the entire environment through the original CloudFormation template to recover the objects.
EC2 instances are another example. They can be either stateless or stateful. Stateless applications do not store any state inside the compute environment, whereas stateful applications do. For example, a stateful web server stores its session information in the memory of the server instance. If the instance fails, the session information is lost. A stateless server stores the state information outside the instance, for example, in a database or caching platform. The best practice for compute environments is thus to keep them stateless. In case of failure, a stateless instance can simply be redeployed using the AMI it was originally deployed from.
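The difference between a stateful and a stateless server can be illustrated with a minimal Python sketch. The class names `StatefulServer` and `StatelessServer` are hypothetical, and a plain dictionary stands in for the external database or caching platform. This is a simplified model of the idea, not an actual web server.

```python
class StatefulServer:
    """Keeps session data in its own memory, so it is lost with the instance."""

    def __init__(self):
        self.sessions = {}  # lives only inside this instance

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def current_user(self, session_id):
        return self.sessions.get(session_id)


class StatelessServer:
    """Keeps session data in an external store shared by all instances."""

    def __init__(self, external_store):
        self.store = external_store  # stand-in for a database or cache

    def login(self, session_id, user):
        self.store[session_id] = user

    def current_user(self, session_id):
        return self.store.get(session_id)


# Stateful: the replacement instance starts with empty memory.
old = StatefulServer()
old.login("abc", "alice")
replacement = StatefulServer()  # simulates redeploying after a failure
print(replacement.current_user("abc"))  # None

# Stateless: the replacement instance reads the same external store.
store = {}
StatelessServer(store).login("abc", "alice")
replacement = StatelessServer(store)
print(replacement.current_user("abc"))  # alice
```

In the stateless case only the external store holds state, so it is the only component that needs to be backed up.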
Table 6.1 shows which services are stateful and the backup/recovery strategy you can employ.
TABLE 6.1 AWS Services of a Multitier Application and Their State
Service | Type | Stateful | Backup/Restore |
VPC, subnets, IGW… | Network | No | Re-create/orchestration |
ELB | Network | No | Re-create/orchestration |
EC2 | Compute | No | Redeploy/AMI |
EBS | Block Storage | Yes | Snapshot/AWS Backup |
RDS | Database | Yes | Snapshot/AWS Backup |
EFS/FSx | File | Yes | Replicate/AWS Backup |
S3 | Object Storage | Yes | Versioning |
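As a rough illustration of how Table 6.1 narrows the scope of a backup plan, the following Python sketch encodes the table rows as data and separates the services that need backing up from those that can be re-created. The variable names are hypothetical. The rows simply mirror the table.

```python
# Rows of Table 6.1: (service, type, stateful, backup/restore strategy)
SERVICES = [
    ("VPC, subnets, IGW…", "Network", False, "Re-create/orchestration"),
    ("ELB", "Network", False, "Re-create/orchestration"),
    ("EC2", "Compute", False, "Redeploy/AMI"),
    ("EBS", "Block Storage", True, "Snapshot/AWS Backup"),
    ("RDS", "Database", True, "Snapshot/AWS Backup"),
    ("EFS/FSx", "File", True, "Replicate/AWS Backup"),
    ("S3", "Object Storage", True, "Versioning"),
]

# Only stateful services belong in the backup plan.
backup_scope = [name for name, _type, stateful, _strategy in SERVICES if stateful]

# Stateless services are recovered by re-creating or redeploying them.
recreate_scope = [name for name, _type, stateful, _strategy in SERVICES if not stateful]

print(backup_scope)    # ['EBS', 'RDS', 'EFS/FSx', 'S3']
print(recreate_scope)  # ['VPC, subnets, IGW…', 'ELB', 'EC2']
```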