Read Replicas – Implementing Scalability and Elasticity – SOA-C02 Study Guide

Read Replicas

This section covers the following official AWS Certified SysOps Administrator – Associate (SOA-C02) exam domain:

Domain 2: Reliability and Business Continuity

CramSaver

If you can correctly answer these questions before going through this section, save time by skimming the Exam Alerts in this section and then completing the Cram Quiz at the end of the section.

1. Your three-tier application has been connected to a business intelligence (BI) forecasting platform. While the forecasts are improving business practices, the users of your application are reporting that performance has decreased. The web and app tiers are scaling appropriately, and the caching cluster is at about 40 percent capacity. What could be the cause of the slowdown seen by the users, and how could you resolve it?

2. True or False: Aurora natively supports both MySQL and PostgreSQL.

Answers

1. Answer: The BI platform has introduced additional load on the database. Because the BI forecasting requires access to most or all of the dataset, the cache cannot be used to offload the required reads. To mitigate, implement a database read replica.

2. Answer: True. The two engines are fully supported at the time of writing. Other database engines might be supported in the future.

There are five general approaches to scaling database performance:

Vertical scaling: You can add more CPU and RAM to the primary instance.

Horizontal scaling: You can add more instances to a database cluster to increase the available CPU and RAM, but this approach is not always supported.

Sharding: When horizontal scaling is not supported, you can distribute the dataset across multiple primary database engines, thus achieving higher write performance.

Caching: You can add a caching cluster to offload reads, although holding a large share of the dataset in memory can be expensive.

Read replicas: You can add read replicas to offload read traffic; replication to the replicas is often asynchronous.
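To make the read-offloading idea concrete, the following is a minimal sketch of read/write splitting at the application level. It assumes the application can classify a statement as a read by checking whether it starts with SELECT, which is a simplification. The class name, method name, and endpoint names are hypothetical and are not part of any AWS API.

```python
import itertools


class ReadWriteRouter:
    """Route writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._read_targets = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Treat only plain SELECT statements as reads; everything else
        # (INSERT, UPDATE, DELETE, DDL) goes to the primary.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_targets) if is_read else self.primary


router = ReadWriteRouter(
    primary="primary.example.internal",       # hypothetical endpoint
    replicas=[
        "replica-1.example.internal",         # hypothetical endpoint
        "replica-2.example.internal",         # hypothetical endpoint
    ],
)
```

Reads rotate across the replicas in round-robin order, while every write still lands on the single primary. A real application would also need to decide which reads can tolerate slightly stale data.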

Read replicas can potentially offer the same benefit of read offloading that caching can have for your application. One big benefit of read replicas is that the whole database is replicated to the read replica, not just frequently read items. This means that read replicas can be used where the reads are frequently distributed across the majority of the data in the database. As you can imagine, this would be very expensive and resource intensive to achieve in the cache. You would need to provision enough in-memory capacity (potentially terabytes), read the entire database, and write the data into the cache before you could perform any complex operation on the data in the cache. Sometimes complex operations can’t even be performed on the caching server; for example, joins of multiple tables for analytics, business intelligence, or data mining purposes are just not possible within the cache.

This is where read replicas excel. You can introduce read replicas in most instance-based AWS database services, including RDS, Aurora, DocumentDB, and Neptune.

In the RDS service, up to five read replicas can be deployed for any MySQL, MariaDB, PostgreSQL, or Oracle database. Because the data resides on the volume attached to the instance, built-in database replication tools are used to replicate the data across the network to the read replica. This means that read replicas introduce additional load on the primary instance and that the replication is always asynchronous, with a lag that ranges from a few seconds to, in extreme cases, a few minutes. Figure 4.12 illustrates RDS read replicas.

FIGURE 4.12 RDS read replicas
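The effect of asynchronous replication can be illustrated with a small in-memory simulation. This is only a sketch of the general idea (a primary ships an ordered change log that the replica applies later); it does not model how any particular RDS engine replicates data, and all class and variable names are hypothetical.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []          # ordered change log shipped to replicas

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0       # number of log entries applied so far

    def lag(self):
        # Replication lag expressed as the count of unapplied changes.
        return len(self.primary.log) - self.applied

    def apply(self, max_entries):
        # Apply at most max_entries pending changes, in log order.
        start = self.applied
        pending = self.primary.log[start:start + max_entries]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)


primary = Primary()
replica = Replica(primary)
primary.write("order:1", "pending")
primary.write("order:1", "shipped")

stale = replica.data.get("order:1")   # replica has applied nothing yet
lag_before = replica.lag()
replica.apply(max_entries=10)         # replica catches up
fresh = replica.data.get("order:1")
lag_after = replica.lag()
```

Until the replica applies the pending changes, a read against it returns older data than the primary holds. This is why reads that must see the latest write are usually sent to the primary.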

The replica can be placed in any Availability Zone in the same Region as the primary database or can be deployed in a cross-Region deployment. A read replica can also be promoted to a primary instance, which means that establishing a read replica can be an easy way to clone a database. After the replica is promoted to primary, only a sync of any missing data is required.
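As a rough sketch, creating a same-Region replica and later promoting it could look like the following with the AWS SDK for Python. The helpers accept an RDS client object and call the boto3 RDS client operations create_db_instance_read_replica and promote_read_replica; the helper names and the instance identifiers in the usage comment are hypothetical, and cross-Region replicas need additional settings not shown here.

```python
def create_read_replica(rds, source_id, replica_id, availability_zone=None):
    """Request a read replica of source_id in the same Region."""
    params = {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_id,
    }
    # Optionally pin the replica to a specific Availability Zone.
    if availability_zone:
        params["AvailabilityZone"] = availability_zone
    return rds.create_db_instance_read_replica(**params)


def promote_to_primary(rds, replica_id):
    """Promote a replica to a standalone, writable instance."""
    return rds.promote_read_replica(DBInstanceIdentifier=replica_id)


# Usage (requires boto3 and AWS credentials; identifiers are hypothetical):
#   import boto3
#   rds = boto3.client("rds")
#   create_read_replica(rds, "app-db", "app-db-replica-1", "us-east-1b")
#   promote_to_primary(rds, "app-db-replica-1")
```

Passing the client in as an argument keeps the helpers easy to exercise with a stand-in object before running them against a real account.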