Amazon Aurora is a fully managed relational database engine developed by Amazon to deliver high performance in a simple, cost-effective manner. An Aurora cluster consists of a single primary instance and zero or more replicas. Aurora replicas give you the advantages of both read replicas and Multi-AZ instances in RDS. For storage, Aurora uses a shared cluster volume that is available to all compute instances in the cluster and can grow to a maximum of 64 TiB. This shared volume lets the cluster provision faster and improves availability and performance. The volume is SSD-based, providing high IOPS and low latency. Unlike other RDS engines, Aurora does not ask you to allocate storage up front; you are billed for the storage you actually use.
Aurora clusters expose multiple endpoints, including the cluster endpoint and the reader endpoint. With zero replicas, the reader endpoint points to the same instance as the cluster endpoint. When replicas are available, the reader endpoint load-balances connections across them. The cluster endpoint is used for reads and writes, while the reader endpoint is intended for reads only. If you add more replicas, AWS includes them in the load balancing under the hood.
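The reader endpoint behaves roughly like round-robin resolution across the replicas. The following is a minimal sketch of that idea; the hostnames are hypothetical and the resolver is a toy stand-in, not Aurora's actual mechanism:

```python
import itertools

# Hypothetical replica endpoints; the toy resolver below approximates how
# the reader endpoint spreads connections across replicas.
replicas = [
    "replica-1.cluster-ro.example.us-east-1.rds.amazonaws.com",
    "replica-2.cluster-ro.example.us-east-1.rds.amazonaws.com",
    "replica-3.cluster-ro.example.us-east-1.rds.amazonaws.com",
]

_cycle = itertools.cycle(replicas)

def resolve_reader_endpoint() -> str:
    """Return the next replica, mimicking round-robin resolution."""
    return next(_cycle)

# Four consecutive reads rotate through the three replicas and wrap around.
targets = [resolve_reader_endpoint() for _ in range(4)]
```

Adding a fourth replica would only mean extending the list; callers keep using the one reader endpoint, which is the point of the abstraction.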
When failover occurs, a replica is promoted to read/write mode, which takes some time. This delay can be avoided with an Aurora Multi-Master cluster, in which multiple instances perform reads and writes at the same time.
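The single-writer failover flow can be sketched as a small state change: one replica is promoted to writer and the old primary is demoted. This is an illustrative model only; the class and instance names are made up and the real promotion involves DNS updates and recovery work, which is where the delay comes from:

```python
from dataclasses import dataclass, field

@dataclass
class Instance:
    name: str
    writable: bool = False

@dataclass
class AuroraClusterModel:
    """Toy single-writer cluster: one primary, the rest are read replicas."""
    primary: Instance
    replicas: list = field(default_factory=list)

    def failover(self) -> Instance:
        """Promote the first replica to read/write and demote the old primary.
        In a real cluster this promotion step is what causes the brief outage."""
        if not self.replicas:
            raise RuntimeError("no replica available for promotion")
        new_primary = self.replicas.pop(0)
        new_primary.writable = True
        self.primary.writable = False
        self.replicas.append(self.primary)
        self.primary = new_primary
        return new_primary

cluster = AuroraClusterModel(
    primary=Instance("writer-1", writable=True),
    replicas=[Instance("reader-1"), Instance("reader-2")],
)
promoted = cluster.failover()
```

In a Multi-Master cluster this promotion step disappears, because every instance is already writable.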
Amazon Redshift is not used for real-time transactions; it is a data warehouse. It is designed to support huge volumes of data at petabyte scale. It is a column-based database used for analytics, long-term processing, trending, and aggregation. Redshift Spectrum can query data in S3 without loading it into the Redshift cluster (a running cluster is still required). Redshift is an OLAP system, not an OLTP one. Amazon QuickSight can be integrated with Redshift for visualization, and Redshift's SQL interface allows you to connect using JDBC/ODBC connections to query the data.
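Why column-based storage helps analytics can be shown with a toy example: for an aggregate over one column, a columnar layout reads only that column, while a row layout drags every field of every row along. The data below is made up, and this is a conceptual sketch, not Redshift's storage format:

```python
# Row-store layout: each record is stored together.
rows = [
    {"id": 1, "region": "eu", "sales": 100},
    {"id": 2, "region": "us", "sales": 250},
    {"id": 3, "region": "eu", "sales": 75},
]

# Column-store layout of the same table: each column is stored together.
columns = {
    "id": [1, 2, 3],
    "region": ["eu", "us", "eu"],
    "sales": [100, 250, 75],
}

row_total = sum(r["sales"] for r in rows)  # must touch every field of every row
col_total = sum(columns["sales"])          # touches a single contiguous column
```

Both layouts give the same answer, but the columnar scan reads a fraction of the data, which is why OLAP engines such as Redshift favor it.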
Redshift uses a clustered architecture in a single AZ within a VPC, with fast network connectivity between the nodes. It is not highly available by design, because it is tightly coupled to that AZ. A Redshift cluster has a leader node, which is responsible for all communication between the client and the compute nodes, for query planning, and for final aggregation. Compute nodes run the queries submitted by the leader node and store the data. By default, Redshift communicates with external services and other AWS services over the public network; with enhanced VPC routing, this traffic is forced through your VPC and can be controlled via customized networking settings.
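The leader/compute split above can be sketched as scatter-gather: the leader fans a query out to the compute nodes, each node aggregates its own slice of the data, and the leader merges the partial results. The table slices and function names here are invented for illustration:

```python
# Each compute node stores one slice of a (month, amount) fact table.
shards = [
    [("2023-01", 120), ("2023-02", 80)],   # slice held by compute node 1
    [("2023-01", 30), ("2023-03", 50)],    # slice held by compute node 2
]

def compute_node_scan(rows, month):
    """Partial aggregation performed locally on one compute node."""
    return sum(amount for m, amount in rows if m == month)

def leader_query(month):
    """Leader node: plan the query, collect partials, do the final merge."""
    partials = [compute_node_scan(node, month) for node in shards]
    return sum(partials)

total = leader_query("2023-01")  # 120 from node 1 + 30 from node 2
```

Pushing the heavy scanning down to the compute nodes is what lets the cluster scale by adding nodes, while the leader stays a lightweight coordinator.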
By combining Redshift with SageMaker, data scientists and analysts can leverage the scalability and computational power of Redshift to preprocess and transform data before training machine learning models. They can use Redshift's SQL capabilities to perform aggregations, joins, and filtering, enabling efficient feature engineering and data preparation. The processed data can then be fed into SageMaker for model training, hyperparameter tuning, and evaluation.
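The feature-engineering step is plain SQL, so it can be sketched locally. The snippet below runs the kind of GROUP BY aggregation a warehouse would execute, against an in-memory SQLite database standing in for a Redshift connection; the table, columns, and values are all illustrative:

```python
import sqlite3

# Stand-in for a Redshift connection; the SQL is the part that matters.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (1, 30.0), (2, 5.0);
""")

# One row of engineered features per customer, ready to export
# (e.g. to S3) as a SageMaker training set.
features = conn.execute("""
    SELECT customer_id,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_spend,
           AVG(amount) AS avg_order_value
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()
```

Doing this aggregation inside the warehouse, rather than in the training job, keeps the heavy data movement and joins where the data already lives.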