Selecting a database depends on many factors, and you may have to use more than one database for your workload to deliver optimal performance. The choice depends on the type of data you handle, the type of access to support, the querying capability expected, and additional factors such as latency, scalability, consistency, and partition tolerance.
So, the first thing to do is to understand your workload requirements very clearly. People tend to stick with the technology they know; nobody is immune to that behavior. If you have been working with relational databases all your career, chances are that they will be a central piece in your solution design. But are they the right choice for this particular workload in the cloud?
First, you no longer have to rely on a single technology, because AWS offers a plethora of database technologies and makes them available as pay-as-you-go services, including fully managed services. So, you have no reason to restrict yourself to one technology for your entire solution.
Second, your workload components or services will have requirements that are very distinct from each other. Some may require strong and reliable transactionality, à la Atomicity, Consistency, Isolation, Durability (ACID). Others may require strong partition tolerance and need to be almost always on. Others still may need consistent ultra-low latency. These needs will make it difficult for you to use just a single technology. If you try to force a single technology, you may have to sacrifice performance or ultimately end up with much higher costs, because more resources would be needed to meet those performance needs. The database options available on AWS are discussed next.
Relational databases come first. This type of data store is best suited to managing data that follows a pre-established (and stable) structure, or schema, composed of inter-related data entities. It is also well suited to workloads relying on ACID transactions, strong data consistency, and referential integrity. AWS offers multiple managed services supporting relational data stores: Amazon Relational Database Service (RDS), Amazon Aurora (Aurora), and Amazon Redshift (Redshift). RDS offers a managed implementation of your preferred database engine, with a choice of MySQL, PostgreSQL, MariaDB, Microsoft SQL Server, and Oracle. Aurora provides a MySQL-compatible or PostgreSQL-compatible database tailored to AWS, bringing improved performance, scalability, and availability over the plain open-source implementations. Redshift provides analytics capabilities at scale, giving you the ability to run a data warehouse in the cloud and also to launch analytical queries combining data from multiple sources, such as your data lake or other operational data stores.
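To make the ideas of ACID transactions and referential integrity more concrete, here is a minimal sketch in Python. It uses the in-memory SQLite engine from the standard library purely as a stand-in for a managed relational engine; the table names, the column names, and the `place_order` helper are hypothetical and only serve the illustration.

```python
import sqlite3


def make_db():
    """Create a small in-memory schema with two related entities."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id           INTEGER PRIMARY KEY,
            customer_id  INTEGER NOT NULL REFERENCES customers(id),
            amount_cents INTEGER NOT NULL CHECK (amount_cents > 0)
        );
        """
    )
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
    conn.commit()
    return conn


def place_order(conn, customer_id, amounts):
    """Insert all order lines atomically.

    Returns True if every row was committed, False if the whole
    batch was rolled back because a constraint was violated.
    """
    try:
        # The connection context manager commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            for amount in amounts:
                conn.execute(
                    "INSERT INTO orders (customer_id, amount_cents) VALUES (?, ?)",
                    (customer_id, amount),
                )
        return True
    except sqlite3.IntegrityError:
        return False


def count_orders(conn):
    """Return the number of rows currently stored in the orders table."""
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

With this sketch, `place_order(conn, 1, [1500, 2500])` commits two rows, whereas `place_order(conn, 1, [1000, -5])` stores nothing at all because the second row violates the check constraint, and `place_order(conn, 99, [1000])` is rejected because customer 99 does not exist. That all-or-nothing behavior and the enforced relationship between entities are what the relational services described above are designed to provide.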
Next come non-relational, or NoSQL, databases. Unlike relational databases, NoSQL databases are better at handling data with a dynamic schema and at scaling horizontally (think internet scale here), but they are less suited to ad hoc queries. AWS offers several managed services in that space: Amazon DynamoDB, Amazon DocumentDB, Amazon Keyspaces, and Amazon Neptune. DynamoDB is a serverless key-value database, meant to support cloud applications at any scale. With global tables, it can also work in a multi-Region, multi-active mode for applications that need to operate globally. DocumentDB is a document database specialized in storing and querying JSON-like documents. It is also compatible with the Apache 2.0 open-source MongoDB APIs. Keyspaces is a serverless wide-column database that offers compatibility with the Apache Cassandra CQL API. Neptune is a graph database capable of handling high-throughput and low-latency requirements. It supports queries using both Gremlin and SPARQL.
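The trade-off between key-based access and ad hoc queries can be illustrated with a toy in-memory model. The sketch below is loosely inspired by the partition key and sort key design used by key-value stores such as DynamoDB, but it is not the DynamoDB API: the `KeyValueTable` class and its method names are assumptions made for this example only.

```python
class KeyValueTable:
    """Toy partition-key / sort-key table (illustration, not a real client)."""

    def __init__(self):
        # partition key -> {sort key -> item}
        self._partitions = {}

    def put(self, pk, sk, item):
        """Store an item; items are free to carry different attributes."""
        self._partitions.setdefault(pk, {})[sk] = dict(item)

    def get(self, pk, sk):
        """Direct lookup by full key: touches a single partition."""
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk, sk_prefix=""):
        """Return the items of one partition whose sort key starts with
        sk_prefix, in sort-key order. Only that partition is read."""
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]

    def scan(self, predicate):
        """Ad hoc filter on arbitrary attributes: every item in every
        partition has to be visited, which is why it scales poorly."""
        return [
            item
            for partition in self._partitions.values()
            for item in partition.values()
            if predicate(item)
        ]


table = KeyValueTable()
table.put("user#1", "order#001", {"total": 30})
table.put("user#1", "order#002", {"total": 75, "coupon": "SPRING"})
table.put("user#1", "profile", {"name": "Ada"})
table.put("user#2", "order#001", {"total": 120})

# Access patterns known in advance map to cheap key-based reads.
orders_of_user_1 = table.query("user#1", "order#")

# A question nobody planned for requires a full scan.
large_orders = table.scan(lambda item: item.get("total", 0) > 50)
```

Note how the items stored under the same partition do not share a schema, and how the query on `user#1` never looks at the data of `user#2`. The scan, by contrast, reads everything, which hints at why such databases favor access patterns that are designed up front.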
AWS also provides a number of additional managed database services that do not directly fall under the previous two families. Amazon MemoryDB for Redis is a managed Redis-compatible in-memory database, distributed across multiple Availability Zones (AZs) and supporting microsecond read and single-digit millisecond write operations. Amazon ElastiCache is an in-memory data store, compatible with either Redis or Memcached. Although ElastiCache can also be used as a persistence layer, it is typically employed as a caching layer in front of an existing database, such as RDS, to speed up data access. ElastiCache delivers sub-millisecond response latency for both read and write operations. Amazon Timestream is a time-series database specialized in the collection and storage of time-series data (for instance, telemetry or IoT device/sensor data). Its architecture is optimized to handle fast writes and large analytical queries. Lastly, Amazon Quantum Ledger Database (QLDB) is a ledger database that stores data in an append-only journal where all changes are cryptographically verifiable.
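The use of an in-memory store as a caching layer in front of a database is often implemented with the cache-aside pattern. The following is a minimal sketch of that pattern, under simplifying assumptions: `TTLCache` is a tiny in-process stand-in for a cache such as Redis or Memcached, `CachedReader` is a hypothetical wrapper, and the backing database is represented by any function that loads a value by key.

```python
import time


class TTLCache:
    """In-process stand-in for an external cache (illustration only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so that expiry can be tested
        self._entries = {}   # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # The entry is stale: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)

    def delete(self, key):
        self._entries.pop(key, None)


class CachedReader:
    """Cache-aside reads: try the cache first, fall back to the database."""

    def __init__(self, cache, load_from_db):
        self._cache = cache
        self._load_from_db = load_from_db
        self.db_reads = 0  # counts how often the database was hit

    def get(self, key):
        value = self._cache.get(key)
        if value is not None:
            return value  # cache hit: the database is not touched
        self.db_reads += 1
        value = self._load_from_db(key)
        if value is not None:
            # Populate the cache so that later reads are served from memory.
            self._cache.set(key, value)
        return value
```

For example, wrapping a plain dictionary lookup with `CachedReader(TTLCache(60), products.get)` means that repeated reads of the same key within 60 seconds reach the backing store only once. A real deployment would also have to decide how writes invalidate or refresh cached entries (the `delete` method hints at one option), which this sketch deliberately leaves out.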