Start by reviewing the available choices for compute resources on AWS. They fall under three subcategories: virtual instances with Amazon Elastic Compute Cloud (EC2), containers with Amazon Elastic Container Service (ECS) or Amazon Elastic Kubernetes Service (EKS), and functions with AWS Lambda. The following sections discuss each of them.
With Amazon EC2, you have access to a broad variety of virtual servers and bare-metal instances. Each of these EC2 instances belongs to a family, and a generation within that family, and possesses unique characteristics. Some of them can satisfy the needs of a large variety of workloads (general-purpose instances such as those from the M and T families), some are optimized for CPU-intensive workloads (the C family), some are optimized for memory-intensive workloads (the R, X, and Z families), others offer high storage density with high throughput or low latency (the D and I families), and yet others offer hardware acceleration, such as the P and G families (GPU-based acceleration) or the F family (FPGA-based custom acceleration). There are a few more families that provide specialized chips for one type of task, such as machine learning inference (the Inf family), machine learning training (the Trn family), or video transcoding (the VT family).
On top of their family characteristics, each instance comes in a specific T-shirt size that determines the amount of CPU, memory, storage, and network bandwidth available.
In addition, variants exist within the various families to provide extra local storage (instance subtypes with a d, for instance, z1d), additional network bandwidth (instance subtypes with an n, for instance, c5n), or sometimes both (for instance, g4dn).
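To make the naming convention concrete, here is a small sketch that decomposes an instance type name into its parts. The suffix table is only a partial, illustrative subset of the attribute letters AWS uses:

```python
import re

# An instance type name follows the pattern <family><generation><suffixes>.<size>,
# e.g. g4dn.xlarge = G family, 4th generation, d (local storage) and
# n (extra network bandwidth) variants, xlarge size.
TYPE_PATTERN = re.compile(r"^([a-z]+?)(\d+)([a-z-]*)\.([a-z0-9]+)$")

# Partial, illustrative subset of attribute suffixes.
SUFFIX_MEANINGS = {
    "a": "AMD CPU",
    "g": "AWS Graviton (ARM) CPU",
    "i": "Intel CPU",
    "d": "local NVMe instance storage",
    "n": "enhanced network bandwidth",
}

def parse_instance_type(instance_type: str) -> dict:
    """Split an instance type such as 'c5n.large' into family,
    generation, attribute suffixes, and size."""
    match = TYPE_PATTERN.match(instance_type)
    if match is None:
        raise ValueError(f"Unrecognized instance type: {instance_type}")
    family, generation, suffixes, size = match.groups()
    return {
        "family": family,
        "generation": int(generation),
        "attributes": [SUFFIX_MEANINGS.get(s, s) for s in suffixes],
        "size": size,
    }

print(parse_instance_type("g4dn.xlarge"))
```

The same pattern also handles multi-letter families such as inf1 or trn1, since the family part is matched lazily before the generation digits.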
Additionally, EC2 instances have supported an increasing variety of CPU processors over time. The latest generations of EC2 instances are now powered by Intel, AMD, or ARM-based AWS Graviton processors. For instance, the M6 type of EC2 instances, the sixth generation of the M family, comes in variants with either Intel CPUs (M6i), AMD CPUs (M6a), or Graviton2 CPUs (M6g). AMD-based EC2 instances were first introduced to provide a cost-efficient alternative to Intel-based EC2 instances. Since both run on the same x86 architecture, they require no change from an application perspective and provide an interesting option if you don't rely on Intel-specific features. Graviton-based EC2 instances were later introduced by AWS to provide an even more appealing alternative to Intel- or AMD-based EC2 instances. They brought significant cost-efficiency gains and, with the latest generations of Graviton processors, significant performance improvements over their x86 peers. Nowadays, many enterprises have adopted Graviton-based EC2 instances, citing reduced EC2 costs and improved performance. There is a caveat, though: Graviton-based instances rely on the ARM architecture, so you must first make sure that your workloads can be effectively ported to it. In particular, you will have to procure all your dependencies, such as application libraries, for the ARM platform, and your workload binaries, if any, will require recompilation.
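When validating such a port, it helps to have your build or test pipeline confirm which architecture it is actually running on. A minimal sketch, using only the standard library (the normalization of platform names is an assumption for illustration):

```python
import platform

def runtime_architecture() -> str:
    """Report the CPU architecture of the current host, so automated
    checks can confirm a build or test actually ran on ARM."""
    machine = platform.machine().lower()
    # Graviton (ARM) hosts typically report 'aarch64' on Linux or 'arm64'
    # elsewhere; Intel and AMD hosts report 'x86_64' or 'amd64'.
    if machine in ("aarch64", "arm64"):
        return "arm64"
    if machine in ("x86_64", "amd64"):
        return "x86_64"
    return machine

print(runtime_architecture())
```

Running this on an M6g instance should report arm64, while M6i and M6a instances report x86_64.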
That said, before picking an instance type, you must consider the major characteristics of your workload in terms of performance. For that, beyond the initial assumptions you can make, observing and measuring are essential. You should collect metrics showing evidence of those characteristics. Is the workload mostly bound by the compute power available (number of CPUs or maybe the CPU clock)? Is it memory-hungry, rapidly saturating all the RAM it can find? Is it limited by the available network bandwidth? Or is it tied to disk throughput or storage latency? You will be able to tell based on observation. Now, which instance type do you start with to make those observations? In the absence of any obvious characteristics, it is recommended to start with one of the general-purpose instances, from the M or T family. T family instances are so-called burstable instances, which are useful for workloads that operate most of the time under a moderate baseline but occasionally need to burst above that baseline to absorb occasional spikes in demand. Alternatively, if you already have a fair idea of the major characteristics of your workload, you can directly start with an instance from what seems like the most appropriate family (for instance, a G family instance when you know you need GPU acceleration).
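The reasoning above can be sketched as a simple decision helper that maps observed utilization figures to a likely bottleneck and a candidate family. The thresholds and the metric names are assumptions for illustration, not AWS guidance; calibrate them against your own measurements:

```python
# Illustrative sketch: map utilization metrics (averaged over a representative
# observation period, as percentages) to the most saturated resource and a
# candidate EC2 family. Thresholds are hypothetical.
def suggest_family(cpu_pct: float, mem_pct: float, net_pct: float, disk_pct: float) -> str:
    observations = {
        "C (compute optimized)": cpu_pct,
        "R (memory optimized)": mem_pct,
        "n variants (network optimized)": net_pct,
        "D or I (storage optimized)": disk_pct,
    }
    family, peak = max(observations.items(), key=lambda kv: kv[1])
    # If no resource stands out, stay on a general-purpose instance.
    if peak < 70.0:
        return "M or T (general purpose)"
    return family

# A CPU-bound workload: 92% CPU, everything else moderate.
print(suggest_family(cpu_pct=92.0, mem_pct=45.0, net_pct=30.0, disk_pct=20.0))
```

In practice you would feed this kind of decision with real measurements rather than point-in-time samples, but the structure of the reasoning is the same.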
And then you iterate, based on measurement and observation, either to increase or reduce the size of the instance or to change the family if what you observe invalidates your hypotheses.
As always with AWS, nothing beats experimentation, so don't hesitate to try out multiple families and sizes if you're unsure. If you follow best practices and automate your CI/CD process, testing different EC2 instance families and sizes should be straightforward. When you do, monitor the performance of your workload on each instance type and size that you try out. For that, you can leverage the EC2 metrics reported by CloudWatch; at a minimum, pay attention to metrics such as CPUUtilization, NetworkIn, and NetworkOut to understand whether your workload suffers from a lack of CPU power or from network bandwidth exhaustion. Note that EC2 does not report memory metrics out of the box; to track memory usage, you need to install the CloudWatch agent on your instances.
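Once you have retrieved a series of datapoints (for instance, CPUUtilization samples fetched via boto3's get_metric_statistics), you need to condense them into figures you can compare across instance types. A minimal sketch, assuming the datapoints are already in hand as plain numbers:

```python
import math

# Summarize metric samples into the figures typically compared across
# instance types: average, 95th percentile, and maximum. Averages alone
# can hide short saturation spikes, hence the percentile and max.
def summarize(datapoints: list[float]) -> dict:
    if not datapoints:
        raise ValueError("No datapoints to summarize")
    ordered = sorted(datapoints)
    # Nearest-rank p95: the value below which 95% of samples fall.
    rank = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "average": sum(ordered) / len(ordered),
        "p95": ordered[rank],
        "max": ordered[-1],
    }

# Hypothetical CPUUtilization samples (%): mostly moderate, one spike.
cpu_samples = [35.0, 42.0, 38.0, 95.0, 40.0, 37.0, 41.0, 39.0, 36.0, 43.0]
print(summarize(cpu_samples))
```

Here the average (44.6%) looks comfortable, but the p95 and max reveal a spike to 95% that a smaller instance might not absorb.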
Finding the optimal instance type and size for your workload, also known as rightsizing, is a must to optimize your compute resource usage. Remember that the cloud brings elasticity. So, when you have found the ideal instance types to support your workload, remember that you can leverage AWS auto-scaling capabilities to scale out the number of instances used to support the load on the various components of your design. Simply put, it is beneficial most of the time, from both a performance and a cost standpoint, to use multiple smaller instances that can be scaled out (and then back in), instead of a few large ones. In any case, plan for scalability and put the necessary mechanisms in place to automatically adjust your workload capacity as close as possible to the demand.
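To see why scaling out works, consider a simplified model of target-tracking scaling: capacity is adjusted in proportion to how far an observed metric sits from its target. Real AWS Auto Scaling adds cooldowns, instance warm-up, and smoothing on top of this, so the formula below is only a conceptual approximation:

```python
import math

# Conceptual approximation of target-tracking scaling: scale the instance
# count in proportion to the ratio of observed metric to target metric,
# clamped to the Auto Scaling group's min/max size.
def desired_capacity(current: int, observed_metric: float, target_metric: float,
                     min_size: int = 1, max_size: int = 20) -> int:
    desired = math.ceil(current * observed_metric / target_metric)
    return max(min_size, min(max_size, desired))

# Four instances running at 80% average CPU with a 50% target: scale out to 7.
print(desired_capacity(current=4, observed_metric=80.0, target_metric=50.0))
```

The same formula scales back in when demand drops (for instance, 4 instances at 20% CPU against a 50% target come down to 2), which is what keeps capacity tracking demand in both directions.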