This was briefly touched upon under canary testing: it’s good practice to validate that end-to-end requests perform as expected.
Leverage AWS X-Ray, or an equivalent third-party tool, to understand how your workload and its underlying components are performing. Tracing also proves particularly useful for debugging, a notoriously arduous task in distributed systems.
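As a minimal sketch of what such instrumentation can look like, assuming a Python Lambda function (where the runtime opens the trace segment for you) and hypothetical function and table names:

```python
import boto3
from aws_xray_sdk.core import xray_recorder, patch_all

patch_all()  # auto-instrument supported libraries such as boto3

dynamodb = boto3.resource('dynamodb')

@xray_recorder.capture('fetch_order')  # records a subsegment per call
def fetch_order(order_id):
    # Annotations are indexed, so traces can later be filtered by order_id
    xray_recorder.current_subsegment().put_annotation('order_id', order_id)
    table = dynamodb.Table('orders')  # hypothetical table name
    return table.get_item(Key={'id': order_id}).get('Item')

def handler(event, context):
    return fetch_order(event['order_id'])
```

Each downstream call (here, DynamoDB) then shows up as its own node in the X-Ray service map, which is what makes pinpointing a slow or failing component in a distributed system tractable.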
Unless your workload expects constant demand forever, which is exceptional and unlikely in most cases, you need to make sure it can scale. And, to reap the most benefits, you want that scaling to be as automated as possible, so that capacity closely follows demand: scaling up when a surge in demand occurs, but also scaling back down as demand recedes.
AWS provides a number of mechanisms to scale resources. With serverless AWS services, such as Amazon S3, AWS Lambda, or Amazon DynamoDB (on-demand throughput), resources scale automatically with demand on your behalf. You only need to configure the services properly for scaling (for instance, Lambda concurrency) and make sure you don’t overrun your service quotas; a short sketch of both follows.
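For illustration, here is a boto3 sketch of both chores, with a hypothetical function name: reserving Lambda concurrency for a function and checking the account-level concurrency quota it draws from.

```python
import boto3

lambda_client = boto3.client('lambda')

# Reserve (and cap) 100 concurrent executions for this function, so a
# surge elsewhere in the account cannot starve it, and vice versa
lambda_client.put_function_concurrency(
    FunctionName='orders-api',  # hypothetical function name
    ReservedConcurrentExecutions=100,
)

# Check remaining headroom against the account-wide concurrency quota
settings = lambda_client.get_account_settings()
print(settings['AccountLimit']['ConcurrentExecutions'])
print(settings['AccountLimit']['UnreservedConcurrentExecutions'])
```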
On the other hand, AWS Auto Scaling lets you automatically scale a number of resources and services, among which are Amazon EC2, Amazon ECS, Amazon DynamoDB (provisioned throughput), and Amazon Aurora. It provides two types of scaling: dynamic scaling and predictive scaling.

Dynamic scaling lets you add or remove resources based on actual utilization, as measured by commonly used metrics. For instance, consider an application deployed on EC2 instances behind a load balancer. Suppose performance tests reveal that, under stress, the application saturates the CPU first. You can then group the EC2 instances in an Auto Scaling Group (ASG) and define a scaling plan specifying that whenever the average CPU across the ASG rises above 75%, you scale out, and whenever it drops below 30%, you scale back in (see the sketch after this paragraph).

With predictive scaling, AWS Auto Scaling anticipates your needs and provisions resources ahead of the forecasted capacity. To do that, it analyzes the historical behavior of your workload over the previous 14 days against a specific metric and makes a prediction for the coming 2 days. This helps keep the performance of your workload constant (provided the prediction is correct). And, if needed, you can combine predictive and dynamic scaling for a more effective scaling strategy (for instance, to handle an unexpected surge in demand that the predictions did not anticipate). Note that, at the time of writing, predictive scaling is only available with EC2 ASGs.
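To make the dynamic scaling example concrete, here is a hedged boto3 sketch of the threshold-based plan above: one simple scaling policy per direction, each triggered by a CloudWatch alarm on the ASG’s average CPU. The ASG name is hypothetical; only the 75%/30% thresholds come from the example.

```python
import boto3

autoscaling = boto3.client('autoscaling')
cloudwatch = boto3.client('cloudwatch')

def cpu_policy(name, adjustment, threshold, comparison):
    # Create the scaling policy, then the alarm that fires it
    policy = autoscaling.put_scaling_policy(
        AutoScalingGroupName='web-asg',  # hypothetical ASG name
        PolicyName=name,
        PolicyType='SimpleScaling',
        AdjustmentType='ChangeInCapacity',
        ScalingAdjustment=adjustment,
        Cooldown=300,
    )
    cloudwatch.put_metric_alarm(
        AlarmName=f'web-asg-{name}',
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'AutoScalingGroupName', 'Value': 'web-asg'}],
        Statistic='Average',
        Period=300,
        EvaluationPeriods=2,
        Threshold=threshold,
        ComparisonOperator=comparison,
        AlarmActions=[policy['PolicyARN']],
    )

cpu_policy('scale-out', 1, 75, 'GreaterThanThreshold')  # CPU > 75%: add one
cpu_policy('scale-in', -1, 30, 'LessThanThreshold')     # CPU < 30%: remove one
```

In practice, a single target tracking policy (PolicyType='TargetTrackingScaling' with the ASGAverageCPUUtilization predefined metric) often achieves the same effect with less configuration, since AWS manages the alarms for you.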
Before scaling every resource at every layer of your design, don’t forget to optimize the design for your use case(s). For instance, if your workload would benefit from a CDN (such as Amazon CloudFront), leverage it to offload your origin servers; the autoscaling mechanism then only needs to handle the residual load that reaches the origin. Similarly, when using predictive scaling, you can first observe the validity of the predictions over time, and only then base your autoscaling strategy on those observations (see the sketch below).
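As a sketch of that observe-first approach, EC2 Auto Scaling’s predictive policies support a forecast-only mode: forecasts are computed from your workload’s history but not acted upon, so you can compare them against actual demand before letting them drive capacity. The ASG name below is hypothetical.

```python
import boto3

autoscaling = boto3.client('autoscaling')

# Predictive policy in forecast-only mode: AWS generates capacity
# forecasts but does not scale on them yet; switch Mode to
# 'ForecastAndScale' once the predictions prove accurate over time
autoscaling.put_scaling_policy(
    AutoScalingGroupName='web-asg',  # hypothetical ASG name
    PolicyName='cpu-predictive',
    PolicyType='PredictiveScaling',
    PredictiveScalingConfiguration={
        'MetricSpecifications': [{
            'TargetValue': 50.0,  # target average CPU utilization (%)
            'PredefinedMetricPairSpecification': {
                'PredefinedMetricType': 'ASGCPUUtilization',
            },
        }],
        'Mode': 'ForecastOnly',
    },
)
```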