Part of the overall architecting-for-performance approach is monitoring how your resources actually perform. The idea is to do this not just once at the beginning, when you lay out your solution design, but continuously, so you can track how your workload performs over time and detect any deviation.
First, identify the metrics that are important for you to monitor. For instance, say you run an e-commerce application; it is essential for you to know the transaction throughput and how it varies over time, but also to spot any I/O bottlenecks, track the evolution of request latency, and so on. This obviously varies from workload to workload and depends on the performance criteria that your workload must meet. Once you have identified these metrics, make sure that you collect and record them. Leverage a monitoring tool to assist you in this task: either native services from AWS such as CloudWatch, or any other third-party tool you prefer.
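As an illustration, here is a minimal boto3 sketch of how such metrics could be published to CloudWatch as custom metrics. The namespace, metric names, and dimension values (ECommerceApp, TransactionCount, RequestLatency, Service=checkout) are hypothetical placeholders, not part of any AWS standard, and would need to be adapted to your own workload:

```python
# A minimal sketch of recording custom workload metrics in CloudWatch.
# Namespace, metric names, and dimensions are hypothetical placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch")

def record_checkout_metrics(transaction_count: int, latency_ms: float) -> None:
    """Publish transaction throughput and request latency as custom metrics."""
    cloudwatch.put_metric_data(
        Namespace="ECommerceApp",  # hypothetical custom namespace
        MetricData=[
            {
                "MetricName": "TransactionCount",
                "Dimensions": [{"Name": "Service", "Value": "checkout"}],
                "Value": transaction_count,
                "Unit": "Count",
            },
            {
                "MetricName": "RequestLatency",
                "Dimensions": [{"Name": "Service", "Value": "checkout"}],
                "Value": latency_ms,
                "Unit": "Milliseconds",
            },
        ],
    )

# Example usage: record 42 completed transactions at an average latency of 120 ms.
record_checkout_metrics(transaction_count=42, latency_ms=120.0)
```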
Then, you could create a dashboard with the relevant metrics. Define some Key Performance Indicators (KPIs) calculated using these metrics, then add them to your dashboard to help you understand how well your workload performs compared to its objectives. For an e-commerce application, one such KPI could be the volume of sales. Then, create some alarms and notify the relevant people when an alarm is triggered. In some cases, you might be able to remediate automatically when an alarm is triggered; for instance, if an alarm identifies a need for more compute resources, you may be able to handle the situation by scaling those compute resources horizontally with their built-in autoscaling mechanisms.
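For example, a CloudWatch alarm on the latency metric from the previous sketch could be created as follows. The alarm name, threshold, evaluation windows, and SNS topic ARN are hypothetical and would need to match your own objectives and account:

```python
# A minimal sketch of creating a CloudWatch alarm on the hypothetical
# RequestLatency metric and notifying an SNS topic when it is breached.
# Alarm name, threshold, and topic ARN are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="checkout-high-latency",          # hypothetical alarm name
    Namespace="ECommerceApp",                   # hypothetical custom namespace
    MetricName="RequestLatency",
    Dimensions=[{"Name": "Service", "Value": "checkout"}],
    Statistic="Average",
    Period=300,                                  # evaluate 5-minute windows
    EvaluationPeriods=3,                         # require 3 consecutive breaches
    Threshold=500.0,                             # alarm above 500 ms average latency
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    # Notify the on-call team; an Auto Scaling policy ARN could be used
    # instead (or in addition) to remediate automatically.
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:performance-alerts"],
)
```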
Before testing your workload for performance, make sure to have all the aforementioned elements in place. Then, lay out your test plan to simulate real-life scenarios you expect to encounter. After that, go through your test plan and adjust your solution design based on your findings.
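As a very simple illustration of one scenario in such a test plan, the following sketch simulates a group of concurrent users against a placeholder endpoint and reports latency percentiles. The URL, user count, and request count are hypothetical, and a dedicated load-testing tool is usually a better fit for a full test plan:

```python
# A minimal load-test sketch: simulate concurrent users hitting an endpoint
# and report latency percentiles. URL and volumes are placeholders.
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET_URL = "https://example.com/checkout"   # placeholder endpoint
CONCURRENT_USERS = 20
REQUESTS_PER_USER = 50

def simulate_user(_: int) -> list[float]:
    """Issue a series of requests and return the observed latencies in ms."""
    latencies = []
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        with urllib.request.urlopen(TARGET_URL, timeout=10) as response:
            response.read()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as executor:
    all_latencies = [
        latency
        for user_latencies in executor.map(simulate_user, range(CONCURRENT_USERS))
        for latency in user_latencies
    ]

cuts = statistics.quantiles(all_latencies, n=100)  # 99 percentile cut points
print(f"p50={cuts[49]:.0f} ms  p95={cuts[94]:.0f} ms  p99={cuts[98]:.0f} ms")
```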
Remember to include tests that validate the end-user experience when your workload runs under stress, and perform real-user monitoring to spot issues that may affect that experience. You can again leverage CloudWatch RUM and other CloudWatch digital experience monitoring capabilities for that.
Then, routinely review the metrics and KPIs collected during normal operation, as well as during or after an event or incident. These reviews will help you confirm which metrics were essential to spot the issue and help you identify additional metrics that you should be monitoring to obtain a fuller picture and hopefully prevent a similar event or incident from taking place again.
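During such a review, you might pull a metric's recent history programmatically. The following sketch retrieves hourly statistics for the hypothetical latency metric used in the earlier examples; namespace and dimensions are the same placeholders as before:

```python
# A minimal sketch of retrieving the hypothetical RequestLatency metric
# for a review: hourly averages and maximums over the last 24 hours.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")

now = datetime.now(timezone.utc)
response = cloudwatch.get_metric_statistics(
    Namespace="ECommerceApp",
    MetricName="RequestLatency",
    Dimensions=[{"Name": "Service", "Value": "checkout"}],
    StartTime=now - timedelta(hours=24),
    EndTime=now,
    Period=3600,                       # one data point per hour
    Statistics=["Average", "Maximum"],
    Unit="Milliseconds",
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(f'{point["Timestamp"]:%Y-%m-%d %H:%M}  '
          f'avg={point["Average"]:.0f} ms  max={point["Maximum"]:.0f} ms')
```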
So, at this stage, you have created your solution design and tested it against your workload performance objectives, your monitoring is in place, and you are actively collecting metrics and measuring KPIs. Is the task done, then? Well, almost, but not quite.
The following section discusses the last stage of designing for performance.