The AWS Well-Architected Framework is a broad topic. Although you don’t need to be an expert on it for the exam, a basic understanding of the pillars and their purposes will help you with some of the more difficult exam questions.
Both the Operational Excellence and Performance Efficiency pillars rely on monitoring resources, but they use the monitoring data in different ways. Performance Efficiency is the focus of this section, so the monitoring process described here centers on resource metrics that verify resources are performing at the standards your organization has defined.
Amazon has also defined five different phases of the monitoring process, which you should be aware of for the exam. See these five phases in Figure 15.2.
FIGURE 15.2 Amazon’s five phases of the monitoring process
The focus of the Generation phase is to determine the scope of what you will monitor. Although monitoring everything might sound like a good idea, it can lead to information overload, making the data difficult to analyze. Monitoring also increases costs, so you should carefully consider what you monitor.
During this phase, you also determine your thresholds. For Performance Efficiency, it is important to define minimum and maximum thresholds that meet your business needs. For example, you may consider CPU utilization between 40 and 80 percent optimal for the performance of a particular EC2-based system. In that case, you should plan on setting thresholds that match this range.
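As a rough sketch of the threshold idea, the following Python function classifies a CPU reading against the 40–80 percent range from the example above. The function name and labels are illustrative only, not part of any AWS API:

```python
# Sketch: classify an EC2 instance's CPU utilization against the
# 40-80 percent target range used as an example in the text.
# The function name and return labels are hypothetical.

def classify_cpu_utilization(percent, minimum=40.0, maximum=80.0):
    """Return a label describing where a CPU reading falls
    relative to the defined performance thresholds."""
    if percent < minimum:
        return "below-minimum"   # possibly over-provisioned
    if percent > maximum:
        return "above-maximum"   # possibly under-provisioned
    return "in-range"
```

In practice you would encode this same range as alarm thresholds in your monitoring tooling rather than in application code.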
In the Aggregation phase, you determine which sources of monitoring data, taken together, provide a more complete view of your AWS environment. For example, an EC2 instance running a web server uses data stored on an EBS volume, so monitoring both the EC2 instance’s performance and the EBS volume’s performance provides the best overall view of the solution’s performance efficiency.
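A minimal sketch of aggregation, assuming hand-written sample data: the helper below joins an EC2 CPU series and an EBS read-operations series by timestamp so both metrics can be viewed together. The metric names echo CloudWatch's CPUUtilization and VolumeReadOps, but nothing here calls a real API:

```python
# Sketch: combine per-timestamp EC2 and EBS metrics into one view.
# All data is illustrative sample data, not fetched from CloudWatch.

def aggregate_metrics(ec2_points, ebs_points):
    """Join two {timestamp: value} series into
    {timestamp: {"cpu": ..., "read_ops": ...}} for timestamps
    present in both series."""
    shared = ec2_points.keys() & ebs_points.keys()
    return {
        ts: {"cpu": ec2_points[ts], "read_ops": ebs_points[ts]}
        for ts in sorted(shared)
    }

cpu = {"10:00": 55.0, "10:05": 78.0}
reads = {"10:00": 1200, "10:05": 4100, "10:10": 900}
combined = aggregate_metrics(cpu, reads)
```

Seeing high CPU alongside high EBS read activity in one combined record makes it easier to judge whether the instance or the volume is the bottleneck.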
As discussed in previous chapters, the monitoring tools provided by AWS allow you to process events in real time and generate alarms. During the Real-Time Processing and Alarming phase, you determine which events to process and which of them should produce alarms.
Again, producing alarms for many events is tempting, but creating too many alarms can have an adverse effect: people who receive a flood of unimportant alarms tend to start ignoring them, a problem often called alarm fatigue. Remember that alarms should be issued when immediate action is required, not just for informational purposes.
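To make this concrete, here is a sketch of the kind of parameters you could pass to CloudWatch's put_metric_alarm API (for example, via boto3) so an alarm fires only on a sustained breach rather than a momentary spike. The alarm name and the SNS topic ARN are placeholders, not real resources:

```python
# Sketch: alarm only when action is likely required -- CPU above
# 80 percent for three consecutive 5-minute periods, rather than
# on every brief spike. Names and ARN below are placeholders.

alarm_params = {
    "AlarmName": "example-high-cpu",          # placeholder name
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Statistic": "Average",
    "Period": 300,                            # seconds per datapoint
    "EvaluationPeriods": 3,                   # require a sustained breach
    "Threshold": 80.0,
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": ["arn:aws:sns:us-east-1:111111111111:example-topic"],
}

# A real deployment would create the alarm with something like:
# boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
```

Raising EvaluationPeriods is one simple way to cut down on unimportant alarms, since a single noisy datapoint no longer triggers a notification.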
Monitoring data must be stored somewhere, and that storage costs money in the cloud, so you want a retention policy that determines how long to keep monitoring data. During the Storage phase, you should also consider where the data will be stored and who should have access to it. In addition, you should develop policies for transporting the data securely.
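A retention policy can be sketched as a simple age check. The 90-day window below is an illustrative choice, not an AWS default (services such as CloudWatch apply their own retention schedules), and the record keys are made up:

```python
# Sketch: apply a retention policy to stored monitoring data.
# Keys, dates, and the 90-day window are all illustrative.

from datetime import date, timedelta

def expired_keys(records, today, retention_days=90):
    """Given {key: date_collected}, return the keys whose data is
    older than the retention window and can be deleted."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(k for k, collected in records.items() if collected < cutoff)

records = {
    "cpu-2023-01-01": date(2023, 1, 1),
    "cpu-2023-05-01": date(2023, 5, 1),
}
to_delete = expired_keys(records, today=date(2023, 6, 1))
```

In a real environment this decision is usually delegated to the storage service itself, for example via lifecycle rules, rather than implemented by hand.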
Monitoring data by itself is rarely useful. During the Analytics phase, you should develop procedures on how the data will be analyzed. Will you rely on a person reading data on a dashboard, or will you use an automated tool to provide reports and insights on the monitoring data?
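As a toy example of an automated analysis pass, the snippet below summarizes a batch of CPU samples and counts how many fall outside the 40–80 percent range from the earlier example. The sample values are invented, and a real deployment would more likely rely on dashboards or managed anomaly-detection features:

```python
# Sketch: a minimal automated analysis pass over collected metric
# data -- compute the mean and flag datapoints outside the
# 40-80 percent example range. Sample values are illustrative.

def summarize(samples, minimum=40.0, maximum=80.0):
    """Return the mean of the samples and how many fall outside
    the defined performance range."""
    mean = sum(samples) / len(samples)
    out_of_range = sum(1 for s in samples if s < minimum or s > maximum)
    return {"mean": round(mean, 1), "out_of_range": out_of_range}

report = summarize([55.0, 91.0, 62.0, 38.0])
```

Even a summary this simple answers the question posed above: an automated tool can surface the out-of-range count directly instead of relying on a person to spot it on a dashboard.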