Percentiles – Monitoring Services in AWS – SOA-C02 Study Guide

Percentiles

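Percentile statistics are available for the metrics of some AWS services. A percentile describes the relative standing of a value in a data set: the 99th percentile (p99), for example, is the value that 99 percent of the data points are equal to or below. Percentiles therefore let you see where a specific data point lies within the distribution of a metric. This information allows you to find outliers and use the data in long-term statistical analysis.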
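To make the idea concrete, here is a minimal Python sketch that computes percentiles with the nearest-rank method. This is an assumption chosen for illustration; the text does not say how CloudWatch estimates percentiles internally. The function name and the latency samples are hypothetical. The sample contains one outlier, which pulls the mean up to 36.8 ms while the median (p50) stays at 13 ms and only p99 exposes the outlier.

```python
import math


def percentile_nearest_rank(samples: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one data point")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Rank (1-based) of the smallest value that at least p percent of the
    # data points are equal to or below. Multiplying before dividing keeps
    # the arithmetic exact for integer inputs.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, including one outlier.
latencies_ms = [12, 15, 11, 13, 14, 12, 16, 13, 250, 12]

mean = sum(latencies_ms) / len(latencies_ms)
for p in (50, 90, 99):
    print(f"p{p} = {percentile_nearest_rank(latencies_ms, p)} ms")
print(f"mean = {mean} ms")
```

Other percentile definitions (for example, linear interpolation between ranks) can give slightly different values on small samples.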

CloudWatch Alarms

As we already mentioned, the CloudWatch service enables you to trigger an alarm when a certain condition holds for a certain number of consecutive CloudWatch checks. An example is the CPU usage of an EC2 instance: if the instance usage stays above a 90 percent threshold for three consecutive checks, the alarm triggers a notification to an administrator. Because the time between checks depends on the collection interval of the metric, three checks represent 10 minutes with standard metrics collected every 5 minutes (first check at 0 minutes, second at 5 minutes, and third at 10 minutes) and 2 minutes with detailed metrics collected every minute (first check at 0 minutes, second at 1 minute, and third at 2 minutes).
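The evaluation logic in the CPU example can be sketched in a few lines of Python. This is a simplified model, not the CloudWatch implementation: the `AlarmRule` class and the `evaluate` and `minutes_spanned` helpers are hypothetical, and the sketch requires every one of the last N checks to breach. Real CloudWatch alarms add options this model leaves out, such as alarming on M out of N datapoints ("Datapoints to Alarm") and configurable treatment of missing data. The state names OK, ALARM, and INSUFFICIENT_DATA match the CloudWatch alarm states.

```python
from dataclasses import dataclass


@dataclass
class AlarmRule:
    threshold: float        # e.g. 90.0 (percent CPU)
    checks_required: int    # consecutive breaching checks needed to alarm
    interval_minutes: int   # 5 for standard metrics, 1 for detailed metrics


def evaluate(rule: AlarmRule, datapoints: list[float]) -> str:
    """Return the alarm state based on the most recent datapoints."""
    if len(datapoints) < rule.checks_required:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-rule.checks_required:]
    # Every one of the last N checks must exceed the threshold.
    return "ALARM" if all(v > rule.threshold for v in recent) else "OK"


def minutes_spanned(rule: AlarmRule) -> int:
    """Minutes from the first check to the last check, counted as in the text."""
    return (rule.checks_required - 1) * rule.interval_minutes


standard = AlarmRule(threshold=90.0, checks_required=3, interval_minutes=5)
detailed = AlarmRule(threshold=90.0, checks_required=3, interval_minutes=1)

cpu = [45.0, 92.5, 95.1, 97.3]  # hypothetical CPU utilization samples (%)
print(evaluate(standard, cpu), minutes_spanned(standard), minutes_spanned(detailed))
```

Because only the interval differs between the two rules, the same three-check condition reaches a decision five times faster with detailed metrics.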

CloudWatch Logs

The beauty of CloudWatch is the ability to apply the same features mentioned in the preceding section to logs. When you collect logs in CloudWatch, you can easily view them in the AWS Management Console, perform analytics, and create alarms based on patterns you define with metric filters. By default, logs are stored in CloudWatch indefinitely; however, you can have older log events deleted automatically by setting a retention period anywhere from 1 day to 10 years. Logs can also be exported to S3 and then archived to Glacier for more cost-effective storage. We cover S3 and Glacier in more detail in Chapter 6, “Backup and Restore Strategies.”
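Alarming on a log pattern can be pictured as two steps: count the log events that match the pattern in each period, then compare each count with a threshold. The Python sketch below models that idea under stated assumptions and is not how CloudWatch implements metric filters: a plain substring test stands in for CloudWatch's filter pattern syntax, and the function name, sample events, and threshold are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def count_matches_per_period(
    events: list[tuple[datetime, str]], term: str, period_minutes: int
) -> Counter:
    """Count events whose message contains `term`, grouped by fixed periods.

    Assumes `period_minutes` divides 60 evenly (1, 5, 10, 15, ...).
    """
    counts: Counter = Counter()
    for ts, message in events:
        if term in message:
            # Floor the timestamp to the start of its period.
            start = ts.replace(
                minute=ts.minute - ts.minute % period_minutes,
                second=0,
                microsecond=0,
            )
            counts[start] += 1
    return counts


# Hypothetical log events.
events = [
    (datetime(2024, 1, 1, 10, 1), "INFO request served"),
    (datetime(2024, 1, 1, 10, 2), "ERROR database timeout"),
    (datetime(2024, 1, 1, 10, 3), "ERROR database timeout"),
    (datetime(2024, 1, 1, 10, 4), "ERROR database timeout"),
    (datetime(2024, 1, 1, 10, 7), "ERROR disk full"),
]

THRESHOLD = 2  # hypothetical: alarm when more than 2 errors occur in one period
per_period = count_matches_per_period(events, "ERROR", period_minutes=5)
for start, count in sorted(per_period.items()):
    state = "ALARM" if count > THRESHOLD else "OK"
    print(start.strftime("%H:%M"), count, state)
```

Turning matches into a per-period count is what lets log patterns reuse the same threshold-based alarm logic that applies to ordinary metrics.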

CloudWatch Logs Insights

When working with logs, CloudWatch treats every log entry as streaming data that is available for processing shortly after it is delivered to CloudWatch. CloudWatch Logs Insights provides a simple-to-use interface where you can run queries in a purpose-built query language (with commands such as fields, filter, stats, and sort) to search and filter log content, aggregate and transform fields, and visualize the results. The service helps you discover the causes of past issues and continuously validate the state of the platform after each change or application deployment.
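As an illustration of what a typical Logs Insights query does, the Python sketch below parses a log level out of each message and counts entries per level. The comment shows a Logs Insights query with a similar intent, written from memory of the documented `parse` and `stats` commands; treat it as an approximation and check the query syntax reference before relying on it. The log format, sample lines, and `count_by_level` function are hypothetical.

```python
import re
from collections import Counter

# Hypothetical log lines in a "[LEVEL] message" format.
log_lines = [
    "[INFO] user login",
    "[ERROR] payment failed",
    "[INFO] user logout",
    "[WARN] slow response",
    "[ERROR] payment failed",
]

# Similar in intent to a Logs Insights query such as:
#   parse @message "[*] *" as loggingType, loggingMessage
#   | stats count(*) by loggingType
PATTERN = re.compile(r"^\[(?P<level>[^\]]+)\] (?P<text>.*)$")


def count_by_level(lines: list[str]) -> Counter:
    """Count log lines per level; lines that do not match the format are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            counts[match["level"]] += 1
    return counts


for level, n in count_by_level(log_lines).most_common():
    print(level, n)
```

The value of Logs Insights is that this parse-and-aggregate step runs server-side across whole log groups, so you do not have to download the logs first.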

You can also use CloudWatch Logs Insights to search the operational information that some AWS services write to CloudWatch Logs. For example, you can find the number of email messages that were sent or received through the WorkMail service during a certain period of time, as shown in Figure 2.2.

FIGURE 2.2 CloudWatch Logs Insights