THE AWS CERTIFIED ADVANCED NETWORKING – SPECIALTY EXAM OBJECTIVES COVERED IN THIS CHAPTER MAY INCLUDE, BUT ARE NOT LIMITED TO, THE FOLLOWING:
Objective 1.4: Define logging and monitoring requirements across AWS and hybrid networks.
CloudWatch is the AWS monitoring service used to monitor and manage your AWS deployments. It provides insight through a wide range of metrics, including resource usage and errors, that you can use to make sure your services are optimized. If something goes wrong, CloudWatch can generate alarms that trigger remediation steps to restore services that are down or degraded. CloudWatch monitors not just the networking components but all of the AWS offerings, and it supports plug-in code to customize the monitoring of your applications. With agents installed, on-premises servers can export log and metric data to CloudWatch to allow for a complete monitoring solution.
The service includes integrated analytics tools for quickly analyzing the collected data. With the alerting feature, alarms can notify external systems so that your existing network management tools can take action.
In this section, you will learn about the components that are part of the CloudWatch suite of tools and how they are used and integrated to make sure you have complete visibility into your AWS operations. The service supports integration with third-party monitoring systems for log collection, monitoring, graphing, and any other systems management applications.
Metrics represent a time-ordered set of data points that are sent to and processed by CloudWatch. A metric is a value that varies over time, such as I/O throughput, CPU utilization, database writes, or any of the thousands of other data points available in AWS. AWS includes a large number of predefined metrics and allows you to define custom metrics.
Metrics and logs are collected by CloudWatch on a regional level and stored in a CloudWatch repository. Each repository displays data only for its own region, plus any data from external sources that is exported to that regional repository.
Metrics are displayed in the CloudWatch console, as shown in Figure 5.1. The AWS CLI, API calls, and the SDKs provided by AWS also allow access to your collected metrics. This flexibility allows other systems and tools to be integrated with CloudWatch. Metrics collect data from the systems, subsystems, or resources being used. Every metric records values such as state information or resource usage, including CPU utilization, health check status, disk read and write operations, network activity, and many others.
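To make the API access path concrete, the following is a minimal sketch of how a request for EC2 CPU utilization might be assembled. It only builds the request parameters and does not call AWS. The function name `build_cpu_query` and the instance ID are hypothetical; the parameter names follow the CloudWatch GetMetricStatistics API.

```python
from datetime import datetime, timedelta, timezone


def build_cpu_query(instance_id, hours=1, period_seconds=300, now=None):
    """Return keyword arguments shaped like a GetMetricStatistics request.

    build_cpu_query is a hypothetical helper for illustration; it makes
    no network calls and only assembles the request parameters.
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,  # seconds per returned data point
        "Statistics": ["Average", "Maximum"],
    }


if __name__ == "__main__":
    # The instance ID below is a made-up placeholder.
    params = build_cpu_query("i-0123456789abcdef0")
    print(params)
    # Assumption: with boto3 installed and credentials configured, the same
    # dict could be passed to the SDK as
    #   boto3.client("cloudwatch").get_metric_statistics(**params)
```

Keeping the request as a plain dictionary makes it easy to log or inspect before handing it to whichever tool performs the call.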
There are three categories of metrics. Standard metrics are the default type; they are collected in 5-minute windows and provided by AWS at no cost. Detailed monitoring records metrics in 1-minute windows and must be enabled by the operator. Custom metrics are configurable down to 1-second intervals. There is an additional cost for detailed metrics.
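As an illustration of the custom category, the sketch below builds a single custom metric data point in the shape used by the CloudWatch PutMetricData API, where a `StorageResolution` of 1 marks a high-resolution (1-second) metric and 60 marks standard resolution. The helper name `build_custom_datum`, the metric name, and the namespace mentioned in the comments are hypothetical, and nothing is sent to AWS.

```python
from datetime import datetime, timezone


def build_custom_datum(name, value, unit="Count",
                       high_resolution=False, timestamp=None):
    """Return one entry for the MetricData list of a PutMetricData request.

    build_custom_datum is a hypothetical helper; it only builds the dict.
    """
    return {
        "MetricName": name,
        "Timestamp": timestamp or datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
        # 1 = high resolution (down to 1-second granularity), 60 = standard
        "StorageResolution": 1 if high_resolution else 60,
    }


if __name__ == "__main__":
    # "ActiveVpnSessions" is an invented metric name for this example.
    datum = build_custom_datum("ActiveVpnSessions", 42, high_resolution=True)
    print(datum)
    # Assumption: with boto3 available, the datum could be published with
    #   boto3.client("cloudwatch").put_metric_data(
    #       Namespace="MyApp/Network",  # hypothetical namespace
    #       MetricData=[datum])
```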
Data retention times vary based on the metric resolution. For example, custom metrics recorded at intervals shorter than 1 minute are stored for 3 hours, 1-minute metrics are retained for 15 days, 5-minute metrics are stored for 63 days, and 1-hour metrics are stored for 15 months. It is most common for the systems consuming these metrics to process the data in near real time, so these retention values should be sufficient. Note that, depending on your cloud deployments and activity levels, CloudWatch can generate a massive amount of data in a short time, so storage costs should be monitored closely. Metrics are aggregated upward. This means that when, for example, the retention time for 1-minute detailed metrics expires, the data remains available only as 5-minute metrics, and later only as 1-hour metrics, with the retention time extended accordingly at each step.
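The retention tiers and the upward aggregation described above can be sketched in a few lines. This is a simplified model rather than CloudWatch's actual implementation: the function names are hypothetical, 15 months is expressed as 455 days, and the roll-up keeps only an average per bucket.

```python
from collections import defaultdict
from datetime import timedelta


def retention_for_period(period_seconds):
    """Return how long data points of a given period are retained."""
    if period_seconds < 60:
        return timedelta(hours=3)    # sub-minute custom metrics
    if period_seconds < 300:
        return timedelta(days=15)    # 1-minute metrics
    if period_seconds < 3600:
        return timedelta(days=63)    # 5-minute metrics
    return timedelta(days=455)       # 1-hour metrics, about 15 months


def roll_up(points, bucket_seconds=300):
    """Aggregate (epoch_seconds, value) pairs into coarser buckets.

    Each bucket is labeled with its start time and holds the average of
    the values that fall inside it (a simplification for illustration).
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, sum(vals) / len(vals))
            for start, vals in sorted(buckets.items())]


if __name__ == "__main__":
    one_minute = [(0, 10), (60, 20), (120, 30), (180, 40), (240, 50), (300, 60)]
    print(retention_for_period(60))   # retention for 1-minute data
    print(roll_up(one_minute))        # the same data at 5-minute resolution
```

In this model, six 1-minute samples collapse into two 5-minute points, which mirrors how older data loses granularity while remaining available for longer.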
The console gives you a detailed view of the metrics collected for the selected region. You can select different metric groups and click into a group to see more granular data. You can also select the time range for which to view the metric data, as shown in Figure 5.2.