Generation – Monitoring All Components of Your Workload – SAP-C02 Study Guide

This may sound obvious, but it is essential to monitor every component of your workload without exception, using either Amazon CloudWatch or, if you prefer, third-party solutions. From the frontend to the backend and the storage or database layer, make sure you collect the key metrics for your workload, extracting them from logs when necessary. Then, define the thresholds you want to monitor, typically those whose crossing should trigger an alarm.
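To make the idea of a threshold concrete, here is a minimal Python sketch of how a CloudWatch-style alarm might evaluate datapoints, assuming the “M out of N” rule: the alarm fires when at least M of the last N periods breach the threshold. The AlarmRule class, the alarm name, and the instance ID are hypothetical; only the keys of the returned dictionary mirror the real parameters of boto3’s put_metric_alarm call. It is a simplified local model, not CloudWatch’s actual evaluation logic (for example, the INSUFFICIENT_DATA state is not modeled).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

# Comparison operators named after CloudWatch's ComparisonOperator values.
_COMPARATORS = {
    "GreaterThanThreshold": lambda v, t: v > t,
    "GreaterThanOrEqualToThreshold": lambda v, t: v >= t,
    "LessThanThreshold": lambda v, t: v < t,
    "LessThanOrEqualToThreshold": lambda v, t: v <= t,
}


@dataclass
class AlarmRule:
    """Hypothetical local model of a CloudWatch threshold alarm."""
    threshold: float
    evaluation_periods: int   # N: number of recent periods considered
    datapoints_to_alarm: int  # M: breaching periods needed to alarm
    comparison: str = "GreaterThanThreshold"

    def state(self, datapoints: Sequence[Optional[float]]) -> str:
        """Return 'ALARM' or 'OK' for the latest N datapoints.

        Missing datapoints (None) count as not breaching, similar to
        TreatMissingData='notBreaching'; other modes are not modeled.
        """
        breaches = _COMPARATORS[self.comparison]
        window = datapoints[-self.evaluation_periods:]
        breaching = sum(1 for v in window
                        if v is not None and breaches(v, self.threshold))
        return "ALARM" if breaching >= self.datapoints_to_alarm else "OK"

    def to_put_metric_alarm_kwargs(self, name: str, namespace: str, metric: str,
                                   dimensions: List[Dict[str, str]],
                                   period_seconds: int = 60) -> Dict:
        """Map the rule onto keyword arguments for boto3's put_metric_alarm."""
        return {
            "AlarmName": name,
            "Namespace": namespace,
            "MetricName": metric,
            "Dimensions": dimensions,
            "Statistic": "Average",
            "Period": period_seconds,
            "EvaluationPeriods": self.evaluation_periods,
            "DatapointsToAlarm": self.datapoints_to_alarm,
            "Threshold": self.threshold,
            "ComparisonOperator": self.comparison,
            "TreatMissingData": "notBreaching",
        }


# Example: alarm when 3 of the last 5 one-minute CPU averages exceed 80%.
rule = AlarmRule(threshold=80.0, evaluation_periods=5, datapoints_to_alarm=3)
print(rule.state([50, 85, 90, 40, 95]))    # ALARM: 85, 90 and 95 breach
print(rule.state([85, 90, None, 40, 70]))  # OK: only 2 datapoints breach
```

With boto3 installed and credentials configured, the dictionary returned by to_put_metric_alarm_kwargs (for example with namespace "AWS/EC2", metric "CPUUtilization", and a placeholder instance ID such as i-0123456789abcdef0) could be passed to boto3.client("cloudwatch").put_metric_alarm(**kwargs). Requiring several breaching datapoints rather than one is a common way to avoid alarms on short spikes.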

AWS provides monitoring information and logs in abundance.

Many services, such as Amazon EC2, Amazon Elastic Container Service (ECS), and Amazon Relational Database Service (RDS), publish metrics for CPU or memory consumption, network I/O, and disk I/O (for EC2 instances, memory metrics require the CloudWatch agent). Many other AWS services publish service-specific metrics to CloudWatch. For instance, Amazon API Gateway and AI/ML services such as Amazon Rekognition publish metrics for successful and unsuccessful requests. For a complete list of the AWS services that publish metrics to CloudWatch, and of the metrics each of them publishes, check out the Amazon CloudWatch documentation at https://packt.link/sBreQ.
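As an illustration of reading these built-in metrics, the following sketch builds a query for CloudWatch’s GetMetricData API covering several of the services mentioned above. The namespaces and metric names (AWS/EC2 CPUUtilization, AWS/RDS CPUUtilization, AWS/ECS MemoryUtilization, AWS/ApiGateway 5XXError) are standard CloudWatch metrics, but the resource identifiers are hypothetical placeholders, and the fetch helper assumes boto3 is installed and AWS credentials are configured.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

# (namespace, metric name, dimensions, statistic). Resource identifiers are
# hypothetical placeholders; replace them with your own.
METRICS: List[Tuple[str, str, Dict[str, str], str]] = [
    ("AWS/EC2", "CPUUtilization", {"InstanceId": "i-0123456789abcdef0"}, "Average"),
    ("AWS/RDS", "CPUUtilization", {"DBInstanceIdentifier": "orders-db"}, "Average"),
    ("AWS/ECS", "MemoryUtilization", {"ClusterName": "web", "ServiceName": "frontend"}, "Average"),
    ("AWS/ApiGateway", "5XXError", {"ApiName": "orders-api"}, "Sum"),
]


def build_metric_queries(metrics, period: int = 300) -> List[Dict]:
    """Build the MetricDataQueries list expected by GetMetricData."""
    queries = []
    for i, (namespace, name, dims, stat) in enumerate(metrics):
        queries.append({
            "Id": f"m{i}",  # query IDs must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": name,
                    "Dimensions": [{"Name": k, "Value": v} for k, v in dims.items()],
                },
                "Period": period,
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries


def fetch(queries: List[Dict], hours: int = 3) -> Dict[str, list]:
    """Fetch datapoints for the last `hours` hours (needs boto3 and credentials).

    Pagination via NextToken is ignored for brevity.
    """
    import boto3  # imported lazily so build_metric_queries works without boto3

    end = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=queries,
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    return {r["Id"]: list(zip(r["Timestamps"], r["Values"]))
            for r in response["MetricDataResults"]}
```

Note the choice of statistic: utilization metrics are usually averaged, whereas an error count such as 5XXError is more meaningful as a Sum over the period.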

In addition to metrics, Amazon CloudWatch Logs collects logs streamed from AWS services and from your own applications. You can also leverage additional logs, such as the following:

  • VPC Flow Logs to analyze network traffic in and out of your VPCs
  • AWS CloudTrail to track activity in your accounts that involves AWS API calls, including actions taken through the AWS Management Console, AWS SDKs, and command-line tools
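To illustrate what analyzing VPC Flow Logs can look like, here is a small Python sketch that parses records in the default (version 2) flow log format and counts rejected connections per source address and destination port. The sample records and account and interface IDs are made up, custom log formats are not handled, and in practice you would more likely run such queries with CloudWatch Logs Insights or Amazon Athena.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]


def parse_record(line: str) -> Optional[Dict[str, str]]:
    """Parse one space-separated flow log record; return None if unusable."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None  # custom formats or malformed lines are skipped
    record = dict(zip(FIELDS, parts))
    if record["log_status"] != "OK":
        return None  # NODATA/SKIPDATA records use '-' placeholders
    return record


def top_rejected(lines: Iterable[str], n: int = 5) -> List[Tuple[Tuple[str, str], int]]:
    """Count REJECT records per (source address, destination port)."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_record(line)
        if record and record["action"] == "REJECT":
            counts[(record["srcaddr"], record["dstport"])] += 1
    return counts.most_common(n)


# Made-up sample records (documentation IP ranges, fake IDs).
SAMPLE = [
    "2 123456789012 eni-0a1b2c3d4e5f67890 203.0.113.12 10.0.1.5 49152 22 6 1 40 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0a1b2c3d4e5f67890 203.0.113.12 10.0.1.5 49153 22 6 1 40 1700000060 1700000120 REJECT OK",
    "2 123456789012 eni-0a1b2c3d4e5f67890 198.51.100.7 10.0.1.5 51000 443 6 10 5000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0a1b2c3d4e5f67890 - - - - - - - 1700000000 1700000060 - NODATA",
]
print(top_rejected(SAMPLE))  # [(('203.0.113.12', '22'), 2)]
```

Repeated rejections from one address on a port such as 22 are the kind of pattern you might then turn into a metric and an alarm.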

AWS provides a number of additional services that can come in handy as well:

  • Amazon EventBridge is a serverless event bus that delivers events in near real time. You can listen to events describing changes in AWS services, and you can also use EventBridge to publish and listen to your own custom events, from your workload and from third-party solutions.
  • AWS Personal Health Dashboard (now part of the AWS Health Dashboard) provides a personalized view of the health of the AWS services used by your workload(s). If a service event may affect your resources in an AWS Region or Availability Zone, you will find a description of the event and a link to the affected resources.
  • AWS Config is a configuration management service that provides an inventory of your AWS resources, their configuration history, and their configuration changes (with notifications of those changes). You can track changes, including those that put the reliability of your workload at risk. To make your life easier, a set of Config rules following the best practices of the AWS Well-Architected Framework reliability pillar is available out of the box as a Config conformance pack.
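EventBridge routes events by matching them against JSON event patterns. The sketch below is a simplified local matcher that only supports exact-value matching (content filters such as prefix, anything-but, or numeric matching are not modeled); the pattern targets AWS Health events, and the sample event is abbreviated and illustrative. To check a pattern against EventBridge’s own matcher, the TestEventPattern API can be used instead.

```python
import json
from typing import Any, Dict


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified EventBridge-style matching (exact values only).

    Every key in the pattern must be present in the event. Nested objects are
    matched recursively; a list in the pattern holds the allowed values.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # If the event field is itself an array, one shared value suffices.
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True


# Pattern for AWS Health "issue" events affecting EC2.
HEALTH_ISSUE_PATTERN = {
    "source": ["aws.health"],
    "detail-type": ["AWS Health Event"],
    "detail": {"service": ["EC2"], "eventTypeCategory": ["issue"]},
}

# Abbreviated, illustrative event (real events carry more fields).
sample_event = {
    "source": "aws.health",
    "detail-type": "AWS Health Event",
    "region": "eu-west-1",
    "detail": {"service": "EC2", "eventTypeCategory": "issue"},
}

print(matches(HEALTH_ISSUE_PATTERN, sample_event))  # True
print(json.dumps(HEALTH_ISSUE_PATTERN))  # JSON string form used by EventBridge rules
```

A rule using such a pattern could be created with boto3’s events client, for example put_rule(Name=..., EventPattern=json.dumps(HEALTH_ISSUE_PATTERN)), with a rule name of your choosing and a target such as an SNS topic for notifications.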

What you’ve learned so far allows you to monitor your workload from the inside, by processing metrics collected from its various components. If your workload exposes external endpoints, you also want to monitor them from the outside. First, this verifies that your external endpoint(s) can be reached. Second, it gives you another chance to detect faulty behavior if, for some reason, your internal monitoring failed to report it or, more likely, you failed to capture it. You can conduct this type of active monitoring with synthetic transactions, also referred to as canaries. The name comes from the canary birds that miners carried down into coal mines to detect dangerous gases early. The idea is essentially the same, fortunately without harming any actual birds. However, don’t overload your endpoints with canary tests: their purpose is merely to perform a health check, not to put your workload under stress. Amazon CloudWatch Synthetics enables you to create such canaries to monitor your endpoints. It deploys your canaries (scripts written in Node.js or Python) as Lambda functions in your account.
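The core of a canary is a lightweight, periodic check of an endpoint. The following Python sketch performs a single HTTP GET and reports whether the endpoint answered with the expected status within a latency budget. It uses only the standard library and is not the CloudWatch Synthetics runtime API; the function name, the expected 200 status, and the 2-second budget are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    ok: bool
    status: Optional[int]  # None when no HTTP response was received
    latency_ms: float


def check_endpoint(url: str, timeout: float = 5.0, expected_status: int = 200,
                   max_latency_ms: float = 2000.0) -> CheckResult:
    """Run one lightweight canary-style check against an HTTP(S) endpoint."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except OSError:
        # URLError, connection refusals, and timeouts all derive from OSError.
        elapsed = (time.monotonic() - start) * 1000
        return CheckResult(ok=False, status=None, latency_ms=elapsed)
    elapsed = (time.monotonic() - start) * 1000
    return CheckResult(ok=status == expected_status and elapsed <= max_latency_ms,
                       status=status, latency_ms=elapsed)
```

Each run sends exactly one request, so the load stays negligible; scheduling the runs (for example, every few minutes) and alarming on failed or slow results would be left to Synthetics and CloudWatch.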