This may sound obvious, but you should monitor every component of your workload, without exception, using either Amazon CloudWatch or a third-party solution if you prefer. From the frontend to the backend and the storage or database layer, make sure that you collect the key metrics for your workload, extracting them from the logs when necessary. Then, define the thresholds that you want to track, typically those for which you want to trigger an alarm when they’re crossed.
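To make the idea of a threshold concrete, here is a minimal sketch of how an alarm on a CPU metric could be described in Python. It only builds the keyword arguments for the CloudWatch PutMetricAlarm operation; the alarm name, instance ID, and the 80% threshold are hypothetical values chosen for illustration, and the helper names are our own, not part of any AWS library.

```python
from typing import Any, Dict, List, Optional


def build_alarm_request(
    alarm_name: str,
    namespace: str,
    metric_name: str,
    threshold: float,
    dimensions: Optional[List[Dict[str, str]]] = None,
    period_seconds: int = 300,
    evaluation_periods: int = 3,
    alarm_actions: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the CloudWatch PutMetricAlarm operation."""
    request: Dict[str, Any] = {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "Average",
        "Period": period_seconds,                  # length of one evaluation window
        "EvaluationPeriods": evaluation_periods,   # consecutive windows to evaluate
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
    }
    # Only include optional lists when the caller supplied them.
    if dimensions:
        request["Dimensions"] = dimensions
    if alarm_actions:
        request["AlarmActions"] = alarm_actions
    return request


def create_alarm(request: Dict[str, Any]) -> None:
    """Send the request to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder can be used without boto3

    boto3.client("cloudwatch").put_metric_alarm(**request)


# Hypothetical example: alarm when average CPU stays above 80% for 15 minutes.
request = build_alarm_request(
    alarm_name="frontend-high-cpu",
    namespace="AWS/EC2",
    metric_name="CPUUtilization",
    threshold=80.0,
    dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
)
```

Keeping the request builder separate from the call that sends it lets you review or unit test your thresholds without touching an AWS account. Calling `create_alarm(request)` would then create the alarm, assuming valid credentials and permissions.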
AWS provides monitoring information and logs in abundance.
Many services, such as Amazon EC2, Amazon Elastic Container Service (ECS), and Amazon Relational Database Service (RDS), publish metrics for CPU or memory consumption, network I/O, and disk I/O. Many other AWS services publish service-specific metrics to CloudWatch. For instance, Amazon API Gateway and AI/ML services such as Amazon Rekognition publish metrics for successful and unsuccessful requests. For a complete list of the AWS services that publish metrics to CloudWatch, and of the metrics published by each of them, please check out the Amazon CloudWatch documentation at https://packt.link/sBreQ.
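As an illustration of how such published metrics can be read back, the following sketch prepares a query for the `CPUUtilization` metric that Amazon EC2 publishes in the `AWS/EC2` namespace and passes it to the CloudWatch GetMetricStatistics operation through boto3. The instance ID and the function names are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def build_cpu_query(
    instance_id: str,
    hours: int = 1,
    period_seconds: int = 300,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the CloudWatch GetMetricStatistics operation."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,            # one datapoint per 5 minutes by default
        "Statistics": ["Average", "Maximum"],
    }


def fetch_datapoints(query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run the query (needs boto3 and AWS credentials); oldest datapoint first."""
    import boto3  # imported lazily so the builder can be used without boto3

    response = boto3.client("cloudwatch").get_metric_statistics(**query)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])


# Hypothetical instance ID, used for illustration only.
query = build_cpu_query("i-0123456789abcdef0", hours=2)
```

The same pattern should apply to other services by changing the namespace, metric name, and dimensions to the ones listed for that service in the CloudWatch documentation.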
In addition to metrics, Amazon CloudWatch Logs collects logs streamed from AWS services and your own applications. You can also leverage additional logs, such as the following:
AWS provides a number of additional services that can come in handy as well:
What you’ve learned so far allows you to monitor your workload from the inside by processing metrics collected from its various components. Now, if your workload offers external endpoints, you also want to monitor them from the outside. First, you want to verify that your external endpoint(s) can be reached. Second, this gives you another chance to detect faulty behavior if, for some reason, your monitoring failed to report it or, more likely, you failed to capture it. You can conduct this type of active monitoring with synthetic transactions, also referred to as canaries. The name comes from the birds that miners used to carry down coal mines to detect lethal gases early. The idea is essentially the same, fortunately without harming any actual bird. However, don’t overload your endpoints with canary tests. Their purpose is merely to perform a health check, not to put your workload under stress. Amazon CloudWatch Synthetics enables you to create such canaries to monitor your endpoints. It then deploys your canaries (scripts written in Node.js or Python) to Lambda functions in your account.
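To give a feel for what a canary checks, here is a simplified, standalone sketch of a health check written with the Python standard library only. It is not the CloudWatch Synthetics runtime API: the `CheckResult` class, the `check_endpoint` function, and the latency budget are assumptions made for illustration, and a real canary would follow the script structure expected by CloudWatch Synthetics.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    url: str
    healthy: bool
    status: Optional[int]
    latency_ms: float
    error: Optional[str] = None


def check_endpoint(
    url: str,
    timeout_s: float = 5.0,
    max_latency_ms: float = 2000.0,
) -> CheckResult:
    """Send one GET request and report reachability, status code, and latency."""
    started = time.monotonic()
    status: Optional[int] = None
    error: Optional[str] = None
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # The endpoint answered, but with an error status (4xx or 5xx).
        status = exc.code
        error = str(exc)
    except (urllib.error.URLError, OSError) as exc:
        # The endpoint could not be reached at all (DNS, refused, timeout).
        error = str(exc)
    latency_ms = (time.monotonic() - started) * 1000.0
    healthy = (
        status is not None
        and 200 <= status < 300
        and latency_ms <= max_latency_ms
    )
    return CheckResult(url, healthy, status, latency_ms, error)
```

A single call such as `check_endpoint("https://example.com/health")` (a placeholder URL) performs one lightweight request, which is in keeping with the advice above: run such checks on a modest schedule so that they verify health without loading the endpoint.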