With so much valuable data in CloudTrail and other logs, finding effective ways to query that data for specific entries is a top priority. Amazon Athena makes running ad hoc queries on large datasets much more straightforward. The discussion of storing logs in Amazon S3 earlier in the chapter noted one shortcoming: the limited ability to query those logs. Amazon Athena fills this gap.
Amazon Athena is a serverless, interactive query service that lets you quickly analyze data stored in Amazon S3, such as your CloudTrail logs, by writing queries in standard Structured Query Language (SQL). As a result, it is an efficient way to scan massive datasets.
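To make the idea of querying CloudTrail logs with SQL more concrete, the following is a minimal Python sketch rather than a definitive implementation. It assumes a CloudTrail table has already been defined in Athena; the table name `cloudtrail_logs`, the helper names `build_cloudtrail_query` and `run_query`, and the column names (taken from the commonly used CloudTrail table layout) are illustrative assumptions. The `athena` argument is expected to be a boto3 Athena client, for example one created with `boto3.client("athena")`.

```python
import time


def build_cloudtrail_query(table, event_name, limit=20):
    """Return SQL listing the most recent CloudTrail events of one type.

    The table and column names are assumptions about how the CloudTrail
    table was defined in Athena; adjust them to match your own table.
    """
    # The values are interpolated into the SQL text, so accept only
    # simple alphanumeric names (underscores allowed in the table name).
    if not table.replace("_", "").isalnum() or not event_name.isalnum():
        raise ValueError("table and event_name must be simple names")
    return (
        "SELECT eventtime, eventsource, sourceipaddress, useridentity.arn\n"
        f"FROM {table}\n"
        f"WHERE eventname = '{event_name}'\n"
        "ORDER BY eventtime DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Submit sql to Athena, wait for it to finish, and return the rows.

    Only the first page of results is fetched; the first row of a
    SELECT result holds the column headers.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a final state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    results = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue", "") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

A call such as `run_query(athena, build_cloudtrail_query("cloudtrail_logs", "ConsoleLogin"), "default", "s3://your-results-bucket/athena/")` would list recent console sign-in events. The database name and results bucket here are placeholders, not values from this chapter.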
The following are some key facts to understand about Amazon Athena:
Figure 9.16: Capturing logs from AWS WAF and searching with Amazon Athena
As shown in Figure 9.16, logging is first enabled on AWS WAF. Kinesis Data Firehose is configured to ingest the logs and deliver them to the desired S3 bucket. The AWS Glue Data Catalog then holds the table definition (schema) that maps the JSON log data into a structure Amazon Athena can query. You can then query the data in Amazon Athena with standard SQL to extract the details you need. Finally, Amazon QuickSight uses the Athena results as the data source for its visualizations.
Note
QuickSight is not required to search the logs; Athena alone can do that. QuickSight adds interactive visualization on top of your queries and your data.
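The query step of the pipeline in Figure 9.16 can be illustrated with a short Python sketch. It shows one plausible Athena query that counts blocked requests per client IP, alongside a small local function that performs the same aggregation on newline-delimited WAF log records. The table name `waf_logs`, the lowercase column names in the SQL, and the sample records are assumptions made for illustration. The `action` and `httpRequest.clientIp` fields follow the AWS WAF log format.

```python
import json
from collections import Counter

# Athena SQL counting blocked requests per client IP. "waf_logs" is a
# placeholder for whatever table you defined over the log bucket.
BLOCKED_BY_IP_SQL = """
SELECT httprequest.clientip AS client_ip, COUNT(*) AS blocked
FROM waf_logs
WHERE action = 'BLOCK'
GROUP BY httprequest.clientip
ORDER BY blocked DESC
LIMIT 10
"""


def blocked_by_ip(json_lines, top=10):
    """Local equivalent of the SQL above for JSON-lines WAF records."""
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if record.get("action") == "BLOCK":
            counts[record["httpRequest"]["clientIp"]] += 1
    return counts.most_common(top)


# Made-up sample records, trimmed to the two fields used here.
sample = [
    '{"action": "BLOCK", "httpRequest": {"clientIp": "203.0.113.10"}}',
    '{"action": "ALLOW", "httpRequest": {"clientIp": "192.0.2.1"}}',
    '{"action": "BLOCK", "httpRequest": {"clientIp": "198.51.100.7"}}',
    '{"action": "BLOCK", "httpRequest": {"clientIp": "203.0.113.10"}}',
]

print(blocked_by_ip(sample))
# prints [('203.0.113.10', 2), ('198.51.100.7', 1)]
```

The local function is only meant to show what the SQL computes. On real data volumes, Athena would run the aggregation directly against the logs in S3.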
The next section introduces a native AWS solution that provides fast searching and visual graphing capabilities: Amazon OpenSearch Service.
As logs and other pieces of data you are responsible for securing are generated in your environment, be sure to consider your security goals for the data itself and your organization as a whole. Ask yourself whether these goals include the following:
If the answer is yes to one or more of these items, then provisioning an Amazon OpenSearch Service cluster could help meet your needs.
Amazon OpenSearch Service is a managed search and analytics service built on OpenSearch, an open source engine derived from and largely compatible with Elasticsearch, the popular search engine from Elastic. In addition to storing information and searching through it quickly, OpenSearch offers powerful visualization capabilities through OpenSearch Dashboards, which let you and your team members see results graphically over time.
OpenSearch is designed to handle large amounts of data and provide rapid search results coupled with analytics capabilities. Some use cases for OpenSearch include log analytics, full-text search, and real-time application monitoring. Amazon OpenSearch Service also supports many different data types for ingestion and storage and integrates easily with other AWS services such as AWS Lambda and Amazon Kinesis.
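As a rough illustration of ingesting and searching log data, the sketch below builds the two request bodies involved: a newline-delimited body for the OpenSearch `_bulk` API and a query DSL document for the `_search` API. It only constructs the payloads and sends nothing. The index name and the `eventName` and `@timestamp` field names are hypothetical, and the helper names `build_bulk_body` and `build_search` are introduced here for illustration.

```python
import json


def build_bulk_body(index, documents):
    """Build an NDJSON body for the _bulk API.

    Each document is preceded by an action line naming the target index,
    and every line, including the last, ends with a newline.
    """
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "".join(line + "\n" for line in lines)


def build_search(event_name, since="now-1h", size=10):
    """Build a query DSL body: match one event name within a time window."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                # Full-text match on the (hypothetical) eventName field.
                "must": [{"match": {"eventName": event_name}}],
                # Restrict results to recent documents.
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        },
    }


# Hypothetical log documents, for illustration only.
docs = [
    {"@timestamp": "2024-01-01T00:00:00Z", "eventName": "ConsoleLogin"},
    {"@timestamp": "2024-01-01T00:05:00Z", "eventName": "CreateUser"},
]
bulk_body = build_bulk_body("security-logs", docs)
search_body = json.dumps(build_search("ConsoleLogin"))
```

In practice, `bulk_body` would be sent with a POST to the domain's `/_bulk` endpoint using the `application/x-ndjson` content type, and `search_body` to `/security-logs/_search`. The request must be authenticated in whatever way the domain's access policy requires.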
Many security features are built into Amazon OpenSearch Service, but you should first and foremost understand how it handles data protection.
The key points and benefits to remember (especially for the exam) about OpenSearch Service are the following: