Running Queries with Amazon Athena – Parsing Logs and Events with AWS Native Tools – SCS-C02 Study Guide

Running Queries with Amazon Athena

With so much valuable data in CloudTrail and other logs, finding an effective way to query that data for specific entries is a top priority. Amazon Athena makes running ad hoc queries on large datasets much more straightforward. When storing logs in Amazon S3 was discussed earlier in the chapter, one shortcoming noted was the limited ability to query those logs. Amazon Athena fills this gap.

Amazon Athena is a serverless, interactive query service that lets you quickly analyze data stored in Amazon S3, such as your CloudTrail logs, by writing queries in standard Structured Query Language (SQL). As a result, it is an efficient way to scan massive datasets.
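To make the query workflow concrete, the following is a minimal Python sketch of submitting a SQL statement to Athena and waiting for the result. It assumes the third-party boto3 SDK for real use; the database name, table name (cloudtrail_logs), and results bucket are hypothetical placeholders, and the column names assume a CloudTrail table created with the schema AWS documents for Athena. The client is passed in as a parameter so the logic can be exercised without an AWS account.

```python
import time

# Hypothetical names for illustration only; substitute your own values.
DATABASE = "security_logs"
OUTPUT_LOCATION = "s3://example-athena-results/"

# Assumes a table named cloudtrail_logs already exists over the CloudTrail bucket.
QUERY = """
SELECT eventtime, eventsource, eventname, sourceipaddress
FROM cloudtrail_logs
WHERE eventname = 'ConsoleLogin'
ORDER BY eventtime DESC
LIMIT 25
"""


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query, wait for it to finish, and return the first page of rows."""
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state is reached.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    results = client.get_query_results(QueryExecutionId=query_id)
    # Each row is a list of cells; for a SELECT, the first row holds the column names.
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]


# Usage against a real account (requires boto3 and configured credentials):
#   import boto3
#   rows = run_athena_query(boto3.client("athena"), QUERY, DATABASE, OUTPUT_LOCATION)
```

Only the first page of results is returned here; a fuller version would follow the pagination token that Athena returns for large result sets.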

The following are some key facts to understand about Amazon Athena:

  • Athena separates storage from compute by utilizing Amazon S3 for storage.
  • The Amazon Athena service is serverless, meaning there is no infrastructure or resources to manage.
  • You only pay for the data you scan.
  • It supports the following open-storage file formats:
    • Apache Web Logs
    • CSV and TSV files
    • JSON files
    • Parquet
    • ORC
  • It is a secure solution, allowing for IAM authentication and encryption at rest and in transit.
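Because billing is based on the data scanned, a rough cost estimate is simple arithmetic. The sketch below assumes a rate of $5.00 per TB scanned and a 10 MB minimum charge per query; both figures are assumptions for illustration, so check the current Athena pricing page for your Region. The function name and the example sizes are hypothetical.

```python
BYTES_PER_TB = 1024 ** 4
MIN_BYTES_PER_QUERY = 10 * 1024 ** 2  # assumed 10 MB minimum charge per query


def estimate_query_cost(bytes_scanned, price_per_tb=5.00):
    """Return the estimated cost in USD of one query scanning bytes_scanned bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must not be negative")
    # Small queries are billed as if they scanned the assumed minimum.
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * price_per_tb


# Illustrative comparison: the same logs stored as raw CSV versus a columnar
# format where the query only needs to read a fraction of the bytes.
csv_cost = estimate_query_cost(200 * 1024 ** 3)     # 200 GB scanned
parquet_cost = estimate_query_cost(20 * 1024 ** 3)  # 20 GB scanned
print(f"CSV: ${csv_cost:.2f}  Parquet: ${parquet_cost:.2f}")
```

The comparison illustrates why columnar formats such as Parquet and ORC are attractive: a query that reads fewer bytes costs proportionally less. The tenfold reduction used here is an invented figure, not a measurement.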

Figure 9.16: Capturing logs from AWS WAF and searching with Amazon Athena

As shown in Figure 9.16, logging is first enabled on AWS WAF. Kinesis Data Firehose is configured to ingest the logs and deliver them to the desired S3 bucket. The AWS Glue Data Catalog then holds the table definition that describes the JSON log data so that Amazon Athena can interpret it. You can then use Amazon Athena to query the data with standard SQL and extract the detail needed for visualizations. Finally, Amazon QuickSight uses Athena as the data source for those visualizations.
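As an illustration of the Athena step in this pipeline, the sketch below builds a SQL statement that lists the client IP addresses with the most blocked requests. The table name waf_logs is a hypothetical placeholder, and the action and httprequest.clientip fields assume a table defined along the lines of the AWS WAF log schema. The helper only builds the query text; it does not contact AWS.

```python
import re

# Accept only plain SQL identifiers so the table name cannot inject extra SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def top_blocked_ips_query(table="waf_logs", limit=10):
    """Build a query for the client IPs with the most blocked requests."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT httprequest.clientip AS client_ip, COUNT(*) AS blocked_requests\n"
        f"FROM {table}\n"
        "WHERE action = 'BLOCK'\n"
        "GROUP BY httprequest.clientip\n"
        "ORDER BY blocked_requests DESC\n"
        f"LIMIT {limit}"
    )


print(top_blocked_ips_query(limit=5))
```

The resulting text can be pasted into the Athena query editor or submitted through the API, and the same result set could serve as a dataset for a QuickSight visual.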

Note

QuickSight is not required to search the logs. It adds interactive visualization on top of your queries and data.

The next section introduces a native solution that provides extremely fast searching and visual graphing capabilities: Amazon OpenSearch Service.

Storing and Searching Logs in Amazon OpenSearch Service

As logs and other pieces of data you are responsible for securing are generated in your environment, be sure to consider your security goals for the data itself and your organization as a whole. Ask yourself whether these goals include the following:

  • Protecting confidential business data
  • Maintaining business access controls
  • Having the ability to audit user actions
  • Possessing the ability to integrate with SAML identity providers
  • Keeping your systems and data compliant with frameworks such as HIPAA, SOC, PCI, and others

If the answer is yes to one or more of these items, then provisioning an Amazon OpenSearch Service cluster could help meet your needs.

Amazon OpenSearch Service is a managed search and analytics service built on OpenSearch, an engine developed to be compatible with Elasticsearch, the popular search engine from Elastic. In addition to storing and quickly searching the information held in it, OpenSearch offers powerful visualization capabilities through OpenSearch Dashboards, which let you and your team members see results graphically over time.

OpenSearch is designed to handle large amounts of data and provide rapid search results coupled with analytics capabilities. Some use cases for OpenSearch include log analytics, full-text search, and real-time application monitoring. A positive feature of Amazon OpenSearch Service is that it supports many different data types for ingestion and storage and easily integrates with other AWS services such as AWS Lambda and Kinesis.
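To give a feel for the log analytics use case, here is a minimal sketch of a search request body written in the OpenSearch query DSL. It combines a full-text match with a time-range filter and an hourly histogram of the kind a dashboard might chart. The field names (message and @timestamp) and the function name are assumptions about how the logs were indexed, not anything the service requires.

```python
import json


def build_log_search(text, start, end, size=20):
    """Build a query DSL body: full-text match within a time window, newest first."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match on the (assumed) message field.
                "must": [{"match": {"message": text}}],
                # Non-scoring filter restricting results to the time window.
                "filter": [{"range": {"@timestamp": {"gte": start, "lte": end}}}],
            }
        },
        # Hourly bucket counts, suitable for an events-over-time chart.
        "aggs": {
            "events_over_time": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": "1h"}
            }
        },
        "sort": [{"@timestamp": {"order": "desc"}}],
    }


body = build_log_search("AccessDenied", "now-24h", "now")
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent to the _search endpoint of an index. Authenticating and signing the request against an Amazon OpenSearch Service domain is outside the scope of this sketch.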

Many security features are built into Amazon OpenSearch Service, but you should first and foremost understand how it handles data protection.

The key points and benefits to remember (especially for the exam) about OpenSearch Service are the following:

  • It’s fully managed: You can have it up and running in minutes without worrying about patching, backups, or keeping up with updates and versions.
  • You and your team have the ability to access all data: Once the data has been placed into your OpenSearch cluster, it can be searched and analyzed across datasets. You’re not limited to just the data in a particular bucket or log group.
  • It’s secure: After deploying it in your VPC, you have a variety of ways to give users secure access, including IAM and SAML. You can even restrict access using security groups.
  • It can scale as your data grows: It takes only a few clicks or a few API calls to resize your cluster, giving you more space and/or speed.
  • It integrates seamlessly: It ingests logs and data from AWS sources and provides auditing capabilities.