Speech to text with Amazon Transcribe – AWS Application Services for AI/ML – MLS-C01 Study Guide

Speech to text with Amazon Transcribe

In the previous section, you learned about text to speech. In this section, you will learn about speech to text and the service that provides it: Amazon Transcribe. It is an automatic speech recognition (ASR) service that uses pre-trained deep learning models, so you do not have to train a model on petabytes of data yourself; Amazon has already done this. You simply call the available APIs to transcribe audio or video files. The service supports a number of languages as well as custom vocabularies. Accuracy is key, and a custom vocabulary lets you improve it for the terminology of your domain or industry:

Figure 8.10 – Block diagram of Amazon Transcribe’s input and output

Some common uses of Amazon Transcribe include the following:

  • Real-time audio streaming and transcription
  • Transcribing pre-recorded audio files
  • Enabling text search over media files by combining Amazon Elasticsearch Service (now Amazon OpenSearch Service) with Amazon Transcribe
  • Performing sentiment analysis on recorded audio files for voice helpdesks (contact center analytics)
  • Channel identification, where each audio channel is transcribed separately

Next, you will explore the benefits of Amazon Transcribe.

Exploring the benefits of Amazon Transcribe

Let’s look at some of the benefits of using Amazon Transcribe:

  • Content redaction: You can protect customer privacy by instructing Amazon Transcribe to identify and redact personally identifiable information (PII) from transcripts. For example, financial organizations can use this to redact a caller’s details. Separately, you can filter unwanted words from a transcript by supplying a vocabulary filter through the VocabularyFilterName and VocabularyFilterMethod parameters of the StartTranscriptionJob operation.
  • Language identification: Amazon Transcribe can automatically identify the dominant language in an audio file and generate the transcription. If you have several audio files, this helps you classify them by language.
  • Streaming transcription: You can send recorded audio files or live audio streams to Amazon Transcribe and receive a stream of text in real time.
  • Custom vocabulary or customized transcription: You can supply a custom vocabulary list of domain-specific terms to generate more accurate transcriptions.
  • Timestamp generation: If you want to build or add subtitles to your videos, Amazon Transcribe can return the timestamp for each word or phrase in the audio.
  • Cost effectiveness: Because it is a managed service, you do not provision or maintain any infrastructure of your own; you are charged for usage only.
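Several of these benefits map onto request parameters of the StartTranscriptionJob operation. The following is a minimal sketch, not a definitive implementation, of how such a request could be assembled for boto3's start_transcription_job. The helper name build_job_request and its arguments are hypothetical, and the sketch deliberately refuses to combine language identification with the other options, because that combination needs additional settings that are not covered here.

```python
def build_job_request(job_name, media_uri, *, language_code="en-US",
                      identify_language=False, vocabulary_name=None,
                      vocabulary_filter_name=None,
                      vocabulary_filter_method="mask", redact_pii=False):
    """Assemble keyword arguments for start_transcription_job (hypothetical helper)."""
    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
    }

    if identify_language:
        # Limitation of this sketch: keep language identification on its own.
        if vocabulary_name or vocabulary_filter_name or redact_pii:
            raise ValueError("this sketch does not combine identify_language "
                             "with vocabulary or redaction options")
        request["IdentifyLanguage"] = True
        return request

    request["LanguageCode"] = language_code

    settings = {}
    if vocabulary_name:
        # Custom vocabulary created beforehand in Amazon Transcribe.
        settings["VocabularyName"] = vocabulary_name
    if vocabulary_filter_name:
        # The filter method is one of "remove", "mask", or "tag".
        settings["VocabularyFilterName"] = vocabulary_filter_name
        settings["VocabularyFilterMethod"] = vocabulary_filter_method
    if settings:
        request["Settings"] = settings

    if redact_pii:
        # Ask the service to return only the redacted transcript.
        request["ContentRedaction"] = {
            "RedactionType": "PII",
            "RedactionOutput": "redacted",
        }
    return request
```

Assuming a boto3 Transcribe client named transcribe, the result could be passed on as transcribe.start_transcription_job(**build_job_request(...)). Any vocabulary or filter named in the request must already exist in your account.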

Now, let’s get hands-on with Amazon Transcribe.

Getting hands-on with Amazon Transcribe

In this section, you will build a pipeline that integrates AWS Lambda with Amazon Transcribe to read an audio file stored in a folder in an S3 bucket, and then store the output JSON file in another S3 bucket. You will also monitor the task’s progress in CloudWatch Logs. You will use the asynchronous start_transcription_job function to start the job, and you will then poll the job through get_transcription_job until its status becomes COMPLETED. Let’s get started:

  1. First, create an IAM role called transcribe-demo-role for the Lambda function to use as its execution role. Ensure that it can read from and write to S3, use Amazon Transcribe, and write output to CloudWatch Logs. Add the following policies to the IAM role:
    • AmazonS3FullAccess
    • CloudWatchFullAccess
    • AmazonTranscribeFullAccess
  2. Now, create a Lambda function called transcribe-lambda with the existing IAM role, transcribe-demo-role, and save it.

Please make sure you change the default timeout (3 seconds) to a higher value in the Basic settings section of your Lambda function. I have set it to 10 min and 20 sec to avoid timeout errors, because the function starts the task with the asynchronous start_transcription_job API call and then keeps monitoring it with the get_transcription_job API until the job finishes.
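The code to deploy is the repository file referenced in the next step. Purely as an illustration of the start-then-poll pattern just described, here is a minimal sketch. The function name start_and_wait, its arguments, and the polling defaults are assumptions and are not taken from the repository; the client is passed in so that the logic can be exercised without AWS access.

```python
import time


def start_and_wait(transcribe, job_name, media_uri, output_bucket,
                   language_code="en-US", poll_seconds=5, max_polls=100):
    """Start a transcription job and poll until it completes or fails.

    `transcribe` is expected to be a boto3 Transcribe client,
    for example boto3.client("transcribe").
    """
    # Asynchronous call: it returns as soon as the job has been queued.
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        LanguageCode=language_code,
        Media={"MediaFileUri": media_uri},
        OutputBucketName=output_bucket,
    )

    for _ in range(max_polls):
        job = transcribe.get_transcription_job(
            TranscriptionJobName=job_name
        )["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        # Inside Lambda, printed lines end up in CloudWatch Logs.
        print(f"{job_name}: {status}")

        if status == "COMPLETED":
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription failed"))
        time.sleep(poll_seconds)

    raise TimeoutError(f"{job_name} did not finish after {max_polls} polls")
```

With these defaults, the loop sleeps for at most about 500 seconds in total, which stays below the 10 min and 20 sec timeout chosen above; if you change one, adjust the other accordingly. Job names must be unique within an account, so a real handler would typically derive the name from the uploaded object or a timestamp.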

  • Paste the code available at https://github.com/PacktPublishing/AWS-Certified-Machine-Learning-Specialty-MLS-C01-Certification-Guide-Second-Edition/blob/main/Chapter08/Amazon%20Transcribe%20Demo/lambda_function/lambda_function.py into the function’s code editor and click on Deploy.

The Basic settings section of the function should now look as follows:

Figure 8.11 – The Basic settings section of the created Lambda function

  • Next, create an S3 bucket called transcribe-demo-101 and a folder called input. On the bucket’s Properties tab, go to the Event notifications section and choose Create event notification. Enter the following details:
    • Name: audio-event
    • Events: All object create events
    • Prefix: input/
    • Destination: Lambda Function
    • Lambda: transcribe-lambda
  • Upload an audio file in .mp4 format to the input folder. This will trigger the Lambda function. As per the code, the output will be stored in the S3 bucket in JSON format, which you can then use to read the contents of the file.
  • Navigate to CloudWatch > CloudWatch Logs > Log groups > /aws/lambda/transcribe-lambda. Choose the latest log stream from the list. It will look as follows:

Figure 8.12 – The logs in a Log Stream for the specified log groups in the CloudWatch console

  • The output is saved to the S3 bucket in JSON format, in a file named after the job name specified in your code (you can use the S3 GetObject API to download and read it):

Figure 8.13 – The output JSON file in an S3 bucket
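To read the result programmatically, the output JSON can be downloaded with an S3 client and parsed. The sketch below assumes the standard layout of a Transcribe output file, in which results.transcripts holds the full text and results.items holds one entry per word or punctuation mark. The helper names load_transcript and summarize_transcript are hypothetical.

```python
import json


def load_transcript(s3, bucket, key):
    """Download and parse the output JSON.

    `s3` is expected to be a boto3 S3 client, for example boto3.client("s3").
    """
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)


def summarize_transcript(document):
    """Return the full text and a list of (word, start, end) tuples."""
    results = document["results"]
    text = results["transcripts"][0]["transcript"]
    words = [
        (
            item["alternatives"][0]["content"],
            float(item["start_time"]),  # times are stored as strings, in seconds
            float(item["end_time"]),
        )
        for item in results["items"]
        if item["type"] == "pronunciation"  # punctuation items have no timestamps
    ]
    return text, words
```

The per-word start and end times returned here are the timestamps you would use when generating subtitles. The key to pass to load_transcript depends on how the job was started; compare it with the object name shown in the bucket.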