Getting hands-on with Amazon Polly – AWS Application Services for AI/ML – MLS-C01 Study Guide

Getting hands-on with Amazon Polly

In this section, you will build a pipeline that integrates AWS Lambda with Amazon Polly. The pipeline reads a text file from an S3 bucket, generates an MP3 file from it, and saves that file to another folder in the same bucket. You will monitor the task’s progress in CloudWatch Logs.

You will begin by creating an IAM role for Lambda. Let’s get started:

  1. Navigate to the IAM console page.
  2. Select Roles from the left-hand menu.
  3. Select Create role.
  4. Select Lambda as the trusted entity.
  5. Add the following managed policies:
    • AmazonS3FullAccess
    • AmazonPollyFullAccess
    • CloudWatchFullAccess
  6. Save the role as polly-lambda-role.
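The console steps above can also be scripted. The following is a minimal sketch, not the book's own code: it assumes the three managed policies live under the standard arn:aws:iam::aws:policy/ prefix and that iam is a boto3 IAM client (for example, boto3.client("iam")). The function name create_polly_lambda_role is hypothetical.

```python
import json

# Managed policies named in the steps above (standard AWS-managed ARN prefix assumed).
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/AmazonPollyFullAccess",
    "arn:aws:iam::aws:policy/CloudWatchFullAccess",
]


def lambda_trust_policy():
    """Trust policy that lets the Lambda service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_polly_lambda_role(iam, role_name="polly-lambda-role"):
    """Create the role and attach the managed policies.

    `iam` is expected to be a boto3 IAM client, e.g. boto3.client("iam").
    """
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(lambda_trust_policy()),
    )
    for arn in MANAGED_POLICY_ARNS:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return response["Role"]["Arn"]
```

Full-access policies keep the walkthrough short; for anything beyond a demo, you would normally scope the permissions down to the one bucket and the Polly actions the function needs.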

Next, you will create a Lambda function:

  • Navigate to Lambda > Functions > Create Function.
    • Name the function polly-lambda.
    • Set the runtime to Python 3.6. (Lambda has since retired this runtime; if it is not offered, choose a currently supported Python 3 runtime.)
    • Use an existing role; that is, polly-lambda-role.
  • Paste the code at https://github.com/PacktPublishing/AWS-Certified-Machine-Learning-Specialty-MLS-C01-Certification-Guide-Second-Edition/tree/main/Chapter08/Amazon%20Rekognition%20Demo/lambda_code into your Lambda function. This code uses the start_speech_synthesis_task API from Amazon Polly, which starts an asynchronous synthesis task; you will check its progress in the CloudWatch console later in this section.
  • Scroll down and in the Basic Settings section, change Timeout to 59 sec, as shown in Figure 8.6, and click Save:

Important note

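The default timeout is 3 seconds. Because S3 invokes the function asynchronously, Lambda retries an invocation that fails or times out, and every retried attempt starts another synthesis task and creates another output file.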
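If you prefer to script this setting, the following is a minimal sketch that assumes lambda_client is a boto3 Lambda client (boto3.client("lambda")). The second call is optional and not part of the original walkthrough: it turns off automatic retries for asynchronous invocations, which is one way to avoid the duplicate files described in this note. The function name configure_function is hypothetical.

```python
def configure_function(lambda_client, function_name="polly-lambda", timeout_seconds=59):
    """Raise the function timeout and disable retries for async invocations.

    `lambda_client` is expected to be boto3.client("lambda").
    """
    # Same change as editing Timeout in the Basic Settings section.
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=timeout_seconds,
    )
    # Optional (an addition to the walkthrough): with zero retry attempts,
    # a failed asynchronous invocation cannot start a second synthesis task.
    lambda_client.put_function_event_invoke_config(
        FunctionName=function_name,
        MaximumRetryAttempts=0,
    )
```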

Figure 8.6 – Edit basic settings window
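To make the function's behavior concrete, here is a minimal sketch of a handler that matches the description above. It is not the code from the book's repository. The output prefix output-audio/ and the voice Aditi are taken from the sample log shown later in this section; the helper name synthesize_object and the return value are assumptions made for illustration.

```python
from urllib.parse import unquote_plus

OUTPUT_PREFIX = "output-audio/"  # assumed from the OutputUri in the sample log
VOICE_ID = "Aditi"               # voice reported in the sample log


def synthesize_object(s3, polly, bucket, key):
    """Read a text object from S3 and start an asynchronous Polly task."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    print("File Content: ", body)
    response = polly.start_speech_synthesis_task(
        OutputFormat="mp3",
        OutputS3BucketName=bucket,
        OutputS3KeyPrefix=OUTPUT_PREFIX,
        Text=body,
        TextType="text",
        VoiceId=VOICE_ID,
    )
    # The call returns immediately; the MP3 appears in S3 once the task completes.
    print(response)
    return response["SynthesisTask"]["TaskId"]


def lambda_handler(event, context):
    # boto3 ships with the Lambda Python runtime. It is imported here so that
    # synthesize_object can be exercised locally with stand-in clients.
    import boto3

    s3 = boto3.client("s3")
    polly = boto3.client("polly")
    task_ids = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        task_ids.append(synthesize_object(s3, polly, bucket, key))
    return {"task_ids": task_ids}
```

Writing the audio under a different prefix from the one that triggers the function matters: if the output landed in the input folder with a matching suffix, each result could trigger the function again.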

Now, you will create a bucket to trigger an event.

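  1. Navigate to the Amazon S3 console and create a bucket called polly-test-baba. Bucket names are globally unique, so if this name is taken, choose your own and use it throughout.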
  2. Create a folder called input-text (in this example, you will only upload .txt files).
  3. Navigate to Properties > Events > Add notification. Fill in the required fields, as shown here, and click on Save:
    • Name: polly_event
    • Events: All object create events
    • Prefix: input-text/
    • Suffix: .txt
    • Send to: Lambda Function
    • Lambda: polly-lambda
  4. Next, you will upload a file to trigger an event and check its progress in CloudWatch. Upload, in this case, a file called test_file.txt to input-text, as shown in Figure 8.7. You can download the sample file from this book’s GitHub repository at https://github.com/PacktPublishing/AWS-Certified-Machine-Learning-Specialty-MLS-C01-Certification-Guide-Second-Edition/tree/main/Chapter08/Amazon%20Polly%20Demo/text_file:

Figure 8.7 – The S3 bucket after uploading a text file for further processing
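The notification and upload from the preceding steps can be expressed in code as well. This is a sketch under stated assumptions: s3 and lambda_client are boto3 clients, function_arn is the ARN of polly-lambda, and the statement ID is a made-up value. Unlike the console, which grants S3 permission to invoke the function for you, a script has to add that permission explicitly before saving the notification.

```python
def build_notification_config(function_arn):
    """Notification settings matching the console fields in the steps above."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": "polly_event",
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*"],  # "All object create events"
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": "input-text/"},
                            {"Name": "suffix", "Value": ".txt"},
                        ]
                    }
                },
            }
        ]
    }


def configure_trigger(s3, lambda_client, bucket, function_arn):
    """Allow S3 to invoke the function, then attach the notification."""
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="s3-invoke-polly-lambda",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="s3.amazonaws.com",
        SourceArn="arn:aws:s3:::" + bucket,
    )
    s3.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_notification_config(function_arn),
    )


def upload_text_file(s3, bucket, local_path="test_file.txt"):
    """Upload the sample file under the prefix that triggers the function."""
    s3.upload_file(local_path, bucket, "input-text/test_file.txt")
```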

  5. This will trigger the Lambda function. You can monitor your logs by going to CloudWatch > CloudWatch Logs > Log groups > /aws/lambda/polly-lambda.
  6. Click on the latest stream; the log will look as follows:

File Content:  Hello Everyone, Welcome to Dublin. How are you doing today?
{'ResponseMetadata': {'RequestId': '74ca4afd-5844-47d8-9664-3660a26965e4',
'HTTPStatusCode': 200,
'HTTPHeaders': {'x-amzn-requestid': '74ca4afd-5844-47d8-9664-3660a26965e4',
'content-type': 'application/json', 'content-length': '471',
'date': 'Thu, 24 Sep 2020 18:50:57 GMT'}, 'RetryAttempts': 0},
'SynthesisTask': {'Engine': 'standard',
'TaskId': '57548c6b-d21a-4885-962f-450952569dc7',
'TaskStatus': 'scheduled',
'OutputUri': 'https://s3.us-east-1.amazonaws.com/polly-test-baba/output-audio/.57548c6b-d21a-4885-962f-450952569dc7.mp3',
'CreationTime': datetime.datetime(2020, 9, 24, 18, 50, 57, 769000, tzinfo=tzlocal()),
'RequestCharacters': 59, 'OutputFormat': 'mp3', 'TextType': 'text',
'VoiceId': 'Aditi', 'LanguageCode': 'en-GB'}}

A sample of the logs is shown in Figure 8.8:

Figure 8.8 – The logs in the CloudWatch console
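The same log lines can be fetched without the console. The sketch below assumes logs is a boto3 CloudWatch Logs client (boto3.client("logs")); the function name latest_log_messages is hypothetical. It picks the stream with the most recent event and returns its messages.

```python
def latest_log_messages(logs, log_group="/aws/lambda/polly-lambda"):
    """Return the messages from the most recently active log stream."""
    streams = logs.describe_log_streams(
        logGroupName=log_group,
        orderBy="LastEventTime",
        descending=True,
        limit=1,
    )["logStreams"]
    if not streams:
        # The function has not been invoked yet, so nothing has been logged.
        return []
    events = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=streams[0]["logStreamName"],
        startFromHead=True,
    )["events"]
    return [event["message"] for event in events]
```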

  7. The task creates its output in MP3 format, as shown in Figure 8.9. Download and listen to it:

Figure 8.9 – The output file that was created in the S3 bucket
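Because the synthesis task is asynchronous, the MP3 may not exist the moment the function returns (the sample log reports a TaskStatus of scheduled). The following sketch, which assumes boto3 Polly and S3 clients and a path-style OutputUri like the one in the log, checks the task and downloads the file once it is complete. Both function names are hypothetical.

```python
from urllib.parse import urlparse


def parse_output_uri(output_uri):
    """Split a path-style S3 URL (host/bucket/key) into (bucket, key)."""
    path = urlparse(output_uri).path.lstrip("/")
    bucket, key = path.split("/", 1)
    return bucket, key


def download_when_complete(polly, s3, task_id, local_path="output.mp3"):
    """Download the MP3 if the task has completed; return the task status."""
    task = polly.get_speech_synthesis_task(TaskId=task_id)["SynthesisTask"]
    if task["TaskStatus"] != "completed":
        # Still scheduled or in progress, or failed: nothing to download yet.
        return task["TaskStatus"]
    bucket, key = parse_output_uri(task["OutputUri"])
    s3.download_file(bucket, key, local_path)
    return "completed"
```

For the OutputUri in the sample log, parse_output_uri yields the bucket polly-test-baba and the key output-audio/.57548c6b-d21a-4885-962f-450952569dc7.mp3.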

Important note

The most scalable and cost-effective way for your mobile apps or web apps to submit files is to generate an S3 pre-signed URL and provide it to your users, so that they upload directly to the bucket. The resulting S3 Put events asynchronously invoke downstream AI workflows, which generate results and send a response to the end users. Many users can be served at the same time through this approach, and it may increase performance and throughput.
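As an illustration of this note, a backend could hand out upload URLs along the following lines. This is a sketch, assuming s3 is a boto3 S3 client and that uploads should land under the input-text/ prefix used in this section; the function name and the five-minute expiry are arbitrary choices.

```python
def presigned_upload_url(s3, bucket, filename, expires_in=300):
    """Return a time-limited URL that lets a client PUT one object.

    `s3` is expected to be boto3.client("s3").
    """
    key = "input-text/" + filename
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime of the URL in seconds
    )
```

The client then sends the file with an HTTP PUT to the returned URL, and the resulting object-created event triggers the Lambda function exactly as a console upload does.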

In this section, you learned how to implement text to speech. In the next section, you will learn about Amazon Transcribe, a speech-to-text AI service.