

Detect manufacturing defects in real time using Amazon Lookout for Vision




In this post, we look at how we can automate the detection of anomalies in a manufactured product using Amazon Lookout for Vision. Using Amazon Lookout for Vision, you can notify operators in real time when defects are detected, provide dashboards for monitoring the workload, and get visual insights from the process for business users.

Amazon Lookout for Vision is a machine learning (ML) service that spots defects and anomalies in visual representations using computer vision (CV). With Amazon Lookout for Vision, manufacturing companies can increase quality and reduce operational costs by quickly identifying differences in images of objects at scale.

Defect and anomaly detection during manufacturing processes is a vital step to ensure the quality of the products. The timely detection of faults or defects and taking appropriate actions is important to reduce operational and quality-related costs. According to Aberdeen’s research, “Many organizations will have true quality-related costs as high as 15 to 20 percent of sales revenue, in extreme cases some going as high as 40 percent.”

Manual inspection, either in-line or end-of-line, is a time-consuming and expensive task. First, you need trained human experts to perform visual inspections. Second, the feedback loop is slower and can cause bottlenecks in production and delay time to market. Finally, the process is subjective, and it is difficult and costly to scale effectively.

Therefore, a robust, effective, and scalable defect detection mechanism is necessary to provide objective decisions on visual inspection with a quick feedback loop and at low cost to maximize the quality of manufactured goods.

Overview of solution

The solution is composed of seven different building blocks (as shown in the following diagram), which we dive deep into in the following sections.

Image ingestion and storage

The following section of the architecture illustrates the components for image ingestion and storage.

The images from a manufacturing facility camera can be ingested either directly by a camera that supports compute, or via a client application that collects images from the cameras, optionally preprocesses them so that they match the image properties used to train the model, and uploads them to Amazon Simple Storage Service (Amazon S3).

We achieve this by calling an Amazon API Gateway endpoint that returns a presigned URL for Amazon S3. The client invokes an API Gateway REST API endpoint with request parameters, which include metadata such as assembly line ID, camera ID, and image ID, along with an authorization token. API Gateway uses a custom AWS Lambda authorizer function that uses the authorization token to determine the caller identity and grant authorization. After authorization, the request is processed by a Lambda function that generates a presigned URL for uploading the image to Amazon S3.
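To make the authorization step concrete, the following is a minimal sketch of a TOKEN-type Lambda authorizer in Python. It assumes the demo tokens allow and deny that the test script later in this post passes as AUTH_TOKEN; the principal ID and handler shape are illustrative, and a production authorizer would validate a real credential (for example, a signed JWT) instead of comparing strings.

```python
# Hypothetical sketch of a TOKEN-type API Gateway Lambda authorizer.
# Assumption: the demo tokens "allow" and "deny" stand in for real credentials.


def build_policy(principal_id: str, effect: str, resource: str) -> dict:
    """Return the IAM policy document shape that API Gateway expects from an authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": resource,
                }
            ],
        },
    }


def lambda_handler(event, context):
    token = (event.get("authorizationToken") or "").strip().lower()
    method_arn = event["methodArn"]
    if token == "allow":
        return build_policy("demo-user", "Allow", method_arn)
    if token == "deny":
        return build_policy("demo-user", "Deny", method_arn)
    # Raising an exception with the message "Unauthorized" makes API Gateway return HTTP 401.
    raise Exception("Unauthorized")
```

An explicit Deny policy makes API Gateway answer with 403, while the "Unauthorized" exception yields 401, so clients can tell a rejected token from a missing or unknown one.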

With the presigned URL, a client gets time-bound access to upload a specific object to your S3 bucket without needing AWS security credentials or permissions.
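A sketch of the Lambda function that returns the presigned URL follows. It assumes the metadata arrives as query string parameters named lineId, cameraId, and imageId, that the bucket name is read from a SOURCE_BUCKET environment variable, and that the metadata is encoded in the object key; all of these names are hypothetical rather than taken from the sample repository. The S3 client is injectable so the logic can be exercised without AWS credentials.

```python
# Hypothetical sketch of the Lambda behind the API Gateway endpoint.
# Assumptions: SOURCE_BUCKET env var, query parameter names, and key layout are illustrative.
import json
import os

URL_EXPIRY_SECONDS = 300  # assumption: a short upload window


def build_object_key(line_id: str, camera_id: str, image_id: str, ext: str = "jpeg") -> str:
    """Encode the metadata in the object key so later workflow steps can recover it."""
    return f"{line_id}/{camera_id}/{image_id}.{ext}"


def lambda_handler(event, context, s3_client=None):
    if s3_client is None:
        import boto3  # imported lazily so the helpers work without boto3 installed
        s3_client = boto3.client("s3")
    params = event.get("queryStringParameters") or {}
    key = build_object_key(params["lineId"], params["cameraId"], params["imageId"])
    # Signing happens locally with the function's credentials; no call to S3 is made here.
    url = s3_client.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": os.environ["SOURCE_BUCKET"], "Key": key},
        ExpiresIn=URL_EXPIRY_SECONDS,
    )
    return {"statusCode": 200, "body": json.dumps({"url": url, "key": key})}
```

Keeping ExpiresIn short limits how long a leaked URL stays usable; 300 seconds is an illustrative value.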

This provides a secure and scalable pattern for uploading images for anomaly detection.

Defect detection workflow

The anomaly detection workflow relies on AWS Step Functions to orchestrate the process of detecting whether an image is anomalous, storing the inference result, and sending notifications. The following diagram illustrates this process.

Step Functions is a serverless function orchestrator that allows you to sequence Lambda functions and multiple AWS services into business-critical applications. Through its visual interface, you can create and run a series of checkpointed and event-driven workflows that maintain the application state. The output of one step acts as an input to the next. Each step in your application runs in order, as defined by your business logic.
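As an illustration of how one step's output becomes the next step's input, here is a sketch of an Amazon States Language definition for the workflow in this post (DetectAnomalies, then PutResultInDynamoDB, then PublishMessageToSNS), built as a Python dictionary. The Lambda ARNs are placeholders, and setting ResultPath to null on the DynamoDB step is an assumption about how you might wire it: it discards that step's output so the enriched DetectAnomalies result flows unchanged into the notification step.

```python
# Sketch of an Amazon States Language (ASL) definition for the defect detection workflow.
# Assumptions: state names mirror the steps in this post; the ARNs are placeholders.
import json


def build_definition(detect_arn: str, put_item_arn: str, publish_arn: str) -> dict:
    """Return a three-step, strictly sequential state machine definition."""
    return {
        "Comment": "Detect anomalies, store the result, then notify",
        "StartAt": "DetectAnomalies",
        "States": {
            "DetectAnomalies": {
                "Type": "Task",
                "Resource": detect_arn,  # output: the enriched inference result
                "Next": "PutResultInDynamoDB",
            },
            "PutResultInDynamoDB": {
                "Type": "Task",
                "Resource": put_item_arn,
                # A null ResultPath discards this task's output and passes its input through.
                "ResultPath": None,
                "Next": "PublishMessageToSNS",
            },
            "PublishMessageToSNS": {
                "Type": "Task",
                "Resource": publish_arn,
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition("arn:detect", "arn:put", "arn:publish"), indent=2))
```

The JSON string produced by json.dumps is what you would pass as the definition argument of the Step Functions create_state_machine call, together with an execution role ARN.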

In our case, when an image is uploaded to the S3 bucket, it triggers an event notification that in turn invokes a Lambda function to start the workflow or state machine run. The workflow consists of the following steps and Lambda functions:

  1. DetectAnomalies – Gets the image and its associated metadata from Amazon S3 and invokes the DetectAnomalies API for Amazon Lookout for Vision. It enriches the response with the metadata and passes it to the next step.
  2. PutResultInDynamoDB – Invokes the PutItem API of Amazon DynamoDB to store the response from the previous stage to a DynamoDB table.
  3. PublishMessageToSNS – Invokes the Publish API of Amazon Simple Notification Service (Amazon SNS) to publish a message to an SNS topic that notifies subscribers in case an anomaly or a low-confidence result is detected. This enables operators to get real-time alerts and notifications to take appropriate action—for example, to manually review low-confidence results, label them correctly, and feed them back into the dataset in Amazon Lookout for Vision to retrain the model.
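The workflow is started by the Lambda function that the S3 event notification invokes. A minimal sketch follows, assuming the state machine ARN is provided through a STATE_MACHINE_ARN environment variable and that the execution input is just the bucket and key (both hypothetical choices). Object keys in S3 event notifications are URL-encoded, so the sketch decodes them before use.

```python
# Sketch of a StartStateMachineLambda-style function.
# Assumptions: STATE_MACHINE_ARN env var and the {"bucket", "key"} input shape are illustrative.
import json
import os
from urllib.parse import unquote_plus


def extract_objects(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded (for example, spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context, sfn_client=None):
    if sfn_client is None:
        import boto3
        sfn_client = boto3.client("stepfunctions")
    executions = []
    for bucket, key in extract_objects(event):
        response = sfn_client.start_execution(
            stateMachineArn=os.environ["STATE_MACHINE_ARN"],
            input=json.dumps({"bucket": bucket, "key": key}),
        )
        executions.append(response["executionArn"])
    return executions
```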

ML model and management front end

At the core of the solution, we have Amazon Lookout for Vision, which enables us to train an ML model to spot defects and anomalies in images of objects at scale. It requires no specialized ML skills or high-cost machine vision systems or cameras to get started. It provides an easy, fast, and low-cost way to improve quality control processes and reduce operational and quality-related costs.

You can get started with as few as 30 images for the process you want to visually inspect. Amazon Lookout for Vision builds a model in minutes that you can then use to automate the visual inspection processes in real time or in batch and receive notifications when defects are detected. You can continuously improve and tune the model by adding more images and providing feedback on the identified product defects to improve precision and accuracy.

The following diagram illustrates these components of the architecture.

Moreover, you can set up a static website to act as a management front end to provide a secure and simple way for administrators to start and stop a model depending on usage requirements. The model startup and shutdown can also be automated using Amazon CloudWatch scheduled events and Lambda functions, though this is not covered in this post.

Inference results storage

We use DynamoDB to store inference results from Amazon Lookout for Vision. DynamoDB is a NoSQL database service that delivers consistent, single-digit millisecond latency at any scale and lets you easily store and query data.

Each record that is stored in the table contains the basic information of the uploaded image such as S3 URI, camera ID, and assembly line ID. It also contains the result of the defect detection process, which includes whether an image is classified as anomalous or not along with its corresponding confidence value.

The following screenshot illustrates the DynamoDB table structure.
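For reference, a record like the one described above could be written with the low-level DynamoDB PutItem API as follows. The attribute names are assumptions: CameraID, AssemblyLineID, and Confidence match the field names used in the QuickSight section of this post, while ImageId, S3Uri, IsAnomalous, and Timestamp are hypothetical. The table name is supplied by the caller.

```python
# Sketch of a PutResultInDynamoDB-style step.
# Assumptions: attribute names and the table name are illustrative.
from datetime import datetime, timezone


def to_dynamodb_item(image_id, s3_uri, camera_id, line_id, is_anomalous, confidence, timestamp=None):
    """Convert one inference result into DynamoDB's low-level attribute-value format."""
    ts = timestamp or datetime.now(timezone.utc).isoformat()
    return {
        "ImageId": {"S": image_id},
        "S3Uri": {"S": s3_uri},
        "CameraID": {"S": camera_id},
        "AssemblyLineID": {"S": line_id},
        "IsAnomalous": {"BOOL": bool(is_anomalous)},
        # The low-level API sends numbers as strings.
        "Confidence": {"N": str(confidence)},
        "Timestamp": {"S": ts},
    }


def put_result(table_name, item, ddb_client=None):
    if ddb_client is None:
        import boto3
        ddb_client = boto3.client("dynamodb")
    ddb_client.put_item(TableName=table_name, Item=item)
```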

Visualization and analytics

To gain insights from defect detection results stored in the DynamoDB table, you can transfer the results to a destination S3 bucket, optionally analyze the data using Amazon Athena, an interactive query service that analyzes data in Amazon S3 using standard SQL, and then easily build visualizations using Amazon QuickSight.

The following diagram demonstrates the high-level process.

We enable DynamoDB Streams on the table to capture a time-ordered sequence of item-level modifications, which the stream stores durably for up to 24 hours. For more information, see Change Data Capture for DynamoDB Streams.

When a new item is added to the table, a corresponding record appears in the table’s stream. Lambda polls the stream and synchronously invokes a Lambda function, which transforms the record and puts it into an Amazon Kinesis Data Firehose delivery stream. The delivery stream is configured to buffer incoming records for up to 1 minute and write each batch to the destination S3 bucket as a file in JSON format. We can then use QuickSight to import the data in the S3 bucket and create visualizations based on it.
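The transformation step can be sketched as follows. This assumes the stream view type includes new images (NEW_IMAGE or NEW_AND_OLD_IMAGES), that only INSERT events are forwarded, and that items contain only string, number, and Boolean attributes; the delivery stream name is a placeholder. Each record is written as one line of JSON so that the files Kinesis Data Firehose delivers to Amazon S3 contain newline-delimited records.

```python
# Sketch of a DynamoDBToFirehoseFunction-style transform.
# Assumptions: INSERT-only forwarding, S/N/BOOL attributes only, placeholder stream name.
import json


def from_attribute_value(value: dict):
    """Convert a low-level DynamoDB attribute value into a plain Python value."""
    if "S" in value:
        return value["S"]
    if "N" in value:
        return float(value["N"])
    if "BOOL" in value:
        return value["BOOL"]
    raise ValueError(f"Unsupported attribute type: {list(value)}")


def to_firehose_records(event: dict) -> list:
    records = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue
        image = record["dynamodb"]["NewImage"]
        row = {name: from_attribute_value(v) for name, v in image.items()}
        # One JSON object per line keeps the delivered files easy to parse.
        records.append({"Data": (json.dumps(row) + "\n").encode("utf-8")})
    return records


def lambda_handler(event, context, firehose_client=None, stream_name="defects-results-stream"):
    records = to_firehose_records(event)
    if not records:
        return 0
    if firehose_client is None:
        import boto3
        firehose_client = boto3.client("firehose")
    # PutRecordBatch accepts at most 500 records per call.
    for start in range(0, len(records), 500):
        firehose_client.put_record_batch(
            DeliveryStreamName=stream_name, Records=records[start:start + 500]
        )
    return len(records)
```

This sketch does not retry partial failures; in practice you would check FailedPutCount in the put_record_batch response and resend the failed records.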

Notifications

We use Amazon SNS, a fully managed messaging service for both application-to-application and application-to-person communication. With Amazon SNS, we can notify operators and quality managers via SMS, mobile push, or email when an image is classified as anomalous or has a low-confidence inference result. We also use Amazon SNS to notify users when a CloudWatch alarm is triggered, such as when the number of detected anomalies exceeds a predefined threshold. In this solution, we demonstrate how relevant stakeholders can receive email notifications based on anomaly detection results.
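The decision of when to notify can be expressed as a small function. In this sketch, an anomalous result always triggers an alert, and a normal result triggers one when its confidence falls below a threshold; treating the ConfidenceThresholdForAlerts stack parameter (default 0.20) as that cutoff is our assumption about how the solution uses it. The subject text and message layout are illustrative.

```python
# Sketch of a PublishAlertMessageToSnsTopicFunction-style step.
# Assumption: "low confidence" means confidence below ConfidenceThresholdForAlerts (default 0.20).
import json

DEFAULT_THRESHOLD = 0.20


def alert_reason(is_anomalous: bool, confidence: float, threshold: float = DEFAULT_THRESHOLD):
    """Return why an alert should be sent, or None if no alert is needed."""
    if is_anomalous:
        return "ANOMALY_DETECTED"
    if confidence < threshold:
        return "LOW_CONFIDENCE"
    return None


def publish_alert(topic_arn, result, sns_client=None, threshold=DEFAULT_THRESHOLD):
    reason = alert_reason(result["IsAnomalous"], result["Confidence"], threshold)
    if reason is None:
        return None
    if sns_client is None:
        import boto3
        sns_client = boto3.client("sns")
    response = sns_client.publish(
        TopicArn=topic_arn,
        Subject=f"Lookout for Vision alert: {reason}",  # used as the email subject line
        Message=json.dumps({"reason": reason, **result}, indent=2),
    )
    return response["MessageId"]
```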

The following diagram illustrates these steps in the architecture.

Monitoring and alerting

We use CloudWatch for monitoring and observability. With CloudWatch, we can monitor image processing and anomaly detection metrics for Amazon Lookout for Vision and other services. You can also access logs generated by the Lambda functions and Step Functions state machine runs, create dashboards that give you visibility into these metrics, and create alarms that notify you when predefined thresholds are exceeded.

The following diagram illustrates these steps in the architecture.

In the next sections, we walk you through setting up the prerequisites, deploying and testing the solution, and creating visualizations with QuickSight.

Prerequisites

To set up prerequisites related to Amazon Lookout for Vision, refer to Setting up Amazon Lookout for Vision. Specifically, you need to set up the following:

Prepare your dataset

To prepare your dataset for model training, you can use a sample dataset, prepare a custom labeled dataset, or use a publicly available dataset.

A sample dataset is available in the repository located at ../resources/circuitboard/. Run the following command after updating the details of your S3 bucket (which you created earlier) to upload the images for training the model in Amazon Lookout for Vision:

> aws s3 cp --recursive your-repository-folder/resources/circuitboard s3://your-lookout-for-vision-bucket/custom-dataset/circuitboard/

For more information, see Step 8: (Optional) Prepare example images.

After the dataset is uploaded, you can start with project creation and model training via the Amazon Lookout for Vision console.

To prepare a custom labeled dataset, you first gather and preprocess the images, then divide the dataset into training and testing data.

For this post, we use the third option: a dataset from a public repository. The dataset we use is a subset of the “casting product image data for quality inspection” dataset from Kaggle. It contains labeled images of anomalous and normal metal casting products. The images are 300 x 300 pixel grayscale images, and augmentation has already been applied to all of them. To follow along with this post, you can download the dataset from Kaggle, extract the data, and move the image files so that the resulting folder structure resembles the following screenshot.

The subfolders in metal-casting-defects are as follows:

  • extra_images – Images you can use to test inference
  • test – Images you can use in a test dataset
  • train – Images you can use in a training dataset
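The Automatically attach labels to images based on the folder name option, used later when creating the dataset, labels images stored in folders named normal and anomaly. If your extracted Kaggle data uses different folder names, a short script can copy each split into that layout. The source folder names ok_front and def_front below are an assumption about the download; adjust LABEL_MAP to match what you actually see.

```python
# Sketch: copy images into the normal/ and anomaly/ subfolders that folder-based
# labeling reads. Assumption: "ok_front" holds normal and "def_front" defective images.
import shutil
from pathlib import Path

LABEL_MAP = {"ok_front": "normal", "def_front": "anomaly"}
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def arrange_split(src_split: Path, dst_split: Path, label_map=LABEL_MAP) -> dict:
    """Copy one split (train or test) into <dst_split>/<label>/ and return per-label counts."""
    counts = {}
    for src_folder, label in label_map.items():
        target = dst_split / label
        target.mkdir(parents=True, exist_ok=True)
        for image in sorted((src_split / src_folder).glob("*")):
            # Skip anything that is not an image file (notes, hidden files, and so on).
            if image.is_file() and image.suffix.lower() in IMAGE_SUFFIXES:
                shutil.copy2(image, target / image.name)
                counts[label] = counts.get(label, 0) + 1
    return counts
```

For example, arrange_split(Path("casting_data/train"), Path("metal-casting-defects/train")), followed by the same call for the test split, where casting_data is a hypothetical extraction folder.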

After you set up the folder structure, run the following command via the AWS CLI to upload the dataset to Amazon S3:

> aws s3 cp --recursive your-repository-folder/resources/metal-casting-defects s3://your-lookout-for-vision-bucket/custom-dataset/metal-casting-defects/

The following screenshot illustrates the folder structure in the S3 bucket after the images are successfully uploaded.

Create a project and dataset in Amazon Lookout for Vision

In this section, we walk you through the steps for setting up an Amazon Lookout for Vision project and a dataset to train a model for anomaly detection. For more information, see Getting Started with the Amazon Lookout for Vision console or watch the videos available on Amazon Lookout for Vision Resources.

  1. On the Amazon Lookout for Vision console, choose Create project.
  2. Create a project called metal-casting-defects.
  3. On the project details page, choose Create dataset.
  4. Select Create a training dataset and a test dataset.
  5. Select Import Images from S3.
  6. Choose Copy S3 URI to get the S3 URI from the bucket created previously and append the appropriate folder name (train/ or test/).
  7. Ensure that Automatically attach labels to images based on the folder name is selected.
  8. After the configuration details are complete for the training and test datasets, choose Create dataset.

Train the model

When the dataset has been imported, you can start the model training using the default settings.

  1. On your model details page, choose Train model.
  2. Choose Train model.
  3. Choose Train model.

The model training takes some time depending on the size of the dataset used. If you’re using the dataset that is part of the repository, it should take 45–60 minutes to complete the training process. You can monitor the training status on the model details page.

When the model has been trained successfully, you can use it for detecting anomalies in new images.

Evaluate the model

You can evaluate whether your model is ready to be deployed to production in two ways. The first is to review the performance metrics of the model; the second is to run some production tests to help you verify whether the model is ready to be deployed.

We use three main performance metrics: precision, recall, and F1 score. Precision measures how often the model is correct when it predicts an anomaly (the fraction of images flagged as anomalous that are truly anomalous), and recall measures the percentage of true defects that the model identified. The F1 score is the harmonic mean of precision and recall, so it summarizes both in a single measure of model performance.
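To make the definitions concrete, the following sketch computes the three metrics from confusion-matrix counts, treating anomalous as the positive class. The counts in the usage note are invented for illustration and are not results from this model.

```python
# Worked sketch: precision, recall, and F1 from confusion-matrix counts,
# with "anomaly" as the positive class.


def precision_recall_f1(tp: int, fp: int, fn: int):
    # Of the images flagged as anomalous, the share that truly are anomalous.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Of the truly anomalous images, the share the model flagged.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, precision_recall_f1(45, 5, 15) gives a precision of 0.90, a recall of 0.75, and an F1 score of about 0.82: the harmonic mean sits closer to the weaker of the two metrics.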

We can improve the model iteratively by retraining with new data.

Before you can detect anomalies in images, start your model with the StartModel operation.

After your model starts, you can use the DetectAnomalies operation to detect anomalies in an image.
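With the AWS SDK for Python (Boto3), a DetectAnomalies call for a local image can be sketched as follows. It assumes the model is already hosted and that the project name and model version match those used in this post; the client is injectable for testing, and the content type is derived from the file extension.

```python
# Sketch of calling DetectAnomalies with Boto3.
# Assumptions: the model is already HOSTED; project name and version match this post.
from pathlib import Path

CONTENT_TYPES = {".jpeg": "image/jpeg", ".jpg": "image/jpeg", ".png": "image/png"}


def detect(image_path, project="metal-casting-defects", version="1", client=None) -> dict:
    if client is None:
        import boto3
        client = boto3.client("lookoutvision")
    path = Path(image_path)
    content_type = CONTENT_TYPES[path.suffix.lower()]  # KeyError for unsupported formats
    response = client.detect_anomalies(
        ProjectName=project,
        ModelVersion=version,
        Body=path.read_bytes(),
        ContentType=content_type,
    )
    result = response["DetectAnomalyResult"]
    return {"IsAnomalous": result["IsAnomalous"], "Confidence": result["Confidence"]}
```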

Start the model using the AWS CLI

You can start the model via the AWS CLI or an AWS SDK. Alternatively, you can set up a management front end that admin users can use to start and stop the model via a web user interface, as described in the next section.

Run the following command in the terminal:

aws lookoutvision start-model --project-name metal-casting-defects --model-version 1 --min-inference-units 1

The code has the following parameters:

  • project-name – The name of the project that contains the model you want to start
  • model-version – The version of the model you want to start
  • min-inference-units – The number of anomaly detection units you want to use (1–5)

Make sure to stop the model after you complete the testing so you don’t incur any additional cost. For more details about pricing, see Amazon Lookout for Vision Pricing.
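One way to avoid leaving a model running is to tie its lifetime to a block of code. The sketch below uses the Boto3 start_model, describe_model, and stop_model calls inside a context manager: it waits until the model status is HOSTED and stops the model when the block exits, even on errors. The polling interval and timeout are illustrative assumptions.

```python
# Sketch: start a model, wait until it is HOSTED, and always stop it afterwards.
# Assumptions: simple polling is acceptable; interval and timeout values are illustrative.
import time
from contextlib import contextmanager


def _wait_until_hosted(client, project, version, poll_seconds, timeout_seconds):
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = client.describe_model(ProjectName=project, ModelVersion=version)
        status = description["ModelDescription"]["Status"]
        if status == "HOSTED":
            return
        if status == "HOSTING_FAILED" or time.monotonic() > deadline:
            raise RuntimeError(f"Model {project} v{version} did not start (status: {status})")
        time.sleep(poll_seconds)


@contextmanager
def hosted_model(project, version, min_units=1, client=None, poll_seconds=30, timeout_seconds=3600):
    """Start a model, yield the client once it is HOSTED, and stop the model on exit."""
    if client is None:
        import boto3
        client = boto3.client("lookoutvision")
    client.start_model(ProjectName=project, ModelVersion=version, MinInferenceUnits=min_units)
    try:
        _wait_until_hosted(client, project, version, poll_seconds, timeout_seconds)
    except Exception:
        # Best-effort stop so a half-started model does not keep running.
        try:
            client.stop_model(ProjectName=project, ModelVersion=version)
        except Exception:
            pass
        raise
    try:
        yield client
    finally:
        # Stop hosting even if the code inside the with-block raises.
        client.stop_model(ProjectName=project, ModelVersion=version)
```

For example, with hosted_model("metal-casting-defects", "1") as lookout: runs your detection calls inside the block and then stops the model.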

Start the model using a management front end

You can also set up a management front end that admin users can use to easily start and stop the model via a web user interface. You can deploy the solution located on GitHub to set up the front-end application. After it’s deployed, you can sign up and log in to the management front end and start or stop the model. To start the model, complete the following steps:

  1. Choose Start the model.
  2. Enter the minimum number of inference units to use.
  3. Choose Start the model.

You see a message that the model is starting.

Deploy the solution

Now that all prerequisites have been set up, we can proceed with the solution deployment. The solution sample code is available on GitHub.

In this section, we show you how to launch an AWS CloudFormation template, which creates the following resources:

  • S3 buckets for the source images and inference results
  • IAM roles for the Lambda functions
  • An AWS CodeDeploy application and CodeDeploy deployment groups.
  • An Amazon API Gateway REST API to use for getting a signed URL from Amazon S3 to upload images
  • A CloudWatch monitoring dashboard and alarm
  • The following Lambda functions:
    • APIGatewayCustomAuthorizerFunction
    • CreateManifestFileFunction
    • DetectAnomaliesFunction
    • DynamoDBToFirehoseFunction
    • PublishAlertMessageToSnsTopicFunction
    • PutItemInDynamoDBFunction
    • StartStateMachineLambda
  • A Step Functions state machine that orchestrates defect detection, stores the results in DynamoDB, and publishes messages via Amazon SNS
  • An SNS topic for sending email notifications to the subscribed email address
  • A DynamoDB table to store inference results
  • The following custom resources (Lambda functions):
    • S3ToLambdaTrigger – Creates an event notification trigger on the source images S3 bucket to invoke a Lambda function to start running the state machine
    • CreateManifestFile – Creates a manifest file in the results S3 bucket to use with QuickSight for data import

Deploying the solution on CloudFormation

  1. Deploy the latest CloudFormation template by choosing ‘Launch on AWS’ for your preferred AWS Region:
    • US East (N. Virginia) (us-east-1)
    • US East (Ohio) (us-east-2)
    • US West (Oregon) (us-west-2)
    • EU (Ireland) (eu-west-1)
  2. If prompted, log in using your AWS account credentials.

On the Create stack page, the fields specifying the CloudFormation template are pre-populated.

  3. Choose Next.
  4. On the Specify stack details page, you can customize the following parameters:
    1. Stack Name – The name used to refer to this stack in AWS CloudFormation after deployment. The default is L4VServerlessApp.
    2. AlertsEmailAddress – The email address subscribed to email notifications.
    3. ResourcePrefix – AWS resources are named based on the value of this parameter. You must customize it if you launch more than one instance of the stack in the same account.
    4. LookoutProjectName – The Amazon Lookout for Vision project name.
    5. LookoutModelVersion – The model version of the specified project. The default is 1.
    6. ConfidenceThresholdForAlerts – The threshold value (0.00–1.00) for alerting on low-confidence inference results. The default is 0.20.
    7. ImageFileExtension – The file extension of the images used for inference. Amazon Lookout for Vision supports JPEG, JPG, and PNG. The default is JPEG.
  5. Choose Next.
  6. Configure stack options if desired, then choose Next.
  7. On the review screen, select the check boxes for:
    1. I acknowledge that AWS CloudFormation might create IAM resources
    2. I acknowledge that AWS CloudFormation might create IAM resources with custom names
    3. I acknowledge that AWS CloudFormation might require the following capability: CAPABILITY_AUTO_EXPAND

These acknowledgments are required to allow AWS CloudFormation to create the IAM roles specified in the CloudFormation stack using both fixed and dynamic names.

  8. Choose Create Change Set.
  9. On the Change Set page, choose Execute to launch your stack.

You may need to wait for the Execution status of the change set to show as AVAILABLE before the Execute button becomes available.

  10. Wait for the CloudFormation stack to launch.

The stack provisioning is complete when the Stack status shows as CREATE_COMPLETE. You can monitor the stack creation progress on the Events tab.

  11. On the Outputs tab for the stack, note the API URL value.

We use this URL to request the presigned Amazon S3 URL for uploading an image when we test the solution.

  12. Check your inbox (the address passed as the AlertsEmailAddress parameter) for an email from Amazon SNS asking you to confirm the subscription.
  13. Confirm the subscription by choosing the link in the email so that you receive defect detection emails.

Test the solution

Now that the prerequisites are set up and the solution is deployed, we can test the solution using the code in /scripts/. This code simulates the process of a camera uploading an image as a product passes the inspection point.

  1. Open your terminal and change the directory to the repository folder.
  2. Install the Python requests module:

> pip3 install -t scripts/packages/ requests

  3. Run the following command to test an image upload from a client after making the appropriate changes to the parameters:

> python3 scripts/

For this post, we use the following inputs:

  • DIRECTORY – /resources/images/extra_images
  • CAMERA_ID – CAM123456
  • API_ENDPOINT – (this is an output from the CloudFormation stack provisioned previously)
  • AUTH_TOKEN – allow or deny

The following example command shows an upload that is authorized:

> python3 scripts/ resources/images/extra_images CAM123456 ASM123456 allow 0

The following screenshot shows a successful upload of the images from the client.

The following code shows an example of denying authorization to upload:

> python3 scripts/ resources/images/extra_images CAM123456 ASM123456 deny 0

The following screenshot shows that an image upload has been explicitly denied because the client doesn’t have the required authorization token.

Alternatively, you can update the parameters in the file and run it via a terminal:

As images are uploaded to Amazon S3, the state machine starts, and you get email notifications when an image has been classified as an anomaly or if a low-confidence result is returned by the model.
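The repository's test script handles both API calls for you. To see the flow end to end, the following is an independent, standard-library-only sketch of a client: it first asks the API for a presigned URL and then uploads the image with an HTTP PUT. The query parameter names, the Authorization header, and the url field in the response are hypothetical, not the sample's actual contract.

```python
# Hypothetical client sketch using only the standard library.
# Assumptions: parameter names, the Authorization header, and the "url" response field.
import json
import urllib.parse
import urllib.request
from pathlib import Path


def build_url_request(api_endpoint, token, line_id, camera_id, image_id):
    """GET request asking the API for a presigned upload URL."""
    query = urllib.parse.urlencode({"lineId": line_id, "cameraId": camera_id, "imageId": image_id})
    return urllib.request.Request(f"{api_endpoint}?{query}", headers={"Authorization": token})


def upload_image(api_endpoint, token, line_id, camera_id, image_path):
    image = Path(image_path)
    request = build_url_request(api_endpoint, token, line_id, camera_id, image.stem)
    with urllib.request.urlopen(request) as resp:
        presigned_url = json.loads(resp.read())["url"]
    put = urllib.request.Request(
        presigned_url,
        data=image.read_bytes(),
        method="PUT",
        headers={"Content-Type": "image/jpeg"},  # matches the default ImageFileExtension
    )
    with urllib.request.urlopen(put) as resp:
        return resp.status  # 200 when S3 accepts the object
```

With the deny token, the first urlopen call raises urllib.error.HTTPError, because API Gateway answers an explicit Deny policy with a 403 response.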

Get notifications

After the email subscription has been confirmed for the SNS topic, Amazon SNS sends email messages to notify the appropriate team of detected anomalies.

The relevant stakeholders (such as operators or quality managers) can take appropriate actions based on the notifications. For example, they can choose to classify or grade the product, bin the product and ship, rework, scrap or recycle, investigate the process, or review the inference result and provide feedback to Amazon Lookout for Vision to retrain the model and improve inference results.

You can also use SNS topics to fan out messages to several subscriber systems, including Amazon Simple Queue Service (Amazon SQS), Lambda functions, HTTPS endpoints, and Kinesis Data Firehose.

The following screenshot shows an example of a defect detection alert.

The following screenshot shows an example of a low-confidence inference result email.

Monitor the model

You can use the Amazon Lookout for Vision dashboard to visualize the total images processed, anomalies detected, and anomaly ratio, as in the following screenshot.

Monitor the application

The solution also creates a CloudWatch dashboard to provide a single pane of glass to monitor the serverless application. Specifically, the dashboard provides metrics related to Amazon Lookout for Vision image processing and the metrics related to anomaly detection workflow. You can enhance the dashboard based on your requirements to add further alarms and widgets.

The following screenshots illustrate some of the widgets created as part of the solution.

Visualize insights using QuickSight

You can use QuickSight to build visualizations based on the inference results stored in Amazon S3. In this section, we walk you through setting up QuickSight as a first-time user or existing user, then creating your dataset and visuals.

First-time user

If you’re using QuickSight for the first time in your account, you’re asked to sign up for the service before being able to use it.

  1. Choose Sign up for QuickSight.
  2. Select your preferred edition (the Standard edition is sufficient for this post).
  3. Choose your preferred Region.
  4. Enter a unique QuickSight account name and an email address to receive the notifications.
  5. Select Amazon S3 so QuickSight can auto-discover S3 buckets.
  6. In the pop-up window, select the defects-results S3 bucket, then choose Finish.
  7. Choose Finish to create your QuickSight account.

You can skip the next section and proceed with creating your dataset.

Existing user

If you’re an existing QuickSight user, you need to grant QuickSight access to the destination S3 bucket.

  1. On the QuickSight console, on the user drop-down menu, choose Manage QuickSight.
  2. In the navigation pane, choose Security & Permissions.
  3. Under QuickSight access to AWS Services, choose Add or remove.
  4. Choose Select S3 buckets.
  5. Select the defects-results bucket.
  6. Choose Finish to grant QuickSight access to your S3 bucket.

Create your dataset

To get started creating visualizations, you first need to create a dataset. When you import data into a dataset, it becomes SPICE data because of how it’s stored. SPICE is the QuickSight Super-fast, Parallel, In-memory Calculation Engine. It’s engineered to rapidly perform advanced calculations and serve data.

To create a dataset from Amazon S3, you need a manifest that QuickSight can use to identify the files that you want to use and the upload settings needed to import them.

In the solution that we deployed, the defects-results S3 bucket also stores a manifest file created as part of the stack provisioning using a Lambda-backed custom resource. You can use the S3 URI of this manifest file in QuickSight to import the data in the bucket. To create a dataset and specify the data in the defects-results S3 bucket as a data source, complete the following steps:

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose New dataset.
  3. For Create a Dataset, choose S3.
  4. For Data source name, enter a name.
  5. For Upload a manifest file, select URL.
  6. Enter the URL of the JSON manifest file created as part of the stack provisioning.

You can locate the URL on the Outputs tab of the CloudFormation stack.

  7. Choose Connect.
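The manifest used in these steps is a small JSON document. As a sketch of what the CreateManifestFile custom resource might write, assuming the Firehose output lives under a single prefix of the defects-results bucket and is stored as JSON (the bucket name and prefix here are placeholders), you could generate one like this and upload it with the S3 put_object call.

```python
# Sketch of a QuickSight S3 manifest for JSON files under one prefix.
# Assumptions: bucket name and prefix are placeholders for your defects-results location.
import json


def build_manifest(bucket: str, prefix: str = "") -> str:
    return json.dumps(
        {
            # URIPrefixes imports every file under the prefix.
            "fileLocations": [{"URIPrefixes": [f"s3://{bucket}/{prefix}"]}],
            "globalUploadSettings": {"format": "JSON"},
        },
        indent=2,
    )
```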

Create visuals

A visual is a graphical representation of your data. You can create a wide variety of visuals in an analysis, using different datasets and visual types. The following screenshots show a few sample visuals. For more information, see Creating an Amazon QuickSight Visual.

The following bar charts show anomalous vs. non-anomalous results based on CameraID and AssemblyLineID.

The following charts show the composition of overall records based on CameraID and AssemblyLineID.

The following line charts demonstrate inference results over a period of time.

We can see the distribution of confidence results using calculated fields. For example, we calculate the confidence level field using the following formula:

ifelse({Confidence}>=0.90,"VERY HIGH",{Confidence}>0.70 AND {Confidence}<0.90,"HIGH",{Confidence}>=0.50 AND {Confidence}<=0.70,"MEDIUM",{Confidence}>=0.20 AND {Confidence}<0.50,"LOW","VERY LOW")
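Because ifelse evaluates its conditions in order, it can help to check the bucket boundaries outside QuickSight. The following Python function mirrors the formula one-for-one, which makes edge cases easy to test: exactly 0.70 falls in MEDIUM and exactly 0.90 in VERY HIGH.

```python
# Python mirror of the QuickSight confidence-level calculated field.


def confidence_level(confidence: float) -> str:
    if confidence >= 0.90:
        return "VERY HIGH"
    if 0.70 < confidence < 0.90:
        return "HIGH"
    if 0.50 <= confidence <= 0.70:
        return "MEDIUM"
    if 0.20 <= confidence < 0.50:
        return "LOW"
    return "VERY LOW"
```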

The following charts show the confidence levels and the distribution of confidence across inference results.

Clean up

To clean up the resources provisioned as part of the solution, carry out the following steps:

  1. Make sure that the source and defects-results S3 buckets are empty. You can either empty the buckets via the Amazon S3 console or move the objects to another bucket.
  2. On the AWS CloudFormation console, select the solution stack (L4VServerlessApp by default, or the name you chose), then choose Delete.

The stack takes time to delete; you can track its progress on the Events tab. When the stack deletion is complete, the status changes from DELETE_IN_PROGRESS to DELETE_COMPLETE, and the stack disappears from the list.

  3. Delete the management front-end stack: on the AWS CloudFormation console, select the LookoutVisionDemo stack, then choose Delete.
  4. Delete the QuickSight dashboard, analysis, and dataset. See the following links for the steps:
    1. Deleting a Dashboard
    2. Deleting an Analysis
    3. Deleting a Dataset
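If you prefer the SDK to the console for step 1, a bucket can be emptied with a short Boto3 sketch like the following. It assumes versioning is not enabled on the bucket (a versioned bucket also needs its object versions and delete markers removed before deletion); the bucket names are yours to supply.

```python
# Sketch: delete every object in an unversioned bucket before deleting the stack.


def empty_bucket(bucket: str, s3_client=None) -> int:
    if s3_client is None:
        import boto3
        s3_client = boto3.client("s3")
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # Each page holds at most 1,000 keys, which is also the DeleteObjects limit.
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys, "Quiet": True})
            deleted += len(keys)
    return deleted
```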

Additional considerations

Amazon Lookout for Vision runs inference in the cloud, so you need to evaluate your network availability, bandwidth, and latency requirements accordingly. For inference at the edge, you can explore AWS Panorama and AWS IoT Greengrass.

Amazon Lookout for Vision provides a direct integration with Amazon SageMaker Ground Truth, which you can use to automate image labeling. For more information, see Automate Data Labeling and Creating a dataset using an Amazon SageMaker Ground Truth manifest file.

You can also set up human review workflows to inspect low-confidence results using Amazon Augmented AI (Amazon A2I).

Conclusion

In this post, we looked at how to use Amazon Lookout for Vision and combine it with other serverless services to automate defect detection for manufactured products, alert operators or quality managers in real time when a defect is detected, and generate visual insights for business users.

Amazon Lookout for Vision allows customers in the manufacturing domain to set up a low-cost solution for improving quality and reducing operational costs without any specialized ML expertise.

To learn more about Amazon Lookout for Vision, see Amazon Lookout for Vision Documentation. For pricing, refer to Amazon Lookout for Vision Pricing.

About the Authors

Mohsin Khan is a Solutions Architect at AWS, based in Manchester, UK. He is passionate about helping customers achieve success on their cloud journeys, enjoys designing solutions with serverless technologies and has a developing interest in machine learning and AI. Apart from work, he likes reading history and watching sports.



Amir Khairalomoum is a Solutions Architect at AWS, based in London, UK. He supports customers in their digital transformation and their cloud journey to AWS. He is passionate about serverless technologies. Outside of work, he loves reading, biking, and traveling.




Ibtehaj Ahmed is a Solutions Architect at AWS, based in London, UK. He loves to help early cloud adopters set up for success and utilize the right technologies. He is passionate about mobile development and purpose-built databases. Outside of work, he plays football regularly and enjoys participating in other sports.




Customize pronunciation using lexicons in Amazon Polly





Amazon Polly is a text-to-speech service that uses advanced deep learning technologies to synthesize natural-sounding human speech. It is used in a variety of use cases, such as contact center systems, delivering conversational user experiences with human-like voices for automated real-time status check, automated account and billing inquiries, and by news agencies like The Washington Post to allow readers to listen to news articles.

As of today, Amazon Polly provides over 60 voices in 30+ language variants. Amazon Polly also uses context to pronounce certain words differently based on verb tense and other contextual information. For example, “read” is pronounced differently in “I read a book yesterday” (past tense) and “I will read a book” (future tense).

However, in some situations you may want to customize the way Amazon Polly pronounces a word. For example, you may need to match the pronunciation with local dialect or vernacular. Names of things (e.g., Tomato can be pronounced as tom-ah-to or tom-ay-to), people, streets, or places are often pronounced in many different ways.

In this post, we demonstrate how you can leverage lexicons for creating custom pronunciations. You can apply lexicons for use cases such as publishing, education, or call centers.

Customize pronunciation using SSML tag

Let’s say you stream a popular podcast from Australia and you use the Amazon Polly Australian English (Olivia) voice to convert your script into human-like speech. In one of your scripts, you want to use words that are unknown to the Amazon Polly voice. For example, you want to send Mātariki (Māori New Year) greetings to your New Zealand listeners. For such scenarios, Amazon Polly supports phonetic pronunciation, which you can use to achieve a pronunciation that is close to the correct pronunciation in the foreign language.

You can use the Speech Synthesis Markup Language (SSML) <phoneme> tag to suggest a phonetic pronunciation in its ph attribute. Here’s how to use it.

First, sign in to the AWS Management Console and search for Amazon Polly in the search bar at the top. Select Amazon Polly, and then choose Try Polly.

In the Amazon Polly console, select Australian English from the language dropdown, enter the following text in the Input text box, and then choose Listen to test the pronunciation.

I’m wishing you all a very Happy Mātariki.

Sample speech without applying phonetic pronunciation:

If you listen to the sample speech, you’ll notice that the pronunciation of Mātariki, a word that isn’t part of Australian English, isn’t quite right. Now let’s look at how you can use phonetic pronunciation with the SSML <phoneme> tag to customize the speech that Amazon Polly produces in such scenarios.

To use SSML tags, turn on the SSML option in the Amazon Polly console. Then copy and paste the following SSML script, which specifies the phonetic pronunciation for Mātariki inside the ph attribute of the <phoneme> tag.

I’m wishing you all a very Happy Mātariki.

With the <phoneme> tag, Amazon Polly uses the pronunciation specified by the ph attribute instead of the standard pronunciation associated by default with the language of the selected voice.

Sample speech after applying phonetic pronunciation:

If you listen to the sample, you’ll notice that we opted for a different pronunciation for some of the vowels (for example, ā) to make Amazon Polly synthesize sounds that are closer to the correct pronunciation. You might now be wondering how to generate the phonetic transcription for the word Mātariki.

You can create phonetic transcriptions by referring to the Phoneme and Viseme tables for the supported languages. In the preceding example, we used the phonemes for Australian English.

Amazon Polly supports two phonetic alphabets: IPA and X-SAMPA. The benefit of X-SAMPA is that it uses only standard ASCII characters, so it’s easier to type a phonetic transcription on a normal keyboard. You can use either IPA or X-SAMPA for your transcriptions, but stay consistent with your choice, especially when you use a lexicon file, which we cover in the next section.

Each phoneme in the phoneme table represents a speech sound. The bolded letters in the “Example” column of the Phoneme/Viseme table in the Australian English page linked above represent the part of the word the “Phoneme” corresponds to. For example, the phoneme /j/ represents the sound that an Australian English speaker makes when pronouncing the letter “y” in “yes.”
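To make the structure of the <phoneme> tag concrete, here is a minimal Python sketch that wraps one word of a sentence in an SSML <phoneme> tag. Because the Mātariki transcription isn’t reproduced on this page, the example uses the word “pecan” with the X-SAMPA transcription pI"kA:n (IPA pɪˈkɑːn) instead; the helper name phoneme_ssml is our own, hypothetical choice. X-SAMPA marks primary stress with a double quote, so the sketch lets the standard library choose single quotes around the ph value when needed.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme_ssml(sentence, word, ph, alphabet="x-sampa"):
    """Hypothetical helper: wrap the first occurrence of `word` in an SSML <phoneme> tag."""
    if alphabet not in ("ipa", "x-sampa"):
        raise ValueError("Amazon Polly accepts the 'ipa' and 'x-sampa' alphabets")
    before, found, after = sentence.partition(word)
    if not found:
        raise ValueError(f"{word!r} does not occur in the sentence")
    # quoteattr switches to single quotes when the value contains a double quote,
    # which X-SAMPA uses to mark primary stress.
    tag = f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>{escape(word)}</phoneme>"
    return f"<speak>{escape(before)}{tag}{escape(after)}</speak>"


print(phoneme_ssml("I like pecan pie.", "pecan", 'pI"kA:n'))
# prints: <speak>I like <phoneme alphabet="x-sampa" ph='pI"kA:n'>pecan</phoneme> pie.</speak>
```

You can paste the printed SSML into the console with the SSML option turned on; for Mātariki, substitute your own transcription built from the Australian English phoneme table.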

Customize pronunciation using lexicons

Phoneme tags are suitable for one-off customizations of isolated cases, but they don’t scale. If you process a large volume of text managed by different editors and reviewers, we recommend using lexicons. Lexicons let you add custom pronunciations consistently while reducing the manual effort of inserting phoneme tags into the script.

A good practice is to first test a custom pronunciation on the Amazon Polly console using the <phoneme> tag, and then add it to a library of customized pronunciations in a lexicon. Once the lexicon file is uploaded and applied, Amazon Polly automatically uses the phonetic pronunciations specified in it, which eliminates the need to insert <phoneme> tags manually.

Create a lexicon file

A lexicon file contains the mapping between words and their phonetic pronunciations. Pronunciation Lexicon Specification (PLS) is a W3C recommendation for specifying interoperable pronunciation information. The following is an example PLS document:

Matariki Mātariki NZ New Zealand

Make sure that you use the correct value for the xml:lang attribute. Use en-AU if you’re uploading the lexicon file for use with the Amazon Polly Australian English voice. For a complete list of supported languages, refer to Languages Supported by Amazon Polly.

To specify a custom pronunciation, add a <lexeme> element, which is a container for a lexical entry. Each <lexeme> contains one or more <grapheme> elements and one or more pronunciations, provided inside <phoneme> or <alias> elements.

The <grapheme> element contains text describing the orthography of the <lexeme> element. You use a <grapheme> element to specify the word whose pronunciation you want to customize, and you can add multiple <grapheme> elements to cover all variations of a word, for example with or without macrons. The <grapheme> element is case-sensitive: during speech synthesis, Amazon Polly matches the strings in the script you’re converting to speech against your graphemes. If it finds a match, it uses the <phoneme> element, which describes how the <grapheme> is pronounced, to generate the phonetic transcription.

You can also use <alias> for commonly used abbreviations. In the preceding example lexicon file, New Zealand is defined as the alias for NZ. This means that whenever Amazon Polly comes across “NZ” (with matching case) in the text, it reads those two letters as “New Zealand”.
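The case-sensitive, whole-word matching described above can be illustrated with a small simulation. This is not how Amazon Polly is implemented internally; it’s a hypothetical Python sketch (expand_aliases is our own name) that applies alias entries the way the preceding paragraphs describe, so you can see why “NZ” is expanded but “nz” is not.

```python
import re


def expand_aliases(text: str, aliases: dict) -> str:
    """Illustration only: replace whole-word, case-sensitive grapheme matches with their alias."""
    if not aliases:
        return text
    # Longer graphemes first, so a multi-word entry such as "New York City"
    # would win over a shorter entry such as "New York".
    alternation = "|".join(re.escape(g) for g in sorted(aliases, key=len, reverse=True))
    # (?<!\w) and (?!\w) stop matches inside longer words, such as "NZ" within "NZD".
    pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)")
    # A single pass, so an alias's own text is never expanded again.
    return pattern.sub(lambda match: aliases[match.group(0)], text)


print(expand_aliases("Wishing all my listeners in NZ, a very Happy Mātariki", {"NZ": "New Zealand"}))
```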

For more information on lexicon file format, see Pronunciation Lexicon Specification (PLS) Version 1.0 on the W3C website.

Save the lexicon file as a .pls or .xml file before uploading it to Amazon Polly.
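If you maintain many entries, you may prefer to generate the lexicon file rather than edit XML by hand. The following Python sketch uses the standard library’s xml.etree.ElementTree to build a PLS 1.0 document with the lexicon, lexeme, grapheme, phoneme, and alias elements described earlier. The function name build_pls is hypothetical, and the phoneme text in the usage line is a placeholder, because the actual Mātariki transcription isn’t reproduced here; substitute your own transcription in the alphabet you declare.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def build_pls(lang, alphabet, phonemes, aliases):
    """Build a PLS 1.0 lexicon document (hypothetical helper).

    phonemes: list of (graphemes, transcription) pairs; the graphemes share one pronunciation.
    aliases:  dict mapping a grapheme (for example "NZ") to its spoken alias.
    """
    ET.register_namespace("", PLS_NS)  # emit xmlns="..." instead of an ns0: prefix
    root = ET.Element(f"{{{PLS_NS}}}lexicon",
                      {"version": "1.0", "alphabet": alphabet, XML_LANG: lang})
    for graphemes, transcription in phonemes:
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        for grapheme in graphemes:
            ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = transcription
    for grapheme, alias in aliases.items():
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}alias").text = alias
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# "REPLACE_WITH_TRANSCRIPTION" is a placeholder, not a real pronunciation.
pls = build_pls("en-AU", "x-sampa",
                [(["Matariki", "Mātariki"], "REPLACE_WITH_TRANSCRIPTION")],
                {"NZ": "New Zealand"})
```

Write the resulting string to a file such as matariki.pls with UTF-8 encoding, so the macron in Mātariki survives, before uploading it.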

Upload and apply the lexicon file

Upload your lexicon file to Amazon Polly using the following instructions:

  1. On the Amazon Polly console, choose Lexicons in the navigation pane.
  2. Choose Upload lexicon.
  3. Enter a name for the lexicon and then choose a lexicon file.
  4. Choose the file to upload.
  5. Choose Upload lexicon.

If a lexicon by the same name (whether a .pls or .xml file) already exists, uploading the lexicon overwrites the existing lexicon.

Now you can apply the lexicon to customize pronunciation.

  1. Choose Text-to-Speech in the navigation pane.
  2. Expand Additional settings.
  3. Turn on Customize pronunciation.
  4. Choose the lexicon on the drop-down menu.
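The console steps above have an API counterpart: Amazon Polly’s SynthesizeSpeech operation accepts a LexiconNames list. The following Python sketch only builds the keyword arguments for boto3’s synthesize_speech call, so it runs without AWS credentials. The helper name and the lexicon name maoriGreetings are hypothetical, and the two validation limits (lexicon names matching [0-9A-Za-z]{1,20}, at most five lexicons per request) are our reading of the Amazon Polly API reference; confirm them on the Quotas in Amazon Polly page before relying on them.

```python
import re

# Assumed constraints from our reading of the Amazon Polly API reference; verify before use.
LEXICON_NAME = re.compile(r"[0-9A-Za-z]{1,20}")
MAX_LEXICONS = 5


def synthesize_speech_kwargs(text, lexicon_names, voice_id="Olivia",
                             engine="neural", output_format="mp3"):
    """Build keyword arguments for boto3's Polly synthesize_speech call (hypothetical helper)."""
    names = list(lexicon_names)
    if len(names) > MAX_LEXICONS:
        raise ValueError(f"at most {MAX_LEXICONS} lexicons per request (assumed limit)")
    for name in names:
        if not LEXICON_NAME.fullmatch(name):
            raise ValueError(f"invalid lexicon name: {name!r}")
    is_ssml = text.lstrip().startswith("<speak>")
    return {
        "Text": text,
        "TextType": "ssml" if is_ssml else "text",
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": output_format,
        "LexiconNames": names,
    }


kwargs = synthesize_speech_kwargs("Wishing all my listeners in NZ, a very Happy Mātariki",
                                  ["maoriGreetings"])
```

With credentials configured, you would first upload the lexicon with put_lexicon(Name=..., Content=...) on a boto3 Polly client, then call synthesize_speech(**kwargs) and read the audio bytes from the AudioStream field of the response.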

You can also choose Upload lexicon to upload a new lexicon file (or a new version).

It’s a good practice to keep the lexicon file under version control in a source code repository. Keeping custom pronunciations in a lexicon file lets you consistently refer to the phonetic pronunciations of certain words across the organization. Also keep in mind the pronunciation lexicon limits listed on the Quotas in Amazon Polly page.

Test the pronunciation after applying the lexicon

Let’s perform a quick test using “Wishing all my listeners in NZ, a very Happy Mātariki” as the input text.

We can compare the audio files before and after applying the lexicon.

Before applying the lexicon:

After applying the lexicon:


In this post, we discussed how you can customize the pronunciation of commonly used acronyms, and of words not found in the selected language, in Amazon Polly. The SSML <phoneme> tag is great for one-off customizations or for testing. We recommend using lexicons to create a consistent set of pronunciations for frequently used words across your organization. This lets your content writers spend their time writing instead of repeatedly adding phonetic pronunciations to the script. You can try this in your AWS account on the Amazon Polly console.


About the Authors

Ratan Kumar is a Solutions Architect based out of Auckland, New Zealand. He works with large enterprise customers, helping them design and build secure, cost-effective, and reliable internet-scale applications using the AWS Cloud. He is passionate about technology and likes sharing knowledge through blog posts and Twitch sessions.

Maciek Tegi is a Principal Audio Designer and a Product Manager for Polly Brand Voices. He has worked in a professional capacity in the tech industry, movies, commercials, and game localization. In 2013, he was the first audio engineer hired to the Alexa Text-to-Speech team. Maciek was involved in releasing 12 Alexa TTS voices across different countries, over 20 Polly voices, and 4 Alexa celebrity voices. Maciek is a triathlete and an avid acoustic guitar player.




AWS Week in Review – May 16, 2022


This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

I had been on the road for the last five weeks and attended many of the AWS Summits in Europe. It was great to talk to so many of you in person. The Serverless Developer Advocates are going around many of the AWS Summits with the Serverlesspresso booth. If you attend an event that has the booth, say “Hi” to my colleagues, and have a coffee while asking all your serverless questions. You can find all the upcoming AWS Summits in the events section at the end of this post.

Last week’s launches
Here are some launches that got my attention during the previous week.

AWS Step Functions announced a new console experience to debug your state machine executions – Now you can opt in to the new console experience of Step Functions, which makes it easier to analyze, debug, and optimize Standard Workflows. The new page lets you inspect executions using three different views (graph, table, and event view) and adds many new features to enhance the navigation and analysis of executions. To learn about all the features and how to use them, read Ben’s blog post.

Example of how the Graph View looks


AWS Lambda now supports Node.js 16.x runtime – Now you can start using the Node.js 16 runtime when you create a new function or update your existing functions to use it. You can also use the new container image base that supports this runtime. To learn more about this launch, check Dan’s blog post.

AWS Amplify announces its Android library designed for Kotlin – The Amplify Android library has been rewritten for Kotlin and is now available in preview. This new library provides better debugging capabilities and visibility into underlying state management. It also uses the new AWS SDK for Kotlin, which was released in preview last year. Read the What’s New post for more information.

Three new APIs for batch data retrieval in AWS IoT SiteWise – With this launch, AWS IoT SiteWise now supports batch data retrieval from multiple asset properties. The new APIs allow you to retrieve current values, historical values, and aggregated values. Read the What’s New post to learn how you can start using the new APIs.

AWS Secrets Manager now publishes secret usage metrics to Amazon CloudWatch – This launch is very useful to see the number of secrets in your account and set alarms for any unexpected increase or decrease in the number of secrets. Read the documentation on Monitoring Secrets Manager with Amazon CloudWatch for more information.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other launches and news that you may have missed:

IBM signed a deal with AWS to offer its software portfolio as a service on AWS. This allows customers using AWS to access IBM software for automation, data and artificial intelligence, and security that is built on Red Hat OpenShift Service on AWS.

Podcast Charlas Técnicas de AWS – If you understand Spanish, this podcast is for you. Podcast Charlas Técnicas is one of the official AWS podcasts in Spanish. This week’s episode introduces you to Amazon DynamoDB and shares stories on how different customers use this database service. You can listen to all the episodes directly from your favorite podcast app or the podcast web page.

AWS Open Source News and Updates – Ricardo Sueiras, my colleague from the AWS Developer Relations team, runs this newsletter. It brings you the latest open-source projects, posts, and more. Read edition #112 here.

Upcoming AWS Events
It’s AWS Summits season and here are some virtual and in-person events that might be close to you:

You can register for re:MARS to get fresh ideas on topics such as machine learning, automation, robotics, and space. The conference will be in person in Las Vegas, June 21–24.

That’s all for this week. Check back next Monday for another Week in Review!

— Marcia




Personalize your machine translation results by using fuzzy matching with Amazon Translate


A person’s vernacular is part of the characteristics that make them unique. There are often countless different ways to express one specific idea. When a firm communicates with their customers, it’s critical that the message is delivered in a way that best represents the information they’re trying to convey. This becomes even more important when it comes to professional language translation. Customers of translation systems and services expect accurate and highly customized outputs. To achieve this, they often reuse previous translation outputs—called translation memory (TM)—and compare them to new input text. In computer-assisted translation, this technique is known as fuzzy matching. The primary function of fuzzy matching is to assist the translator by speeding up the translation process. When an exact match can’t be found in the TM database for the text being translated, translation management systems (TMSs) often have the option to search for a match that is less than exact. Potential matches are provided to the translator as additional input for final translation. Translators who enhance their workflow with machine translation capabilities such as Amazon Translate often expect fuzzy matching data to be used as part of the automated translation solution.

In this post, you learn how to customize output from Amazon Translate according to translation memory fuzzy match quality scores.

Translation Quality Match

The XML Localization Interchange File Format (XLIFF) standard is often used as a data exchange format between TMSs and Amazon Translate. XLIFF files produced by TMSs include source and target text data along with match quality scores based on the available TM. These scores—usually expressed as a percentage—indicate how close the translation memory is to the text being translated.

Some customers with very strict requirements want machine translation to be used only when match quality scores are below a certain threshold. Beyond this threshold, they expect their own translation memory to take precedence. Translators often need to apply these preferences manually, either within their TMS or by altering the text data. This flow is illustrated in the following diagram: the machine translation system processes the translation data (text and fuzzy match scores), and translators then review and manually edit the output based on their desired quality thresholds. Applying the thresholds as part of the machine translation step removes these manual steps, which improves efficiency and optimizes cost.

Figure 1: Machine Translation Review Flow

The solution presented in this post allows you to enforce rules based on match quality score thresholds to drive whether a given input text should be machine translated by Amazon Translate or not. When not machine translated, the resulting text is left to the discretion of the translators reviewing the final output.

Solution Architecture

The solution architecture illustrated in Figure 2 leverages the following services:

  • Amazon Simple Storage Service – Amazon S3 buckets contain the following content:
    • Fuzzy match threshold configuration files
    • Source text to be translated
    • Amazon Translate input and output data locations
  • AWS Systems Manager – We use Parameter Store parameters to store match quality threshold configuration values
  • AWS Lambda – We use two Lambda functions:
    • One function preprocesses the quality match threshold configuration files and persists the data into Parameter Store
    • One function automatically creates the asynchronous translation jobs
  • Amazon Simple Queue Service – An Amazon SQS queue triggers the translation flow as a result of new files coming into the source bucket

Figure 2: Solution Architecture

You first set up quality thresholds for your translation jobs by editing a configuration file and uploading it into the fuzzy match threshold configuration S3 bucket. The following is a sample configuration in CSV format. We chose CSV for simplicity, although you can use any format. Each line represents a threshold to be applied to either a specific translation job or as a default value to any job.

default, 75
SourceMT-Test, 80

The specifications of the configuration file are as follows:

  • Column 1 should be populated with the name of the XLIFF file—without extension—provided to the Amazon Translate job as input data.
  • Column 2 should be populated with the quality match percentage threshold. For any score below this value, machine translation is used.
  • For all XLIFF files whose name doesn’t match any name listed in the configuration file, the default threshold is used—the line with the keyword default set in Column 1.
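Here is a minimal Python sketch of how a function could parse this configuration format and resolve the threshold for a job, falling back to the default line. The function names are hypothetical, and the solution’s own Lambda code may differ.

```python
import csv
import io


def parse_thresholds(config_text):
    """Parse 'name, threshold' lines into a dict; a 'default' line is required."""
    thresholds = {}
    for line_number, row in enumerate(csv.reader(io.StringIO(config_text)), start=1):
        cells = [cell.strip() for cell in row]
        if not cells or not cells[0]:
            continue  # skip blank lines
        if len(cells) < 2:
            raise ValueError(f"line {line_number}: expected 'name, threshold'")
        thresholds[cells[0]] = float(cells[1])
    if "default" not in thresholds:
        raise ValueError("configuration must contain a 'default' line")
    return thresholds


def threshold_for(job_name, thresholds):
    """Column 1 holds the XLIFF file name without extension; unknown names use the default."""
    return thresholds.get(job_name, thresholds["default"])


thresholds = parse_thresholds("default, 75\nSourceMT-Test, 80\n")
```

With the sample configuration, threshold_for("SourceMT-Test", thresholds) resolves to 80, and any other file name resolves to the default of 75.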

Figure 3: Auto-generated parameter in Systems Manager Parameter Store

When a new file is uploaded, Amazon S3 triggers the Lambda function in charge of processing the parameters. This function reads the threshold parameters and stores them in Parameter Store for future use. Using Parameter Store avoids redundant Amazon S3 GET requests each time a new translation job is initiated. The sample configuration file produces the parameter tags shown in Figure 3.
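The post doesn’t spell out how the thresholds are laid out in Parameter Store. As one hypothetical layout, the following Python sketch maps each job to its own String parameter under the ParameterStoreRoot path and reads them back. The function names and path scheme are assumptions; only the shape of the input to from_parameters mirrors what boto3’s ssm.get_parameters_by_path returns (a Parameters list of items with Name and Value keys).

```python
def to_parameters(thresholds, root):
    """Map {job name: threshold} to {parameter name: string value} under a path root."""
    prefix = "/" + root.strip("/")
    # Parameter Store values are strings; format(75.0, "g") gives "75".
    return {f"{prefix}/{name}": format(value, "g") for name, value in thresholds.items()}


def from_parameters(parameters, root):
    """Rebuild {job name: threshold} from [{'Name': ..., 'Value': ...}, ...] items."""
    prefix = "/" + root.strip("/") + "/"
    return {
        item["Name"][len(prefix):]: float(item["Value"])
        for item in parameters
        if item["Name"].startswith(prefix)
    }


params = to_parameters({"default": 75.0, "SourceMT-Test": 80.0}, "fuzzy-match/")
```

A processing function could write each pair with ssm.put_parameter(Name=name, Value=value, Type="String", Overwrite=True), and the job initialization function could read them back in one call instead of fetching the configuration file from Amazon S3.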

The job initialization Lambda function uses these parameters to preprocess the data before invoking Amazon Translate. We use an English-to-Spanish translation XLIFF input file, as shown in the following code. It contains the initial text to be translated, broken down into what are referred to as segments, represented in the <source> tags.

Consent Form CONSENT FORM FORMULARIO DE CONSENTIMIENTO Screening Visit: Screening Visit Selección

The source text has been pre-matched against the translation memory. The data contains potential translation alternatives, represented as <alt-trans> tags, each with a match-quality attribute expressed as a percentage. The business rule is as follows:

  • Segments received with alternative translations and a match quality below the threshold are untouched or empty. This signals to Amazon Translate that they must be translated.
  • Segments received with alternative translations with a match quality above the threshold are pre-populated with the suggested target text. Amazon Translate skips those segments.
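The business rule above can be sketched in Python with the standard library’s ElementTree and the XLIFF 1.2 namespace. This is a hypothetical illustration of the preprocessing step, not the solution’s actual Lambda code: it picks the highest-scoring <alt-trans> in each <trans-unit>, copies its target text into <target> when the score meets the threshold, and otherwise empties <target> so that Amazon Translate fills it in. A score equal to the threshold is treated as meeting it, consistent with machine translation being used only for scores below the threshold, and inline markup inside targets is dropped for simplicity.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}


def match_quality(alt_trans):
    """Parse a match-quality value such as "99" or "99%" (None if absent or invalid)."""
    raw = (alt_trans.get("match-quality") or "").strip().rstrip("%")
    try:
        return float(raw)
    except ValueError:
        return None


def apply_fuzzy_threshold(xliff_text, threshold):
    """Return the rewritten XLIFF and the ids of units that kept their memory match."""
    ET.register_namespace("", XLIFF_NS)
    root = ET.fromstring(xliff_text)
    kept = []
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        scored = []
        for alt in unit.findall("x:alt-trans", NS):
            quality = match_quality(alt)
            if quality is not None:
                scored.append((quality, alt))
        if not scored:
            continue  # no translation memory candidate: leave the unit untouched
        best_quality, best_alt = max(scored, key=lambda pair: pair[0])

        target = unit.find("x:target", NS)
        if target is None:
            # XLIFF 1.2 expects <target> right after <source>, before any <alt-trans>.
            target = ET.Element(f"{{{XLIFF_NS}}}target")
            source = unit.find("x:source", NS)
            index = list(unit).index(source) + 1 if source is not None else 0
            unit.insert(index, target)

        for child in list(target):  # simplification: drop inline markup
            target.remove(child)
        if best_quality >= threshold:
            alt_target = best_alt.find("x:target", NS)
            target.text = "".join(alt_target.itertext()) if alt_target is not None else None
            kept.append(unit.get("id"))
        else:
            target.text = None  # empty target: left for machine translation
    return ET.tostring(root, encoding="unicode"), kept
```

With a threshold of 80, a unit whose <alt-trans> scores 99 keeps FORMULARIO DE CONSENTIMIENTO, while a unit scoring, say, 70 (a hypothetical value, since the sample’s second score isn’t shown) gets an empty target and is machine translated.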

Let’s assume the quality match threshold configured for this job is 80%. The first segment with 99% match quality isn’t machine translated, whereas the second segment is, because its match quality is below the defined threshold. In this configuration, Amazon Translate produces the following output:

Consent Form FORMULARIO DE CONSENTIMIENTO CONSENT FORM FORMULARIO DE CONSENTIMIENTO Screening Visit: Visita de selección Screening Visit Selección

In the second segment, Amazon Translate overwrites the target text initially suggested (Selección) with a higher quality translation: Visita de selección.

One possible extension to this use case could be to reuse the translated output and create our own translation memory. Amazon Translate supports customization of machine translation using translation memory thanks to the parallel data feature. Text segments previously machine translated due to their initial low-quality score could then be reused in new translation projects.
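To reuse translated segments as parallel data, you need them in one of the file formats Amazon Translate accepts. As we understand the CSV format described in the Amazon Translate Developer Guide, the first row holds the source and target language codes and each following row holds one source/target pair; please verify this against the current documentation. The Python sketch below (to_parallel_data_csv is a hypothetical name) writes such a file from (source, target) pairs and skips segments without a translation.

```python
import csv
import io


def to_parallel_data_csv(pairs, source_lang="en", target_lang="es"):
    """Serialize (source, target) segment pairs as CSV with a language-code header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([source_lang, target_lang])  # header row: language codes
    for source, target in pairs:
        if source and target:  # skip untranslated or empty segments
            writer.writerow([source, target])
    return buffer.getvalue()


csv_text = to_parallel_data_csv([
    ("Screening Visit:", "Visita de selección"),
    ("Consent Form", ""),
])
```

You could then upload the file to Amazon S3 and reference it from the CreateParallelData API with a ParallelDataConfig whose Format is CSV.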

In the following sections, we walk you through the process of deploying and testing this solution. You use AWS CloudFormation scripts and data samples to launch an asynchronous translation job personalized with a configurable quality match threshold.


For this walkthrough, you must have an AWS account. If you don’t have an account yet, you can create and activate one.

Launch AWS CloudFormation stack

  1. Choose Launch Stack:
  2. For Stack name, enter a name.
  3. For ConfigBucketName, enter the S3 bucket containing the threshold configuration files.
  4. For ParameterStoreRoot, enter the root path of the parameters created by the parameters processing Lambda function.
  5. For QueueName, enter the name of the SQS queue that posts new file notifications from the source bucket to the job initialization Lambda function, which reads the threshold configuration stored in Parameter Store.
  6. For SourceBucketName, enter the S3 bucket containing the XLIFF files to be translated. If you prefer to use a preexisting bucket, you need to change the value of the CreateSourceBucket parameter to No.
  7. For WorkingBucketName, enter the S3 bucket Amazon Translate uses for input and output data.
  8. Choose Next.

    Figure 4: CloudFormation stack details

  9. Optionally, on the Stack Options page, add keys and values for any tags you want to assign to the resources about to be created.
  10. Choose Next.
  11. On the Review page, select I acknowledge that this template might cause AWS CloudFormation to create IAM resources.
  12. Review the other settings, then choose Create stack.

AWS CloudFormation takes several minutes to create the resources on your behalf. You can watch the progress on the Events tab on the AWS CloudFormation console. When the stack has been created, you can see a CREATE_COMPLETE message in the Status column on the Overview tab.

Test the solution

Let’s go through a simple example.

  1. Download the following sample data.
  2. Unzip the content.

There should be two files: an .xlf file in XLIFF format, and a threshold configuration file with .cfg as the extension. The following is an excerpt of the XLIFF file.

Figure 5: English to French sample file extract

  3. On the Amazon S3 console, upload the quality threshold configuration file into the configuration bucket you specified earlier.

The value set for test_En_to_Fr is 75%. You should be able to see the parameters on the Systems Manager console in the Parameter Store section.

  4. Still on the Amazon S3 console, upload the .xlf file into the S3 bucket you configured as the source bucket. Make sure the file is under a folder named translate (for example, /translate/test_En_to_Fr.xlf).

This starts the translation flow.
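The folder convention matters because the job name used for the threshold lookup comes from the file name. Below is a hypothetical Python sketch of how the job initialization function could filter incoming object keys and derive that name; the translate/ prefix and .xlf extension come from the steps above, while the function name is our own. Amazon S3 event notifications URL-encode object keys (a space arrives as +), so the sketch decodes the key first.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote_plus


def job_name_for_key(raw_key, prefix="translate/"):
    """Return the XLIFF base name for keys like 'translate/test_En_to_Fr.xlf', else None."""
    key = unquote_plus(raw_key).lstrip("/")
    path = PurePosixPath(key)
    if not key.startswith(prefix) or path.suffix.lower() != ".xlf":
        return None  # not a file this flow should translate
    return path.stem  # file name without extension, matching column 1 of the config


print(job_name_for_key("translate/test_En_to_Fr.xlf"))  # prints: test_En_to_Fr
```

The returned name can then be used to look up the job’s threshold, falling back to the default entry.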

  5. Open the Amazon Translate console.

A new job should appear with a status of In Progress.

Figure 6: In progress translation jobs on Amazon Translate console

  6. When the job is complete, choose the job’s link and review the output.

All segments should have been translated. In the translated XLIFF file, look for segments with an additional attribute named lscustom:match-quality, as shown in the following screenshot. These custom attributes identify segments where the suggested translation was retained based on its score.

Figure 7: Custom attributes identifying segments where suggested translation was retained based on score

These were derived from the translation memory according to the quality threshold. All other segments were machine translated.
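To check the output programmatically rather than by eye, you could list the units that carry the custom attribute. The post doesn’t show the lscustom namespace URI or exactly which element carries the attribute, so this hypothetical Python sketch hardcodes neither: it scans each whole <trans-unit> for any namespace-qualified attribute whose local name is match-quality, which distinguishes it from the unqualified match-quality attribute on <alt-trans>. The namespace URI in the usage document is made up for illustration.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def retained_units(xliff_text, local_name="match-quality"):
    """Map trans-unit id -> value of any namespace-qualified match-quality attribute in it."""
    root = ET.fromstring(xliff_text)
    found = {}
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        for element in unit.iter():  # the unit itself plus all descendants
            for key, value in element.attrib.items():
                # ElementTree spells qualified attributes "{namespace-uri}local"; the
                # unqualified match-quality on <alt-trans> has no braces and is skipped.
                if key.startswith("{") and key.rsplit("}", 1)[1] == local_name:
                    found[unit.get("id")] = value
    return found


doc = (
    '<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2"'
    ' xmlns:lscustom="urn:example:lscustom">'  # made-up namespace URI
    '<file source-language="en" target-language="fr" datatype="plaintext" original="t"><body>'
    '<trans-unit id="1" lscustom:match-quality="99"><source>Hello</source>'
    '<target>Bonjour</target><alt-trans match-quality="99"><target>Bonjour</target>'
    '</alt-trans></trans-unit>'
    '<trans-unit id="2"><source>Visit</source><target>Visite</target></trans-unit>'
    '</body></file></xliff>'
)
print(retained_units(doc))
```

Every unit missing from the result was machine translated.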

You have now deployed and tested an automated asynchronous translation job assistant that enforces configurable translation memory match quality thresholds. Great job!


If you deployed the solution into your account, don’t forget to delete the CloudFormation stack to avoid any unexpected cost. You need to empty the S3 buckets manually beforehand.


In this post, you learned how to customize your Amazon Translate translation jobs based on standard XLIFF fuzzy matching quality metrics. With this solution, you can greatly reduce the manual labor involved in reviewing machine translated text while also optimizing your usage of Amazon Translate. You can also extend the solution with data ingestion automation and workflow orchestration capabilities, as described in Speed Up Translation Jobs with a Fully Automated Translation System Assistant.

About the Authors

Narcisse Zekpa is a Solutions Architect based in Boston. He helps customers in the Northeast U.S. accelerate their adoption of the AWS Cloud by providing architectural guidelines and designing innovative, scalable solutions. When Narcisse is not building, he enjoys spending time with his family, traveling, cooking, and playing basketball.

Dimitri Restaino is a Solutions Architect at AWS, based out of Brooklyn, New York. He works primarily with Healthcare and Financial Services companies in the Northeast, helping to design innovative and creative solutions to best serve their customers. Coming from a software development background, he is excited by the new possibilities that serverless technology can bring to the world. Outside of work, he loves to hike and explore the NYC food scene.




Copyright © 2021 Today's Digital.