Upgrade your Amazon Polly voices to neural with one line of code

In 2019, Amazon Polly launched neural text-to-speech (NTTS) voices in US English and UK English. Neural voices use machine learning and provide a richer, more lifelike speech quality. Since the initial launch of NTTS, Amazon Polly has extended its neural offering by adding new voices in US Spanish, Brazilian Portuguese, Australian English, Canadian French, German and Korean. Some of them are also available in a Newscaster speaking style tailored to the specific needs of publishers.

If you’ve been using the standard voices in Amazon Polly, upgrading to neural voices is easy. No matter which programming language you use, the upgrade only requires adding or modifying the Engine parameter wherever your code calls the SynthesizeSpeech or StartSpeechSynthesisTask operation. In this post, you’ll learn about the benefits of neural voices and how to migrate your voices to NTTS.
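For example, with the AWS SDK for Python (Boto3), the entire migration for a synchronous request is the single Engine argument. The following is a minimal before/after sketch; the client setup is included only to make it self-contained:

import boto3

polly = boto3.client("polly")

# Before: the standard engine is used by default
polly.synthesize_speech(Text="Hello world!", OutputFormat="mp3", VoiceId="Joanna")

# After: the only change is the Engine argument
polly.synthesize_speech(Text="Hello world!", OutputFormat="mp3", VoiceId="Joanna", Engine="neural")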

Benefits of neural vs. standard

Because neural voices provide a more expressive, natural-sounding quality than standard voices, migrating to neural improves the user experience and boosts engagement.

“We rely on speech synthesis to drive dynamic narrations for our educational content,” says Paul S. Ziegler, Chief Executive Officer at Reflare. “The switch from Amazon Polly’s standard to neural voices has allowed us to create narrations that are so good as to consistently be indistinguishable from human speech to non-native speakers and to occasionally even fool native speakers.”

The following is an example of Joanna’s standard voice.

The following is an example of the same words, but using Joanna’s neural voice.

“Switching to neural voices is as easy as switching to other non-neural voices,” Ziegler says. “Since our systems were already set up to automatically generate voiceovers on the fly, implementing the changes took less than 5 minutes.”

Quick migration checklist

Not all SSML tags, Regions, and languages support neural voices. Before making the switch, use this checklist to verify that NTTS is available for your specific business needs:

  • Regional support – Verify that you’re making requests in Regions that support NTTS
  • Language and voice support – Verify that you’re making requests to voices and languages that support NTTS by checking the current list of voices and languages (see the sketch after this list for a programmatic check)
  • SSML tag support – Verify that the SSML tags in your requests are supported by NTTS by checking SSML tag compatibility
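One way to check voice and language support programmatically is the DescribeVoices API, which can filter by engine. The following is a minimal sketch using the AWS SDK for Python (Boto3); the Region is an assumption chosen only for illustration:

import boto3

# Filter DescribeVoices by engine to list only neural-capable voices
polly = boto3.client("polly", region_name="us-east-1")
response = polly.describe_voices(Engine="neural")

for voice in response["Voices"]:
    print(voice["Id"], voice["LanguageCode"], voice["SupportedEngines"])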

Additional considerations

The following table summarizes additional considerations before you switch to NTTS.

Consideration        Standard                         Neural
Cost                 $4 per million characters        $16 per million characters
Free Tier            5 million characters per month   1 million characters per month
Default Sample Rate  22 kHz                           24 kHz
Usage Quota          See Quotas in Amazon Polly       See Quotas in Amazon Polly
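Note that the default sample rate changes when you switch engines (22 kHz for standard, 24 kHz for neural). If a downstream system expects a fixed rate, you can pin it explicitly with the SampleRate parameter. The following is a minimal sketch using the AWS SDK for Python (Boto3); the chosen rate is only an illustration:

import boto3

polly = boto3.client("polly")

# Pin the sample rate explicitly so switching engines does not change the audio output
response = polly.synthesize_speech(
    Text="Hello world!",
    OutputFormat="mp3",
    VoiceId="Joanna",
    Engine="neural",
    SampleRate="24000",
)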

Code samples

If you’re already using Amazon Polly standard voices, the following samples demonstrate how to switch to neural in each AWS SDK. The required change in every sample is the Engine parameter.

Go:

input := &polly.SynthesizeSpeechInput{
    OutputFormat: aws.String("mp3"),
    Text:         aws.String("Hello World!"),
    VoiceId:      aws.String("Joanna"),
    Engine:       aws.String("neural"),
}

Java:

SynthesizeSpeechRequest synthReq = SynthesizeSpeechRequest.builder()
        .text("Hello World!")
        .voiceId("Joanna")
        .outputFormat("mp3")
        .engine("neural")
        .build();
ResponseInputStream<SynthesizeSpeechResponse> synthRes = polly.synthesizeSpeech(synthReq);

Javascript:

polly.synthesizeSpeech({
    Text: "Hello World!",
    OutputFormat: "mp3",
    VoiceId: "Joanna",
    TextType: "text",
    Engine: "neural"
});

.NET:

var response = client.SynthesizeSpeech(new SynthesizeSpeechRequest
{
    Text = "Hello World!",
    OutputFormat = "mp3",
    VoiceId = "Joanna",
    Engine = "neural"
});

PHP:

$result = $client->synthesizeSpeech([
    'Text' => 'Hello world!',
    'OutputFormat' => 'mp3',
    'VoiceId' => 'Joanna',
    'Engine' => 'neural'
]);

Python:

polly.synthesize_speech(
    Text="Hello world!",
    OutputFormat="mp3",
    VoiceId="Joanna",
    Engine="neural"
)

Ruby:

resp = polly.synthesize_speech({
  text: "Hello World!",
  output_format: "mp3",
  voice_id: "Joanna",
  engine: "neural"
})
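The same one-parameter change applies to the asynchronous StartSpeechSynthesisTask operation mentioned earlier. The following is a minimal sketch using the AWS SDK for Python (Boto3); the S3 bucket name is a placeholder you would replace with your own:

import boto3

polly = boto3.client("polly")

# Asynchronous synthesis writes the audio to S3; Engine="neural" is the same change as above
task = polly.start_speech_synthesis_task(
    Text="Hello World!",
    OutputFormat="mp3",
    VoiceId="Joanna",
    Engine="neural",
    OutputS3BucketName="your-output-bucket",  # placeholder bucket name
)
print(task["SynthesisTask"]["TaskId"])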

Conclusion

You can start experimenting with neural voices immediately on the Amazon Polly console. If you have any questions or concerns, please post them to the AWS Forum for Amazon Polly, or contact your AWS Support team.

About the Author

Marta Smolarek is a Senior Program Manager in the Amazon Text-to-Speech team. Outside of work, she loves to go camping with her family.
