Text-to-speech adapter for Amazon Polly #824

NPavie · 2025-05-20T11:04:13Z

This PR integrates the text-to-speech adapter for Amazon Polly engines into the DAISY Pipeline 2 modules tree.

Original code was authored by @mmartida and provided by ONCE to the DAISY Pipeline team.
The adapter was then updated for the latest version of Daisy Pipeline 2 and refactored to match similar adapters structure (mainly azure and google TTS adapters).

The adapter has also been updated to allow the use of generative, long-form, or neural voices if available for the user's AWS account.

The following properties are used by the adapter :

org.daisy.pipeline.tts.aws.accesskey (mandatory) : Access key to connect to Amazon Web Service.
org.daisy.pipeline.tts.aws.secretkey (mandatory) : Secret key to connect to Amazon Web Service.
org.daisy.pipeline.tts.aws.region (mandatory) : Region associated with your Amazon Web Service account.
org.daisy.pipeline.tts.aws.priority (defaults to 15) : priority of usage within the pipeline if other text engine are available

The adapter relies on AWS java Polly client library. All dependencies required for its usage are added to the modules/bom project.

In case it would be preferable to release the adapter on maven before including it's first release version, this PR :

Does not register the adapter itself in the modules/bom
Does not include the aforementioned dependencies and the new adapter in the current assembly project

(@bertfrees I'll leave you the squashing and rephrase for the pipeline history depending on how you want it merged in)

bertfrees · 2025-06-17T14:21:39Z

@NPavie I'm finally having a look at your PR. I have rebased the branch onto master and did some minor tweaks to the code formatting, and commit messages. I also removed some commented out code. These all seemed to stem from the original code (which is still available on the polly branch).

Incidentally, I have also rebased the original branch to polly-rebased, and am checking whether there could be some useful stuff that isn't in this PR.

First thing I notice (apart from all the changes in that branch not related to the AWS adapter) is that there were a lot of tests. I haven't looked at them yet. Were they not useful you think?

I will have some more remarks on specific pieces of code.

...apters/tts-adapter-aws/src/main/java/org/daisy/pipeline/tts/aws/impl/AWSPollyTTSService.java

...dapters/tts-adapter-aws/src/main/java/org/daisy/pipeline/tts/aws/impl/AWSPollyTTSEngine.java

bertfrees · 2025-06-17T14:33:52Z

...dapters/tts-adapter-aws/src/main/java/org/daisy/pipeline/tts/aws/impl/AWSPollyTTSEngine.java

+						awsVoicesDescription.get(engine).voices()
+							.stream()
+							.map(i -> new Voice("aws", i.idAsString() + " (" + engine.toString() + ")",
+							                    new Locale(i.languageName()),


In other places locales are parsed like this:

(new Locale.Builder()).setLanguageTag(info.getLocale().replace("_", "-")).build()

In the original code there was some language normalisation going on.

arb was replaced by ar

cmn-CN was replaced by zh_CN

(Originally this transformation was done in VoiceInfo.tagToLocale)(, not in TTSEngine.getAvailableVoices(), but the result should be the same.)

I wonder whether this normalisation is still needed. Note that we have the org.daisy.pipeline.common.NormalizeLang class (in common-utils) which could in theory be used for it.

The original code was provided by ONCE. The adapter was then updated for the latest version of Pipeline and refactored to match the structure of other adapters (notably the Azure and Google ones). The adapter was also updated to allow for the use of generative, long-form, or neural voices if available for the user's AWS account. The main changes compared to original code from ONCE are: - Updated the service and engine classes to mimic Azure adapter. - Added the following configuration properties for connecting to AWS: - org.daisy.pipeline.tts.aws.accesskey - org.daisy.pipeline.tts.aws.secretkey - org.daisy.pipeline.tts.aws.region - Omit namespace prefixes from SSML as AWS seems to refuse it. - Cleaned up tests files

- Change name of engine from "polly" to "aws" for mapping in UI project - Prioritize standard voices in the voices collection. This is to ensure a standard voice is used for the sanity check of the engine. - Add the engine (standard, neural, generative, long form) in the voice name to help identify which engine would be use (as it can greatly impact costs for the user). - Remove IDs in SSML as AWS rejects SSML with IDs.

to align with Azure and Google adapters.

bertfrees · 2025-07-07T20:26:52Z

Merged in 6dbdb2f.

bertfrees · 2025-07-07T20:35:00Z

I was able to create an account and run the tests succesfully.

It was not clear from the README where I can find the region identifier, so I just took the same as you (eu-west-3).

One thing I noticed was that there are debug messages on stdout. This is because of the logback.xml file in src/test/resources. @NPavie Did you do this on purpose?

NPavie · 2025-07-09T07:19:07Z

Hi @bertfrees

For the region identifier, if you open the "IAM Identity center" service of your account (not the "IAM" service, those are 2 different services) and open the dashboard of the service, you should see the region identifier in the parameters summary on the right.
I noticed in my case that the region identifier is also displayed in the URL of the console as region=eu-west-3 url parameter.

For the logback report, I might have left it for debug. You probably can remove it if it is too verbose.
(Also if i recall the amazon library is itself very verbose)

bertfrees · 2025-07-12T07:53:42Z

I'll check if I can find the parameter that way, and update the README.

I was also thinking the README would benefit from a short list of clear instructions specifically for creating a new access key for Polly. TBH I didn't follow the linked instructions because I didn't feel like reading that much, and also those instructions seemed very general and broad. I didn't need to know all the possibilities, I just wanted to create an access key quickly and specifically for Polly. So I tried to find in the user interface how to "create a new IAM user" and "access key", and the UI was intuitive enough (except to find the region).

see daisy/pipeline#824

NPavie requested a review from bertfrees May 20, 2025 11:04

bertfrees added this to Pipeline 2 | June 2025 release Jun 11, 2025

bertfrees assigned NPavie Jun 12, 2025

bertfrees moved this to In Progress in Pipeline 2 | June 2025 release Jun 12, 2025

bertfrees force-pushed the tts-adapter-aws branch 2 times, most recently from a7cdcd9 to e25cdbf Compare June 17, 2025 13:33

bertfrees unassigned NPavie Jun 17, 2025

bertfrees force-pushed the tts-adapter-aws branch from e25cdbf to 5fd6727 Compare June 17, 2025 14:02

bertfrees reviewed Jun 17, 2025

View reviewed changes

NPavie added 5 commits June 17, 2025 16:46

[modules] Change default priority to 15

134b8f7

to align with Azure and Google adapters.

[modules] Ignore tests requiring AWS authentication by default

e17da9f

[modules] Add README

980dc51

bertfrees force-pushed the tts-adapter-aws branch from 5fd6727 to 980dc51 Compare June 17, 2025 14:47

bertfrees self-assigned this Jun 17, 2025

bertfrees moved this from In Progress to Done in Pipeline 2 | June 2025 release Jul 7, 2025

bertfrees added a commit to daisy/pipeline-modules that referenced this pull request Aug 12, 2025

Text-to-speech adapter for Amazon Polly

07e8720

see daisy/pipeline#824

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Text-to-speech adapter for Amazon Polly #824

Text-to-speech adapter for Amazon Polly #824

Uh oh!

NPavie commented May 20, 2025

Uh oh!

bertfrees commented Jun 17, 2025

Uh oh!

Uh oh!

Uh oh!

bertfrees Jun 17, 2025 •

edited

Loading

Uh oh!

bertfrees commented Jul 7, 2025

Uh oh!

bertfrees commented Jul 7, 2025 •

edited

Loading

Uh oh!

NPavie commented Jul 9, 2025 •

edited

Loading

Uh oh!

bertfrees commented Jul 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Text-to-speech adapter for Amazon Polly #824

Are you sure you want to change the base?

Text-to-speech adapter for Amazon Polly #824

Uh oh!

Conversation

NPavie commented May 20, 2025

Uh oh!

bertfrees commented Jun 17, 2025

Uh oh!

Uh oh!

Uh oh!

bertfrees Jun 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bertfrees commented Jul 7, 2025

Uh oh!

bertfrees commented Jul 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NPavie commented Jul 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bertfrees commented Jul 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

bertfrees Jun 17, 2025 •

edited

Loading

bertfrees commented Jul 7, 2025 •

edited

Loading

NPavie commented Jul 9, 2025 •

edited

Loading