Skip to content

Conversation

@NPavie
Copy link
Collaborator

@NPavie NPavie commented May 20, 2025

This PR integrates the text-to-speech adapter for Amazon Polly engines into the DAISY Pipeline 2 modules tree.

Original code was authored by @mmartida and provided by ONCE to the DAISY Pipeline team.
The adapter was then updated for the latest version of Daisy Pipeline 2 and refactored to match similar adapters structure (mainly azure and google TTS adapters).

The adapter has also been updated to allow the use of generative, long-form, or neural voices if available for the user's AWS account.

The following properties are used by the adapter :

  • org.daisy.pipeline.tts.aws.accesskey (mandatory) : Access key to connect to Amazon Web Service.
  • org.daisy.pipeline.tts.aws.secretkey (mandatory) : Secret key to connect to Amazon Web Service.
  • org.daisy.pipeline.tts.aws.region (mandatory) : Region associated with your Amazon Web Service account.
  • org.daisy.pipeline.tts.aws.priority (defaults to 15) : priority of usage within the pipeline if other text engine are available

The adapter relies on AWS java Polly client library. All dependencies required for its usage are added to the modules/bom project.

In case it would be preferable to release the adapter on maven before including it's first release version, this PR :

  • Does not register the adapter itself in the modules/bom
  • Does not include the aforementioned dependencies and the new adapter in the current assembly project

(@bertfrees I'll leave you the squashing and rephrase for the pipeline history depending on how you want it merged in)

@bertfrees
Copy link
Member

@NPavie I'm finally having a look at your PR. I have rebased the branch onto master and did some minor tweaks to the code formatting, and commit messages. I also removed some commented out code. These all seemed to stem from the original code (which is still available on the polly branch).

Incidentally, I have also rebased the original branch to polly-rebased, and am checking whether there could be some useful stuff that isn't in this PR.

First thing I notice (apart from all the changes in that branch not related to the AWS adapter) is that there were a lot of tests. I haven't looked at them yet. Were they not useful you think?

I will have some more remarks on specific pieces of code.

awsVoicesDescription.get(engine).voices()
.stream()
.map(i -> new Voice("aws", i.idAsString() + " (" + engine.toString() + ")",
new Locale(i.languageName()),
Copy link
Member

@bertfrees bertfrees Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other places locales are parsed like this:

(new Locale.Builder()).setLanguageTag(info.getLocale().replace("_", "-")).build()

In the original code there was some language normalisation going on.

  • arb was replaced by ar
  • cmn-CN was replaced by zh_CN

(Originally this transformation was done in VoiceInfo.tagToLocale)(, not in TTSEngine.getAvailableVoices(), but the result should be the same.)

I wonder whether this normalisation is still needed. Note that we have the org.daisy.pipeline.common.NormalizeLang class (in common-utils) which could in theory be used for it.

NPavie added 5 commits June 17, 2025 16:46
The original code was provided by ONCE. The adapter was then updated
for the latest version of Pipeline and refactored to match the
structure of other adapters (notably the Azure and Google ones). The
adapter was also updated to allow for the use of generative,
long-form, or neural voices if available for the user's AWS account.

The main changes compared to original code from ONCE are:

- Updated the service and engine classes to mimic Azure adapter.
- Added the following configuration properties for connecting to AWS:
  - org.daisy.pipeline.tts.aws.accesskey
  - org.daisy.pipeline.tts.aws.secretkey
  - org.daisy.pipeline.tts.aws.region
- Omit namespace prefixes from SSML as AWS seems to refuse it.
- Cleaned up tests files
- Change name of engine from "polly" to "aws" for mapping in UI
  project
- Prioritize standard voices in the voices collection. This is to
  ensure a standard voice is used for the sanity check of the engine.
- Add the engine (standard, neural, generative, long form) in the
  voice name to help identify which engine would be use (as it can
  greatly impact costs for the user).
- Remove IDs in SSML as AWS rejects SSML with IDs.
to align with Azure and Google adapters.
@bertfrees bertfrees self-assigned this Jun 17, 2025
@bertfrees
Copy link
Member

Merged in 6dbdb2f.

@bertfrees bertfrees moved this from In Progress to Done in Pipeline 2 | June 2025 release Jul 7, 2025
@bertfrees
Copy link
Member

bertfrees commented Jul 7, 2025

I was able to create an account and run the tests succesfully.

It was not clear from the README where I can find the region identifier, so I just took the same as you (eu-west-3).

One thing I noticed was that there are debug messages on stdout. This is because of the logback.xml file in src/test/resources. @NPavie Did you do this on purpose?

@NPavie
Copy link
Collaborator Author

NPavie commented Jul 9, 2025

Hi @bertfrees

For the region identifier, if you open the "IAM Identity center" service of your account (not the "IAM" service, those are 2 different services) and open the dashboard of the service, you should see the region identifier in the parameters summary on the right.
I noticed in my case that the region identifier is also displayed in the URL of the console as region=eu-west-3 url parameter.

For the logback report, I might have left it for debug. You probably can remove it if it is too verbose.
(Also if i recall the amazon library is itself very verbose)

@bertfrees
Copy link
Member

I'll check if I can find the parameter that way, and update the README.

I was also thinking the README would benefit from a short list of clear instructions specifically for creating a new access key for Polly. TBH I didn't follow the linked instructions because I didn't feel like reading that much, and also those instructions seemed very general and broad. I didn't need to know all the possibilities, I just wanted to create an access key quickly and specifically for Polly. So I tried to find in the user interface how to "create a new IAM user" and "access key", and the UI was intuitive enough (except to find the region).

bertfrees added a commit to daisy/pipeline-modules that referenced this pull request Aug 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

3 participants