New command to check compatibility between a filelist and a model or config as well as update the text config based on the filelist #385

@roedoejet

There should be a way to check that a given filelist is compatible with a text config, i.e. that it can be used for training or inference.

The command should check whether the filelist has the appropriate text representation (characters/phones) and whether it is already tokenized (character_tokens/phone_tokens). If tokens aren't already available, it should tokenize. It should then build the set of all symbols in the filelist and check that every one of them is contained in the text config.
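To make the check concrete, here is a minimal sketch of the set-difference logic. It is not the EveryVoice implementation: the function names, the pipe-separated sample, the "/" token separator, and the per-character fallback tokenizer are all assumptions made for illustration. The real command would use the project's own filelist reader and tokenizer.

```python
import csv
import io


def symbols_in_filelist(rows, text_column="characters",
                        token_column="character_tokens", token_sep="/"):
    """Collect the set of symbols used across all rows of a filelist.

    Hypothetical helper: column names follow the issue text, while the
    token separator is an assumption.
    """
    symbols = set()
    for row in rows:
        tokens_field = row.get(token_column)
        if tokens_field:
            # Tokens are already available, so use them as-is.
            tokens = tokens_field.split(token_sep)
        elif row.get(text_column):
            # Fallback: naive per-character split, standing in for the
            # real tokenizer.
            tokens = list(row[text_column])
        else:
            raise ValueError(f"row has neither {token_column} nor {text_column}")
        symbols.update(tokens)
    return symbols


def missing_symbols(rows, config_symbols, **kwargs):
    """Return the filelist symbols that the text config does not contain."""
    return symbols_in_filelist(rows, **kwargs) - set(config_symbols)


# Made-up sample data: the second row has no tokens yet.
SAMPLE = "basename|characters|character_tokens\nutt1|hello|h/e/l/l/o\nutt2|hey!|\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="|"))
missing = missing_symbols(rows, config_symbols=set("helo"))
print(sorted(missing))
```

An empty result would mean the filelist is compatible. A non-empty one is exactly the list of symbols the check should report.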

The update version would then add any missing symbols to the text config.
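A rough sketch of what the update step could look like, assuming the text config's symbols can be viewed as a mapping from group name to a list of symbols. The function name and the `filelist_additions` key are hypothetical, not part of any existing config schema.

```python
import warnings


def add_missing_symbols(symbols, missing, key="filelist_additions"):
    """Return a copy of a symbols mapping with the missing symbols added.

    Hypothetical helper: `symbols` maps group names to lists of symbols,
    and new symbols are collected under `key` (an invented group name).
    """
    known = {s for group in symbols.values() for s in group}
    new = set(missing) - known
    if not new:
        # Nothing to add, so the config (and any trained model) stays valid.
        return dict(symbols)
    warnings.warn(
        "Adding symbols changes the text config; models trained with the "
        "previous config will no longer be valid."
    )
    updated = {name: list(group) for name, group in symbols.items()}
    updated[key] = sorted(set(updated.get(key, [])) | new)
    return updated


# Made-up example: "h" is already known, so only "y" and "!" are added.
config_symbols = {"letters": ["e", "h", "l", "o"], "punctuation": [".", ","]}
updated = add_missing_symbols(config_symbols, {"y", "!", "h"})
print(updated["filelist_additions"])
```

Returning a copy rather than mutating in place keeps the original config available, which matters because the update is destructive with respect to existing models.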

This would be useful when someone wants to see whether a given filelist is compatible with their model's current configuration. The update function would invalidate the model, but would help in the situation where someone:

  • runs the wizard for a language that does not have g2p
  • then later adds g2p
  • then wants to preprocess their phones but their text configuration does not contain the phone list

I imagine being able to check with either a text config or a model (where we just read the text config stored in the model).

Something like everyvoice check-filelist <filelist> <model|text-config> or everyvoice update-text-config <filelist> <text-config>. The update function should have a warning that it will invalidate any models trained with the previous config.
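The proposed command shapes could be sketched as below. This only illustrates the argument layout from the proposal: neither subcommand exists yet, and argparse is used purely to keep the sketch dependency-free, not as a statement about how the everyvoice CLI is built. The warning text is an assumed wording.

```python
import argparse

# Assumed wording for the warning the update command should show.
INVALIDATION_WARNING = (
    "Updating the text config will invalidate any models trained "
    "with the previous config."
)


def build_parser():
    """Build a parser mirroring the two proposed (not yet existing) commands."""
    parser = argparse.ArgumentParser(prog="everyvoice")
    sub = parser.add_subparsers(dest="command", required=True)

    check = sub.add_parser(
        "check-filelist",
        help="check that a filelist is compatible with a model or text config",
    )
    check.add_argument("filelist")
    check.add_argument("model_or_text_config")

    update = sub.add_parser(
        "update-text-config",
        help="add symbols missing from the text config",
        description=INVALIDATION_WARNING,
    )
    update.add_argument("filelist")
    update.add_argument("text_config")
    return parser


# Placeholder file names, for illustration only.
args = build_parser().parse_args(
    ["check-filelist", "filelist.psv", "text-config.yaml"]
)
print(args.command, args.filelist, args.model_or_text_config)
```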

Labels: enhancement
