There should be a way to check that a given filelist is compatible (i.e. can be used for training/inference) with a text config.
The command should check to see if the filelist has the appropriate text representation (characters/phones) and whether they are tokenized (character_tokens/phone_tokens). It should tokenize if tokens aren't already available. Then it should make a set of all the symbols in the filelist and check that they are all contained in the text config.
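A rough sketch of that check in plain Python, just to pin down the logic. Nothing here is EveryVoice's actual API: `row_symbols`, `missing_symbols`, the "/" token separator, and the per-character fallback tokenizer are all assumptions made for illustration, and a real implementation would use the tokenizer built from the text config.

```python
from typing import Iterable

# Assumption: tokenized columns store tokens joined by "/".
TOKEN_SEP = "/"

# (text column, tokenized column) pairs named in this issue.
REPRESENTATIONS = (
    ("characters", "character_tokens"),
    ("phones", "phone_tokens"),
)


def row_symbols(row: dict[str, str]) -> set[str]:
    """Collect the symbols used by one filelist row (hypothetical helper)."""
    symbols: set[str] = set()
    found = False
    for text_col, tokens_col in REPRESENTATIONS:
        if row.get(tokens_col):
            # Tokens are already available, so use them as-is.
            symbols.update(row[tokens_col].split(TOKEN_SEP))
            found = True
        elif row.get(text_col):
            # Stand-in tokenizer: one symbol per character.
            symbols.update(row[text_col])
            found = True
    if not found:
        raise ValueError("row has neither characters nor phones")
    return symbols


def missing_symbols(
    rows: Iterable[dict[str, str]], config_symbols: set[str]
) -> set[str]:
    """Return the symbols in the filelist that the text config lacks."""
    seen: set[str] = set()
    for row in rows:
        seen |= row_symbols(row)
    return seen - config_symbols
```

An empty result would mean the filelist is compatible; anything else is the list of symbols to report (or to add in the update case).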
An update version of the command would then add the missing symbols to the text config.
This would be useful for the situation where someone wants to see if a given filelist is compatible with their model's current configuration. The update function would invalidate the model, but would be helpful for the situation where someone:
- runs the wizard for a language that does not have g2p
- then later adds g2p
- then wants to preprocess their phones but their text configuration does not contain the phone list
I imagine being able to check with either a text config or a model (where we just read the text config stored in the model).
Something like `everyvoice check-filelist <filelist> <model|text-config>` or `everyvoice update-text-config <filelist> <text-config>`. The update command should warn that it will invalidate any models trained with the previous config.
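For the update side, something along these lines might work. This is only a sketch: `add_missing_symbols`, the `filelist_additions` key, and the dict-of-lists shape for the symbol set are hypothetical and not taken from the existing code.

```python
import warnings


def add_missing_symbols(
    symbols: dict[str, list[str]],
    missing: set[str],
    key: str = "filelist_additions",  # hypothetical key name
) -> dict[str, list[str]]:
    """Return a copy of the symbol set with the missing symbols added."""
    known = {s for values in symbols.values() for s in values}
    new = sorted(missing - known)
    if not new:
        # Nothing to add, so the config (and any model) stays valid.
        return symbols
    warnings.warn(
        "Updating the text config will invalidate any models "
        "trained with the previous config.",
        UserWarning,
        stacklevel=2,
    )
    # Copy so the caller's original config is left untouched.
    updated = {name: list(values) for name, values in symbols.items()}
    updated.setdefault(key, []).extend(new)
    return updated
```

Returning a copy rather than mutating in place would let the command show a diff and ask for confirmation before writing the config back to disk.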