Possible bug in LJSpeech training data #108

@danielmsu

Description

Some sentences in the LJSpeech dataset start with a quotation mark, and it seems these quotes are replaced with $ in such cases.

For example, check this file: LJ005-0077.wav (vocaroo link)
Text from the original dataset: "expedient to introduce such measures and arrangements as shall not only provide for the safe custody,
Text from Data/train_list.txt: LJ005-0077.wav|dˈɑːlɚɹ ɛkspˈiːdiənt tʊ ˌɪntɹədˈuːs sˈʌtʃ mˈɛʒɚz ænd ɚɹˈeɪndʒmənts æz ʃˌæl nˌɑːt ˈoʊnli pɹəvˈaɪd fɚðə sˈeɪf kˈʌstədi ,|0

The phonemized text begins with dˈɑːlɚɹ ("dollar"), but the audio doesn't actually contain this word. The same happens for other files whose text starts with " (approx. 83 samples).
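To make the problem easier to check, here is a minimal sketch that flags entries whose phoneme string starts with the spurious word and optionally strips it. It assumes every line in Data/train_list.txt has the `file|phonemes|speaker` layout shown above; the helper names (`split_entry`, `find_suspect_entries`, `strip_prefix`) are hypothetical and not part of the repository.

```python
# Phonemized form of the spurious word, copied from the example above.
DOLLAR_PREFIX = "dˈɑːlɚɹ "


def split_entry(line):
    """Split a 'file|phonemes|speaker' line into its three fields."""
    name, rest = line.rstrip("\n").split("|", 1)
    phonemes, speaker = rest.rsplit("|", 1)
    return name, phonemes, speaker


def find_suspect_entries(lines):
    """Return the file names whose phoneme string starts with DOLLAR_PREFIX."""
    suspects = []
    for line in lines:
        name, phonemes, _ = split_entry(line)
        if phonemes.startswith(DOLLAR_PREFIX):
            suspects.append(name)
    return suspects


def strip_prefix(line):
    """Return the line with a leading DOLLAR_PREFIX removed, if present."""
    name, phonemes, speaker = split_entry(line)
    if phonemes.startswith(DOLLAR_PREFIX):
        phonemes = phonemes[len(DOLLAR_PREFIX):]
    return "|".join((name, phonemes, speaker))
```

To run it on the real list, read the file with `open("Data/train_list.txt", encoding="utf-8")` and pass the lines to `find_suspect_entries`. Stripping blindly would also alter any sentence that genuinely begins with "dollar", so it is probably safer to cross-check the flagged files against the original LJSpeech text (entries starting with ") before rewriting anything.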

Labels: bug
