[LibriSpeech] Fix dev split local_extracted_archive for 'all' config #4904
We define the keys for the `_DL_URLS` of the dev split as `dev.clean` and `dev.other`:

datasets/datasets/librispeech_asr/librispeech_asr.py
Lines 60 to 61 in 2e7142a
These keys get forwarded to the `dl_manager` and thus to the `local_extracted_archive`.

However, when calling `SplitGenerator` for the dev sets, we query the `local_extracted_archive` keys `validation.clean` and `validation.other`:

datasets/datasets/librispeech_asr/librispeech_asr.py
Line 212 in 2e7142a
datasets/datasets/librispeech_asr/librispeech_asr.py
Line 219 in 2e7142a
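
The mismatch can be reproduced outside the loading script. The snippet below is a minimal sketch, not the script's actual code: the dictionary stands in for the mapping of split keys to extracted paths, and the `/cache/extracted/...` paths are hypothetical.

```python
# Hypothetical stand-in for the mapping produced after download and extraction.
# The keys mirror the dev split keys defined in _DL_URLS; the paths are made up.
local_extracted_archive = {
    "dev.clean": "/cache/extracted/dev-clean",
    "dev.other": "/cache/extracted/dev-other",
}

# Querying with the split name instead of the download key silently misses:
# dict.get() returns None for an absent key rather than raising KeyError.
mismatched = local_extracted_archive.get("validation.clean")

# Querying with the download key returns the extracted path.
matched = local_extracted_archive.get("dev.clean")

print(mismatched)
print(matched)
```

Because `.get()` does not raise on a missing key, nothing fails at this point and the `None` is passed along unnoticed.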
The consequence of this is that the `local_extracted_archive` arg passed to `_generate_examples` is always `None` for the dev splits, as the keys `validation.clean` and `validation.other` do not exist in the `local_extracted_archive`.

When defining the `audio_file` in `_generate_examples`, since `local_extracted_archive` is always `None`, we always omit the `local_extracted_archive` path from the `audio_file` path, even in non-streaming mode:

datasets/datasets/librispeech_asr/librispeech_asr.py
Lines 259 to 263 in 2e7142a
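
The effect on the path can be sketched as follows. `build_audio_path` is a hypothetical helper written for illustration; it follows the conditional described above rather than copying the referenced lines, and the file name and directory are invented.

```python
import os


def build_audio_path(audio_file, local_extracted_archive=None):
    """Prefix the file name with the extracted directory when one is known."""
    if local_extracted_archive:
        # Non-streaming mode: the archive was extracted locally.
        return os.path.join(local_extracted_archive, audio_file)
    # Streaming mode, or no extracted path available: keep the relative name.
    return audio_file


# With the mismatched key the directory is None, so the relative name is kept.
relative = build_audio_path("1272-128104-0000.flac", None)

# With the correct key the local directory is prepended.
absolute = build_audio_path("1272-128104-0000.flac", "/cache/extracted/dev-clean")
```

A `None` directory makes the non-streaming case indistinguishable from streaming, which is why the bug surfaces only as a missing path prefix and not as an error.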
Thus, `audio_file` will only ever be the streaming path (`audio_file`, not `os.path.join(local_extracted_archive, audio_file)`).

This PR fixes the `.get()` keys for the `local_extracted_archive` for the dev splits.
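
One way to guard against this class of mismatch is to check that every queried key is also a download key. The helper below is a sketch under that assumption; `missing_keys` is a hypothetical name and is not part of the `datasets` library or of this PR.

```python
def missing_keys(download_keys, queried_keys):
    """Return the queried keys that have no matching download key, sorted."""
    return sorted(set(queried_keys) - set(download_keys))


# Dev split keys as defined for the download URLs.
download_keys = ["dev.clean", "dev.other"]

# Keys queried before the fix: neither exists among the download keys.
before_fix = missing_keys(download_keys, ["validation.clean", "validation.other"])

# Keys queried after the fix: nothing is missing.
after_fix = missing_keys(download_keys, ["dev.clean", "dev.other"])
```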