Refactor the whole data preprocessor part for DeepSpeech2. #91
…ize dir, add augmentation interfaces etc.).
1. Refactor data preprocessor with newly added classes AudioSegment, SpeechSegment, TextFeaturizer, AudioFeaturizer, SpeechFeaturizer.
2. Add data augmentation interfaces and classes AugmentorBase, AugmentationPipeline, VolumePerturbAugmentor etc.
3. Separate the normalizer's mean and std computation from training by adding FeatureNormalizer and a separate tool, compute_mean_std.py.
4. Re-organize directory.
qingqing01
left a comment
As a follow-up, I think it would be worth adding documentation for the data processing; the process is fairly complex.
    "Otherwise, the training will resume from "
    "the existing model of this path. (default: %(default)s)")
    parser.add_argument(
        "--augmentation_config",
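Does augmentation_config need to be provided for an actual run? I only see the JSON format in the code comments, not a JSON file. If it is needed at run time, could you provide a JSON file that users can simply edit when they need it?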
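Good suggestion. augmentation_config is currently a str (augmentation only has its interfaces in place for now, so the default is augmentation_config='{}', i.e. augmentation has no effect), and configuring it as a JSON string is indeed inconvenient.
Since the model has many parameters, we can provide a single unified config file later.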
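To make the default behavior described above concrete, here is a minimal sketch of parsing `--augmentation_config` as a JSON string, where the default `'{}'` yields a pipeline that changes nothing. The schema (a dict mapping an augmentor name to its parameters), the `volume` key, the parameter names `min_gain_db` and `max_gain_db`, and the `build_pipeline` helper are all assumptions made for illustration; they are not the format defined in this PR.

```python
import argparse
import json
import random


def volume_perturb(samples, rng, min_gain_db, max_gain_db):
    """Scale samples by a random gain drawn from [min_gain_db, max_gain_db]."""
    gain = rng.uniform(min_gain_db, max_gain_db)
    return [s * 10.0 ** (gain / 20.0) for s in samples]


# Hypothetical registry: config key -> augmentation function.
AUGMENTORS = {"volume": volume_perturb}


def build_pipeline(config_json, seed=0):
    """Turn a JSON string into a function applying each configured augmentor."""
    config = json.loads(config_json)
    unknown = set(config) - set(AUGMENTORS)
    if unknown:
        raise ValueError("unknown augmentor(s): %s" % sorted(unknown))
    rng = random.Random(seed)

    def apply(samples):
        for name, params in config.items():
            samples = AUGMENTORS[name](samples, rng, **params)
        return samples

    return apply


parser = argparse.ArgumentParser()
parser.add_argument("--augmentation_config", default="{}", type=str)

# Default '{}': no augmentors are configured, so samples pass through unchanged.
args = parser.parse_args([])
identity = build_pipeline(args.augmentation_config)

# Explicit JSON string, as it would be passed on the command line.
args = parser.parse_args(
    ["--augmentation_config",
     '{"volume": {"min_gain_db": 6.0, "max_gain_db": 6.0}}'])
louder = build_pipeline(args.augmentation_config)
```

Reading the same JSON from a file instead of the command line would only change where the string comes from, which is one way the unified config file mentioned above could be wired in.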
qingqing01
left a comment
LGTM.
chrisxu2016
left a comment
LGTM
    :rtype: AudioSegment
    """
    samples, sample_rate = soundfile.read(file, dtype='float32')
    return cls(samples, sample_rate)
Does this read only .wav files by default?
    :param gain: Gain in decibels to apply to samples.
    :type gain: float
    """
    self._samples *= 10.**(gain / 20.)
I suggest returning a newly created audio object here, so that this method can be reused later when add_noise is added:
return type(self)(10.**(gain / 20.) * self._samples, self._sample_rate)
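For comparison, a minimal sketch of the two variants side by side. The class below is a stand-in rather than the PR's AudioSegment, and the method names `apply_gain_inplace` and `apply_gain` and the `samples` property are assumed for illustration.

```python
import numpy as np


class AudioSegment(object):
    """Minimal stand-in holding samples and a sample rate."""

    def __init__(self, samples, sample_rate):
        self._samples = np.asarray(samples, dtype="float32")
        self._sample_rate = sample_rate

    @property
    def samples(self):
        """Return a copy so callers cannot mutate the internal buffer."""
        return self._samples.copy()

    def apply_gain_inplace(self, gain):
        """Variant in the diff: scales the stored samples in place."""
        self._samples *= 10. ** (gain / 20.)

    def apply_gain(self, gain):
        """Variant suggested in review: returns a new object, self is unchanged."""
        return type(self)(10. ** (gain / 20.) * self._samples,
                          self._sample_rate)


original = AudioSegment([0.1, -0.2], 16000)
quieter = original.apply_gain(-20.0)  # -20 dB scales amplitude by 0.1
```

Because `type(self)` is used instead of a hard-coded class name, a subclass calling `apply_gain` gets back an instance of its own type, as long as its constructor accepts the same two arguments.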
    :return: Number of samples.
    :rtype: int
    """
    return self._samples.shape(0)
This should be self._samples.shape[0]; change () to [].
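A short check of why the call form fails, assuming `_samples` is a NumPy array (which the `soundfile.read` call in this PR returns):

```python
import numpy as np

samples = np.zeros(16000, dtype="float32")

# ndarray.shape is a tuple, so it is indexed rather than called.
num_samples = samples.shape[0]

try:
    samples.shape(0)
except TypeError as exc:
    error_message = str(exc)  # 'tuple' object is not callable
```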
resolve #90
- Refactor data preprocessor with newly added classes AudioSegment, SpeechSegment, TextFeaturizer, AudioFeaturizer, SpeechFeaturizer etc.
- Add data augmentation interfaces and classes AugmentorBase, AugmentationPipeline, VolumePerturbAugmentor etc., to make it easier to add more data augmentation models.
- DataGenerator. Add FeatureNormalizer.
- compute_mean_std.py for users to create mean_std file before training.
- Re-organize data directory into datasets and data_utils.
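The idea of creating the mean_std file before training can be sketched roughly as follows. This is not the PR's FeatureNormalizer: the method names, the `.npz` file format, the `(dim, frames)` feature layout, and the random data standing in for real features are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np


class FeatureNormalizer(object):
    """Hypothetical z-score normalizer whose statistics live in a file."""

    def __init__(self, mean, std, eps=1e-20):
        self._mean = mean
        self._std = np.maximum(std, eps)  # guard against division by zero

    @classmethod
    def from_features(cls, features):
        """Compute per-dimension statistics over (dim, frames) arrays."""
        stacked = np.hstack(features)
        return cls(stacked.mean(axis=1, keepdims=True),
                   stacked.std(axis=1, keepdims=True))

    @classmethod
    def from_file(cls, path):
        with np.load(path) as data:
            return cls(data["mean"], data["std"])

    def write_to_file(self, path):
        np.savez(path, mean=self._mean, std=self._std)

    def apply(self, feature):
        return (feature - self._mean) / self._std


# Offline step (the role played by a tool such as compute_mean_std.py).
rng = np.random.RandomState(0)
features = [rng.randn(3, n) * 5.0 + 2.0 for n in (40, 60)]
path = os.path.join(tempfile.mkdtemp(), "mean_std.npz")
FeatureNormalizer.from_features(features).write_to_file(path)

# Training step: load the precomputed statistics instead of recomputing them.
normalizer = FeatureNormalizer.from_file(path)
normalized = normalizer.apply(np.hstack(features))
```

Keeping the statistics in a file means training and inference can share exactly the same normalization without either one touching the raw training set again.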