[Model] Add Silero VAD example #1107
Merged
29 commits
47d34b8
add vad example
chenqianhe 180a088
Merge branch 'develop' of https://github.com/chenqianhe/FastDeploy in…
chenqianhe b9d6b25
fix typo
chenqianhe cf482b6
Merge branch 'develop' into develop
chenqianhe 82f4268
Merge branch 'develop' into develop
DefTruth af0d4f1
fix typo
chenqianhe 356bbf4
rename file
chenqianhe 8071ed3
Merge branch 'PaddlePaddle:develop' into develop
chenqianhe 62a9c1b
remove model and wav
chenqianhe 35bde94
delete Vad.cc
chenqianhe ae0ed89
delete Vad.h
chenqianhe 92dfd56
rename and format
chenqianhe 7d733a1
fix max and min
chenqianhe fdee845
update readme
chenqianhe 230caa3
Merge branch 'develop' into develop
DefTruth 3201b1f
rename var
chenqianhe fb2d028
Merge branch 'develop' of https://github.com/chenqianhe/FastDeploy in…
chenqianhe 5d9e145
Merge branch 'develop' into develop
DefTruth 326ed7e
format
chenqianhe 253aa3b
Merge branch 'develop' of https://github.com/chenqianhe/FastDeploy in…
chenqianhe ca99e77
add params
chenqianhe 60cfb3e
update readme
chenqianhe 8cfdbf0
update readme
chenqianhe 23f6391
Merge branch 'develop' into develop
chenqianhe a4883a3
Merge branch 'develop' into develop
chenqianhe d52d66c
Merge branch 'develop' into develop
chenqianhe 592905d
Update README.md
chenqianhe 3d123f3
Update README_CN.md
chenqianhe 64f5a86
Merge branch 'develop' into develop
DefTruth
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,41 @@ | ||
| English | [简体中文](README_CN.md) | ||
|
|
||
| # Silero VAD - pre-trained enterprise-grade Voice Activity Detector | ||
|
|
||
| The deployment model comes from [silero-vad](https://github.com/snakers4/silero-vad) | ||
|
|
||
|  | ||
|
|
||
| ## Key Features | ||
|
|
||
| * Stellar accuracy | ||
|
|
||
| Silero VAD has excellent results on speech detection tasks. | ||
|
|
||
| * Fast | ||
|
|
||
| One audio chunk (30+ ms) takes less than 1ms to be processed on a single CPU thread. Using batching or GPU can also improve performance considerably. | ||
|
|
||
| * General | ||
|
|
||
| Silero VAD was trained on huge corpora that include over 100 languages, and it performs well on audio from different domains with various background noise and quality levels. | ||
|
|
||
| * Flexible sampling rate | ||
|
|
||
| Silero VAD supports 8000 Hz and 16000 Hz sampling rates. | ||
|
|
||
| ## Download Pre-trained ONNX Model | ||
|
|
||
| For convenient testing, the exported VAD model is provided below and can be downloaded directly. | ||
|
|
||
| | Model | Size | Notes | | ||
| | :----------------------------------------------------------- | :---- | :----------------------------------------------------------- | | ||
| | [silero-vad](https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad.tgz) | 1.8MB | This model file is sourced from [snakers4/silero-vad](https://github.com/snakers4/silero-vad), MIT License | | ||
|
|
||
| ## Detailed Deployment Documents | ||
|
|
||
| - [C++ deployment](cpp) | ||
|
|
||
| ## Source | ||
|
|
||
| [https://github.com/snakers4/silero-vad](https://github.com/snakers4/silero-vad) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| 简体中文 | [English](README.md) | ||
|
|
||
| # Silero VAD - pre-trained enterprise-grade Voice Activity Detector | ||
|
|
||
| The deployment model comes from [silero-vad](https://github.com/snakers4/silero-vad) | ||
|
|
||
|  | ||
|
|
||
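| ## Key Features | ||
|
|
||
| * Stellar accuracy | ||
|
|
||
| Silero VAD has excellent results on speech detection tasks. | ||
|
|
||
| * Fast | ||
|
|
||
| One audio chunk (30+ ms) takes less than 1 ms to process on a single CPU thread. | ||
|
|
||
| * General | ||
|
|
||
| Silero VAD was trained on huge corpora that include over 100 languages, and it performs well on audio from different domains with various background noise and quality levels. | ||
|
|
||
| * Flexible sampling rate | ||
|
|
||
| Silero VAD supports 8000 Hz and 16000 Hz sampling rates. | ||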
|
|
||
| ## Download Pre-trained ONNX Model | ||
|
|
||
| For convenient testing, the exported VAD model is provided below and can be downloaded directly. | ||
| | Model | Size | Notes | | ||
| | :----------------------------------------------------------- | :---- | :----------------------------------------------------------- | | ||
| | [silero-vad](https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad.tgz) | 1.8MB | This model file is sourced from [snakers4/silero-vad](https://github.com/snakers4/silero-vad), MIT License | | ||
|
|
||
| ## Detailed Deployment Documents | ||
|
|
||
| - [C++ deployment](cpp) | ||
|
|
||
| ## Source | ||
|
|
||
| [https://github.com/snakers4/silero-vad](https://github.com/snakers4/silero-vad) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| cmake_minimum_required(VERSION 3.23) | ||
| project(silero_vad) | ||
|
|
||
| set(CMAKE_CXX_STANDARD 11) | ||
|
|
||
| # Specify the path of the downloaded and extracted fastdeploy library | ||
| option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.") | ||
|
|
||
| include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake) | ||
|
|
||
| # Add the FastDeploy header file dependencies | ||
| include_directories(${FASTDEPLOY_INCS}) | ||
|
|
||
| add_executable(infer_onnx_silero_vad ${PROJECT_SOURCE_DIR}/infer_onnx_silero_vad.cc wav.h vad.cc vad.h) | ||
|
|
||
| # Add the FastDeploy library dependencies | ||
| target_link_libraries(infer_onnx_silero_vad ${FASTDEPLOY_LIBS}) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,121 @@ | ||
| English | [简体中文](README_CN.md) | ||
|
|
||
| # Silero VAD Deployment Example | ||
|
|
||
| This directory provides `infer_onnx_silero_vad`, an example that quickly deploys the Silero VAD model on CPU/GPU. | ||
|
|
||
| Before deployment, confirm the following two steps. | ||
|
|
||
| - 1. Software and hardware should meet the requirements. Please refer to [FastDeploy Environment Requirements](../../../../docs/en/build_and_install/download_prebuilt_libraries.md). | ||
| - 2. Download the precompiled deployment library and sample code according to your development environment. Refer to [FastDeploy Precompiled Library](../../../../docs/en/build_and_install/download_prebuilt_libraries.md). | ||
|
|
||
| Taking VAD inference on Linux as an example, the compilation test can be completed by executing the following commands in this directory. | ||
|
|
||
| ```bash | ||
| mkdir build | ||
| cd build | ||
| # Download the FastDeploy precompiled library. Choose the appropriate version from the `FastDeploy Precompiled Library` mentioned above | ||
| wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-x.x.x.tgz | ||
| tar xvf fastdeploy-linux-x64-x.x.x.tgz | ||
| cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-x.x.x | ||
| make -j | ||
|
|
||
| # Download the VAD model file and test audio. After decompression, place the model and test audio in the same directory as infer_onnx_silero_vad.cc | ||
| wget https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad.tgz | ||
| wget https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad_sample.wav | ||
|
|
||
| # inference | ||
| ./infer_onnx_silero_vad ../silero_vad.onnx ../silero_vad_sample.wav | ||
|
||
| ``` | ||
|
|
||
| - The above commands work on Linux or MacOS. For how to use the SDK on Windows, refer to: | ||
| - [How to use FastDeploy C++ SDK in Windows](../../../../docs/en/faq/use_sdk_on_windows.md) | ||
|
|
||
| ## VAD C++ Interface | ||
|
|
||
| ### Vad Class | ||
|
|
||
| ```c++ | ||
| Vad::Vad(const std::string& model_file, | ||
| const fastdeploy::RuntimeOption& custom_option = fastdeploy::RuntimeOption()) | ||
| ``` | ||
|
|
||
| **Parameter** | ||
|
|
||
| > * **model_file**(str): Model file path | ||
| > * **custom_option**(RuntimeOption): Backend inference configuration. Defaults to a default-constructed `RuntimeOption`, i.e. the default configuration | ||
|
|
||
| ### setAudioCofig function | ||
|
|
||
| **Must be called before the `init` function** | ||
|
|
||
| ```c++ | ||
| void Vad::setAudioCofig(int sr, int frame_ms, float threshold, int min_silence_duration_ms, int speech_pad_ms); | ||
| ``` | ||
|
|
||
| **Parameter** | ||
|
|
||
| > * **sr**(int): Sampling rate | ||
| > * **frame_ms**(int): Length of each detection frame in milliseconds; used to calculate the detection window size | ||
| > * **threshold**(float): Probability threshold used to judge the detection result | ||
| > * **min_silence_duration_ms**(int): Threshold, in milliseconds, used to decide whether a stretch of audio counts as silence | ||
| > * **speech_pad_ms**(int): Used to calculate the end time of a speech segment | ||
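
As a rough illustration of how `sr` and `frame_ms` relate to the detection window, the sketch below converts a frame length in milliseconds into a sample count. The helper name `WindowSizeSamples` is hypothetical and is not part of the `Vad` class; the actual implementation may compute the window differently.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper (not part of the Vad class): number of samples in one
// detection frame, given the sampling rate in Hz and the frame length in ms.
int64_t WindowSizeSamples(int sr, int frame_ms) {
  // The model README lists 8000 Hz and 16000 Hz as the supported rates.
  if (sr != 8000 && sr != 16000) {
    throw std::invalid_argument("sampling rate must be 8000 or 16000 Hz");
  }
  if (frame_ms <= 0) {
    throw std::invalid_argument("frame_ms must be positive");
  }
  // Samples per millisecond times the frame length in milliseconds.
  return static_cast<int64_t>(sr) / 1000 * frame_ms;
}
```

With the values from the commented-out call in `infer_onnx_silero_vad.cc` (`16000`, `64`), this gives 16 × 64 = 1024 samples per frame.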
|
|
||
| ### init function | ||
|
|
||
| Used to initialize audio-related parameters. | ||
|
|
||
| ```c++ | ||
| void Vad::init(); | ||
| ``` | ||
|
|
||
| ### loadAudio function | ||
|
|
||
| Load audio. | ||
|
|
||
| ```c++ | ||
| void Vad::loadAudio(const std::string& wavPath) | ||
| ``` | ||
|
|
||
| **Parameter** | ||
|
|
||
| > * **wavPath**(str): Audio file path | ||
|
|
||
| ### Predict function | ||
|
|
||
| Runs model inference. | ||
|
|
||
| ```c++ | ||
| bool Vad::Predict(); | ||
| ``` | ||
|
|
||
| ### getResult function | ||
|
|
||
| **Used to obtain inference results** | ||
|
|
||
| ```c++ | ||
| std::vector<std::map<std::string, float>> Vad::getResult( | ||
| float removeThreshold = 1.6, float expandHeadThreshold = 0.32, float expandTailThreshold = 0, | ||
| float mergeThreshold = 0.3); | ||
| ``` | ||
|
|
||
| **Parameter** | ||
|
|
||
| > * **removeThreshold**(float): Threshold for discarding result segments; recognized segments that are too short are discarded according to this threshold | ||
| > * **expandHeadThreshold**(float): Offset applied to the start of a segment; the recognized start time may be too close to the voiced part, so the start time is moved earlier accordingly | ||
| > * **expandTailThreshold**(float): Offset applied to the end of a segment; the recognized end time may be too close to the voiced part, so the end time is moved later accordingly | ||
| > * **mergeThreshold**(float): Threshold for merging segments; result segments that are very close together are merged into one | ||
|
|
||
| **The output result format is** `std::vector<std::map<std::string, float>>` | ||
|
|
||
| > The output is a list in which each element is a speech segment | ||
| > | ||
| > For each segment, use 'start' to get the start time and 'end' to get the end time | ||
|
|
||
| ### Tips | ||
|
|
||
| 1. The `setAudioCofig` function must be called before the `init` function | ||
| 2. The sampling rate of the input audio file must be consistent with that set in the code | ||
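
One way to guard against a sampling-rate mismatch is to read the rate from the WAV header before running inference. The sketch below is an assumption-laden illustration, not part of this example's `wav.h`: the helper `WavSampleRate` is hypothetical, and it only handles the canonical RIFF/WAVE layout in which the `fmt ` chunk immediately follows the 12-byte RIFF header.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: returns the sample rate stored in a canonical
// RIFF/WAVE header, or 0 if the bytes do not look like one.
// Assumes the "fmt " chunk starts at byte offset 12.
uint32_t WavSampleRate(const std::vector<uint8_t>& header) {
  if (header.size() < 28) return 0;
  if (std::memcmp(header.data(), "RIFF", 4) != 0 ||
      std::memcmp(header.data() + 8, "WAVE", 4) != 0 ||
      std::memcmp(header.data() + 12, "fmt ", 4) != 0) {
    return 0;
  }
  // The sample rate is a little-endian 32-bit integer at byte offset 24.
  return static_cast<uint32_t>(header[24]) |
         (static_cast<uint32_t>(header[25]) << 8) |
         (static_cast<uint32_t>(header[26]) << 16) |
         (static_cast<uint32_t>(header[27]) << 24);
}
```

A caller could read the first 28 bytes of the audio file, pass them to this helper, and compare the result with the `sr` value given to `setAudioCofig` before calling `loadAudio`.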
|
|
||
| - [Model Description](../) | ||
| - [How to switch the model inference backend engine](../../../../docs/en/faq/how_to_change_backend.md) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,119 @@ | ||
| [English](README.md) | 简体中文 | ||
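| # Silero VAD Deployment Example | ||
|
|
||
| This directory provides `infer_onnx_silero_vad`, an example that quickly deploys the Silero VAD model on CPU/GPU. | ||
|
|
||
| Before deployment, confirm the following two steps. | ||
|
|
||
| - 1. Software and hardware should meet the requirements. Please refer to [FastDeploy Environment Requirements](../../../../docs/cn/build_and_install/download_prebuilt_libraries.md) | ||
| - 2. Download the precompiled deployment library and sample code according to your development environment. Refer to [FastDeploy Precompiled Library](../../../../docs/cn/build_and_install/download_prebuilt_libraries.md) | ||
|
|
||
| Taking VAD inference on Linux as an example, the compilation test can be completed by executing the following commands in this directory. | ||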
|
|
||
| ```bash | ||
| mkdir build | ||
| cd build | ||
| # Download the FastDeploy precompiled library. Choose the appropriate version from the `FastDeploy Precompiled Library` mentioned above | ||
| wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-x.x.x.tgz | ||
| tar xvf fastdeploy-linux-x64-x.x.x.tgz | ||
| cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-x.x.x | ||
| make -j | ||
|
|
||
| # Download the VAD model file and test audio. After decompression, place the model and test audio in the same directory as infer_onnx_silero_vad.cc | ||
|
||
| wget https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad.tgz | ||
| wget https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad_sample.wav | ||
|
|
||
| # inference | ||
|
||
| ./infer_onnx_silero_vad ../silero_vad.onnx ../silero_vad_sample.wav | ||
| ``` | ||
|
|
||
| The above commands only work on Linux or MacOS. For how to use the SDK on Windows, refer to: | ||
| - [How to use FastDeploy C++ SDK in Windows](../../../../docs/cn/faq/use_sdk_on_windows.md) | ||
|
|
||
| ## VAD C++ Interface | ||
| ### Vad Class | ||
|
|
||
| ```c++ | ||
| Vad::Vad(const std::string& model_file, | ||
| const fastdeploy::RuntimeOption& custom_option = fastdeploy::RuntimeOption()) | ||
| ``` | ||
|
|
||
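| **Parameter** | ||
|
|
||
| > * **model_file**(str): Model file path | ||
| > * **custom_option**(RuntimeOption): Backend inference configuration. Defaults to a default-constructed `RuntimeOption`, i.e. the default configuration | ||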
|
|
||
| ### setAudioCofig function | ||
|
|
||
| **Must be called before the `init` function** | ||
|
|
||
| ```c++ | ||
| void Vad::setAudioCofig(int sr, int frame_ms, float threshold, int min_silence_duration_ms, int speech_pad_ms); | ||
| ``` | ||
|
|
||
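| **Parameter** | ||
|
|
||
| > * **sr**(int): Sampling rate | ||
| > * **frame_ms**(int): Length of each detection frame in milliseconds; used to calculate the detection window size | ||
| > * **threshold**(float): Probability threshold used to judge the detection result | ||
| > * **min_silence_duration_ms**(int): Threshold, in milliseconds, used to decide whether a stretch of audio counts as silence | ||
| > * **speech_pad_ms**(int): Used to calculate the end time of a speech segment | ||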
|
|
||
| ### init function | ||
|
|
||
| Used to initialize audio-related parameters. | ||
|
|
||
| ```c++ | ||
| void Vad::init(); | ||
| ``` | ||
|
|
||
| ### loadAudio function | ||
|
|
||
| Load audio. | ||
|
|
||
| ```c++ | ||
| void Vad::loadAudio(const std::string& wavPath) | ||
| ``` | ||
|
|
||
| **Parameter** | ||
|
|
||
| > * **wavPath**(str): Audio file path | ||
|
|
||
| ### Predict function | ||
|
|
||
| Runs model inference. | ||
|
|
||
| ```c++ | ||
| bool Vad::Predict(); | ||
| ``` | ||
|
|
||
| ### getResult function | ||
|
|
||
| **Used to obtain inference results** | ||
|
|
||
| ```c++ | ||
| std::vector<std::map<std::string, float>> Vad::getResult( | ||
| float removeThreshold = 1.6, float expandHeadThreshold = 0.32, float expandTailThreshold = 0, | ||
| float mergeThreshold = 0.3); | ||
| ``` | ||
|
|
||
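| **Parameter** | ||
|
|
||
| > * **removeThreshold**(float): Threshold for discarding result segments; recognized segments that are too short are discarded according to this threshold | ||
| > * **expandHeadThreshold**(float): Offset applied to the start of a segment; the recognized start time may be too close to the voiced part, so the start time is moved earlier accordingly | ||
| > * **expandTailThreshold**(float): Offset applied to the end of a segment; the recognized end time may be too close to the voiced part, so the end time is moved later accordingly | ||
| > * **mergeThreshold**(float): Threshold for merging segments; result segments that are very close together are merged into one | ||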
|
|
||
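| **The output result format is** `std::vector<std::map<std::string, float>>` | ||
|
|
||
| > The output is a list in which each element is a speech segment | ||
| > | ||
| > For each segment, use 'start' to get the start time and 'end' to get the end time | ||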
|
|
||
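| ### Tips | ||
|
|
||
| 1. The `setAudioCofig` function must be called before the `init` function | ||
| 2. The sampling rate of the input audio file must be consistent with that set in the code | ||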
|
|
||
| - [Model Description](../) | ||
| - [How to switch the model inference backend engine](../../../../docs/cn/faq/how_to_change_backend.md) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| #include <iostream> | ||
| #include <map> | ||
| #include <string> | ||
| #include <vector> | ||
|
|
||
| #include "vad.h" | ||
|
|
||
| int main(int argc, char* argv[]) { | ||
| if (argc < 3) { | ||
| std::cout << "Usage: infer_onnx_silero_vad path/to/model path/to/audio, " | ||
| "e.g. ./infer_onnx_silero_vad silero_vad.onnx sample.wav" | ||
| << std::endl; | ||
| return -1; | ||
| } | ||
|
|
||
| std::string model_file = argv[1]; | ||
| std::string audio_file = argv[2]; | ||
|
|
||
| Vad vad(model_file); | ||
| // custom config, but must be set before init | ||
| // vad.setAudioCofig(16000, 64, 0.5f, 0, 0); | ||
| vad.init(); | ||
| vad.loadAudio(audio_file); | ||
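| // Predict() returns a bool, so stop here if inference fails | ||
| if (!vad.Predict()) { | ||
| std::cerr << "Failed to run VAD inference." << std::endl; | ||
| return -1; | ||
| } | ||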
| std::vector<std::map<std::string, float>> result = vad.getResult(); | ||
| for (auto& res : result) { | ||
| std::cout << "speak start: " << res["start"] << " s, end: " << res["end"] | ||
| << " s" << std::endl; | ||
| } | ||
| return 0; | ||
| } |