PaddlePaddle · jiangjiajun · Jan 15, 2023 · Jan 10, 2023
diff --git a/examples/audio/silero-vad/README.md b/examples/audio/silero-vad/README.md
@@ -0,0 +1,41 @@
+English | [简体中文](README_CN.md)
+
+# Silero VAD - pre-trained enterprise-grade Voice Activity Detector
+
+The deployment model comes from [silero-vad](https://github.com/snakers4/silero-vad).
+
+![](https://user-images.githubusercontent.com/36505480/198026365-8da383e0-5398-4a12-b7f8-22c2c0059512.png)
+
+## Key Features
+
+* Stellar accuracy
+
+Silero VAD has excellent results on speech detection tasks.
+
+* Fast
+
+One audio chunk (30+ ms) takes less than 1 ms to process on a single CPU thread. Batching or running on a GPU can improve performance considerably.
+
+* General
+
+Silero VAD was trained on huge corpora covering over 100 languages, and it performs well on audio from different domains with various background noise and quality levels.
+
+* Flexible sampling rate
+
+Silero VAD supports 8000 Hz and 16000 Hz sampling rates.
+
+## Download Pre-trained ONNX Model
+
+For convenient testing, the exported Silero VAD ONNX model is provided below and can be downloaded directly.
+
+| Model                                                        | Size  | Notes                                                        |
+| :----------------------------------------------------------- | :---- | :----------------------------------------------------------- |
+| [silero-vad](https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad.tgz) | 1.8MB | This model file is sourced from [snakers4/silero-vad](https://github.com/snakers4/silero-vad), MIT License |
+
+## Detailed Deployment Documents
+
+- [C++ deployment](cpp)
+
+## Source
+
+[https://github.com/snakers4/silero-vad](https://github.com/snakers4/silero-vad)
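
Note on the feature list in the README above: it mentions chunks of 30+ ms and two supported sampling rates. As a minimal sketch of what that means in samples, the helpers below convert a chunk length in milliseconds to a sample count. The names `IsSupportedSampleRate` and `SamplesPerChunk` are hypothetical and are not part of this PR or of FastDeploy.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true for the two sampling rates the README lists.
inline bool IsSupportedSampleRate(int sample_rate_hz) {
  return sample_rate_hz == 8000 || sample_rate_hz == 16000;
}

// Hypothetical helper: number of PCM samples in one chunk of `chunk_ms`
// milliseconds at the given sampling rate.
inline int64_t SamplesPerChunk(int sample_rate_hz, int chunk_ms) {
  assert(IsSupportedSampleRate(sample_rate_hz));
  assert(chunk_ms > 0);
  return static_cast<int64_t>(sample_rate_hz) * chunk_ms / 1000;
}
```

For example, a 32 ms chunk corresponds to 512 samples at 16000 Hz and 256 samples at 8000 Hz.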
diff --git a/examples/audio/silero-vad/README_CN.md b/examples/audio/silero-vad/README_CN.md
@@ -0,0 +1,40 @@
+简体中文 ｜ [English](README.md)
+
+# Silero VAD 预训练的企业级语音活动检测器
+
+该部署模型来自于 [silero-vad](https://github.com/snakers4/silero-vad)
+
+![](https://user-images.githubusercontent.com/36505480/198026365-8da383e0-5398-4a12-b7f8-22c2c0059512.png)
+
+## 主要特征
+
+* 高准确率
+
+Silero VAD在语音检测任务上有着优异的成绩。
+
+* 快速推理
+
+一个音频块（30+ 毫秒）在单个 CPU 线程上处理时间不到 1毫秒。
+
+* 通用性
+
+Silero VAD 在包含100多种语言的庞大语料库上进行了训练，它在来自不同领域、具有不同背景噪音和质量水平的音频上表现良好。
+
+* 灵活采样率
+
+Silero VAD支持 8000 Hz和16000 Hz 采样率。
+
+## 下载预训练ONNX模型
+
+为了方便开发者的测试，下面提供了 VAD 导出模型，开发者可直接下载使用。
+| 模型                                                         | 大小  | 备注                                                         |
+| :----------------------------------------------------------- | :---- | :----------------------------------------------------------- |
+| [silero-vad](https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad.tgz) | 1.8MB | 此模型文件来源于[snakers4/silero-vad](https://github.com/snakers4/silero-vad)，MIT License |
+
+## 详细部署文档
+
+- [C++ 部署](cpp)
+
+## 模型来源
+
+[https://github.com/snakers4/silero-vad](https://github.com/snakers4/silero-vad)
diff --git a/examples/audio/silero-vad/cpp/CMakeLists.txt b/examples/audio/silero-vad/cpp/CMakeLists.txt
@@ -0,0 +1,17 @@
+cmake_minimum_required(VERSION 3.23)
+project(silero_vad)
+
+set(CMAKE_CXX_STANDARD 11)
+
+# Path to the downloaded and extracted FastDeploy library
+option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.")
+
+include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake)
+
+# Add the FastDeploy header include directories
+include_directories(${FASTDEPLOY_INCS})
+
+add_executable(infer_onnx_silero_vad ${PROJECT_SOURCE_DIR}/infer_onnx_silero_vad.cc wav.h vad.cc vad.h)
+
+# Link against the FastDeploy libraries
+target_link_libraries(infer_onnx_silero_vad ${FASTDEPLOY_LIBS})
diff --git a/examples/audio/silero-vad/cpp/README.md b/examples/audio/silero-vad/cpp/README.md
@@ -0,0 +1,121 @@
+English | [简体中文](README_CN.md)
+
+# Silero VAD Deployment Example
+
+This directory provides `infer_onnx_silero_vad`, an example that quickly deploys the Silero VAD model on CPU/GPU.
+
+Before deployment, confirm the following two steps.
+
+- 1. The software and hardware environment meets the requirements. Please refer to [FastDeploy Environment Requirements](../../../../docs/en/build_and_install/download_prebuilt_libraries.md).
+- 2. Download the precompiled deployment library and sample code for your development environment. Refer to [FastDeploy Precompiled Library](../../../../docs/en/build_and_install/download_prebuilt_libraries.md).
+
+Taking VAD inference on Linux as an example, run the following commands in this directory to compile and test.
+
+```bash
+mkdir build
+cd build
+# Download the FastDeploy precompiled library. Choose the version that suits you from the `FastDeploy Precompiled Library` page mentioned above
+wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-x.x.x.tgz
+tar xvf fastdeploy-linux-x64-x.x.x.tgz
+cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-x.x.x
+make -j
+
+# Download the VAD model file and test audio. After extracting the archive, place the model and the test audio in the same directory as infer_onnx_silero_vad.cc
+wget https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad.tgz
+wget https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad_sample.wav
+
+# inference
+./infer_onnx_silero_vad ../silero_vad.onnx ../silero_vad_sample.wav
+```
+
+- The commands above work only on Linux or macOS. For Windows, refer to:
+  - [How to use FastDeploy C++ SDK in Windows](../../../../docs/en/faq/use_sdk_on_windows.md)
+
+## VAD C++ Interface
+
+### Vad Class
+
+```c++
+Vad::Vad(const std::string& model_file,
+    const fastdeploy::RuntimeOption& custom_option = fastdeploy::RuntimeOption());
+```
+
+**Parameter**
+
+> * **model_file**(str): Model file path
+> * **custom_option**(RuntimeOption): Backend inference configuration. Defaults to `fastdeploy::RuntimeOption()`, i.e. the default configuration
+
+### setAudioCofig function
+
+**Must be called before the `init` function**
+
+```c++
+void Vad::setAudioCofig(int sr, int frame_ms, float threshold, int min_silence_duration_ms, int speech_pad_ms);
+```
+
+**Parameter**
+
+> * **sr**(int): Sampling rate
+> * **frame_ms**(int): Length of each detection frame in milliseconds; used to calculate the detection window size
+> * **threshold**(float): Probability threshold applied to the model output
+> * **min_silence_duration_ms**(int): Threshold in milliseconds used to decide whether a stretch of audio counts as silence
+> * **speech_pad_ms**(int): Padding in milliseconds used to calculate the end time of the speech
+
+### init function
+
+Used to initialize audio-related parameters.
+
+```c++
+void Vad::init();
+```
+
+### loadAudio function
+
+Load audio.
+
+```c++
+void Vad::loadAudio(const std::string& wavPath);
+```
+
+**Parameter**
+
+> * **wavPath**(str): Audio file path
+
+### Predict function
+
+Used to run model inference.
+
+```c++
+bool Vad::Predict();
+```
+
+### getResult function
+
+Used to obtain the inference results.
+
+```c++
+std::vector<std::map<std::string, float>> Vad::getResult(
+            float removeThreshold = 1.6, float expandHeadThreshold = 0.32, float expandTailThreshold = 0,
+            float mergeThreshold = 0.3);
+```
+
+**Parameter**
+
+> * **removeThreshold**(float): Threshold for discarding result segments; segments that are too short are dropped according to this threshold
+> * **expandHeadThreshold**(float): Offset applied to the start of a segment; the detected start time may be too close to the voiced part, so the start time is moved earlier accordingly
+> * **expandTailThreshold**(float): Offset applied to the end of a segment; the detected end time may be too close to the voiced part, so the end time is moved later accordingly
+> * **mergeThreshold**(float): Threshold for merging result segments; segments that are very close to each other are combined into one
+
+**The output format is** `std::vector<std::map<std::string, float>>`
+
+> The output is a list in which each element is a speech segment
+>
+> For each segment, use 'start' to get the start time and 'end' to get the end time
+
+### Tips
+
+1. The `setAudioCofig` function must be called before the `init` function
+2. The sampling rate of the input audio file must be consistent with that set in the code
+
+- [Model Description](../)
+- [How to switch the model inference backend engine](../../../../docs/en/faq/how_to_change_backend.md)
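
Note on the `getResult` parameters documented above: the README describes dropping short segments, widening segment boundaries and merging nearby segments. The sketch below shows one way such post-processing could look. It is an illustration under stated assumptions, not the implementation in `vad.cc` (which is not part of this excerpt): `PostProcess` is a hypothetical name, all values are assumed to be in seconds, and the input is assumed to be sorted by start time.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

using Segment = std::map<std::string, float>;

// Hypothetical post-processing, NOT the actual Vad::getResult code.
// Assumes segments are sorted by "start" and all values are in seconds.
std::vector<Segment> PostProcess(const std::vector<Segment>& segments,
                                 float remove_threshold, float expand_head,
                                 float expand_tail, float merge_threshold) {
  std::vector<Segment> out;
  for (const Segment& seg : segments) {
    float start = seg.at("start");
    float end = seg.at("end");
    // 1. Drop segments shorter than remove_threshold.
    if (end - start < remove_threshold) continue;
    // 2. Widen the boundaries, clamping the start at zero.
    start = std::max(0.0f, start - expand_head);
    end += expand_tail;
    // 3. Merge with the previous kept segment when the gap is small enough.
    if (!out.empty() && start - out.back()["end"] <= merge_threshold) {
      out.back()["end"] = std::max(out.back()["end"], end);
    } else {
      Segment kept;
      kept["start"] = start;
      kept["end"] = end;
      out.push_back(kept);
    }
  }
  return out;
}
```

In this sketch the length check runs before the boundaries are widened, so a short segment is dropped even if widening would have pushed it over the threshold. The real implementation may order these steps differently.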
diff --git a/examples/audio/silero-vad/cpp/README_CN.md b/examples/audio/silero-vad/cpp/README_CN.md
@@ -0,0 +1,119 @@
+[English](README.md) | 简体中文
+# Silero VAD 部署示例
+
+本目录下提供`infer_onnx_silero_vad`示例，用于快速完成 Silero VAD 模型在 CPU/GPU 上的部署。
+
+在部署前，需确认以下两个步骤
+
+- 1. 软硬件环境满足要求，参考[FastDeploy环境要求](../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
+- 2. 根据开发环境，下载预编译部署库和samples代码，参考[FastDeploy预编译库](../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
+
+以Linux上 VAD 推理为例，在本目录执行如下命令即可完成编译测试。
+
+```bash
+mkdir build
+cd build
+# 下载FastDeploy预编译库，用户可在上文提到的`FastDeploy预编译库`中自行选择合适的版本使用
+wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-x.x.x.tgz
+tar xvf fastdeploy-linux-x64-x.x.x.tgz
+cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-x.x.x
+make -j
+
+# 下载 VAD 模型文件和测试音频，解压后将模型和测试音频放置在与 infer_onnx_silero_vad.cc 同级目录下
+wget https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad.tgz
+wget https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad_sample.wav
+
+# 推理
+./infer_onnx_silero_vad ../silero_vad.onnx ../silero_vad_sample.wav
+```
+
+以上命令只适用于Linux或MacOS, Windows下SDK的使用方式请参考:
+- [如何在Windows中使用FastDeploy C++ SDK](../../../../docs/cn/faq/use_sdk_on_windows.md)
+
+## VAD C++ 接口
+### Vad 类
+
+```c++
+Vad::Vad(const std::string& model_file,
+    const fastdeploy::RuntimeOption& custom_option = fastdeploy::RuntimeOption())
+```
+
+**参数**
+
+> * **model_file**(str): 模型文件路径
+> * **custom_option**(RuntimeOption): 后端推理配置，默认为`fastdeploy::RuntimeOption()`，即采用默认配置
+
+### setAudioCofig 函数
+
+**必须在`init`函数前调用**
+
+```c++
+void Vad::setAudioCofig(int sr, int frame_ms, float threshold, int min_silence_duration_ms, int speech_pad_ms);
+```
+
+**参数**
+
+> * **sr**(int): 采样率
+> * **frame_ms**(int): 每次检测帧长，用于计算检测窗口大小
+> * **threshold**(float): 结果概率判断阈值
+> * **min_silence_duration_ms**(int): 用于计算判断是否是 silence 的阈值
+> * **speech_pad_ms**(int): 用于计算 speech 结束时刻
+
+### init 函数
+
+用于初始化音频相关参数
+
+```c++
+void Vad::init();
+```
+
+### loadAudio 函数
+
+加载音频
+
+```c++
+void Vad::loadAudio(const std::string& wavPath)
+```
+
+**参数**
+
+> * **wavPath**(str): 音频文件路径
+
+### Predict 函数
+
+用于开始模型推理
+
+```c++
+bool Vad::Predict();
+```
+
+### getResult 函数
+
+**用于获取推理结果**
+
+```c++
+std::vector<std::map<std::string, float>> Vad::getResult(
+            float removeThreshold = 1.6, float expandHeadThreshold = 0.32, float expandTailThreshold = 0,
+            float mergeThreshold = 0.3);
+```
+
+**参数**
+
+> * **removeThreshold**(float): 丢弃结果片段阈值；部分识别结果太短则根据此阈值丢弃
+> * **expandHeadThreshold**(float): 结果片段开始时刻偏移；识别到的开始时刻可能过于贴近发声部分，因此据此前移开始时刻
+> * **expandTailThreshold**(float): 结果片段结束时刻偏移；识别到的结束时刻可能过于贴近发声部分，因此据此后移结束时刻
+> * **mergeThreshold**(float): 有的结果片段十分靠近，可以合并成一个，据此合并发声片段
+
+**输出结果格式为**`std::vector<std::map<std::string, float>>`
+
+> 输出一个列表，每个元素是一个讲话片段
+>
+> 每个片段可以用 'start' 获取到开始时刻，用 'end' 获取到结束时刻
+
+### 提示
+
+1. `setAudioCofig`函数必须在`init`函数前调用
+2. 输入的音频文件的采样率必须与代码中设置的保持一致
+
+- [模型介绍](../)
+- [如何切换模型推理后端引擎](../../../../docs/cn/faq/how_to_change_backend.md)
diff --git a/examples/audio/silero-vad/cpp/infer_onnx_silero_vad.cc b/examples/audio/silero-vad/cpp/infer_onnx_silero_vad.cc
@@ -0,0 +1,36 @@
+#include <iostream>
+#include <map>
+#include <string>
+#include <vector>
+
+#include "vad.h"
+
+int main(int argc, char* argv[]) {
+  if (argc < 3) {
+    std::cout << "Usage: infer_onnx_silero_vad path/to/model path/to/audio, "
+                 "e.g. ./infer_onnx_silero_vad silero_vad.onnx sample.wav"
+              << std::endl;
+    return -1;
+  }
+
+  std::string model_file = argv[1];
+  std::string audio_file = argv[2];
+
+  Vad vad(model_file);
+  // Optional custom config; it must be set before init(), e.g.
+  // vad.setAudioCofig(16000, 64, 0.5f, 0, 0);
+  vad.init();
+  vad.loadAudio(audio_file);
+  // Predict() returns bool; treat false as an inference failure.
+  if (!vad.Predict()) {
+    std::cerr << "Failed to run VAD inference." << std::endl;
+    return -1;
+  }
+  // Each element is one speech segment with "start" and "end" in seconds.
+  std::vector<std::map<std::string, float>> result = vad.getResult();
+  for (auto& res : result) {
+    std::cout << "speak start: " << res["start"] << " s, end: " << res["end"]
+              << " s" << std::endl;
+  }
+  return 0;
+}
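
Note on the example program above: the README states that the sampling rate of the input audio must match the rate set in the code, but `main` does not check this before calling `loadAudio`. A minimal sketch of such a check follows. It assumes a canonical PCM WAV file whose `fmt ` chunk directly follows the RIFF header, so the sampling rate is the little-endian 32-bit value at byte offset 24. `HasTag`, `SampleRateFromWavHeader` and `ReadWavSampleRate` are hypothetical helpers, not part of this PR or of the `wav.h` it references.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: compares four bytes at `offset` with a 4-char tag.
inline bool HasTag(const std::vector<uint8_t>& bytes, std::size_t offset,
                   const char* tag) {
  for (std::size_t i = 0; i < 4; ++i) {
    if (bytes[offset + i] != static_cast<uint8_t>(tag[i])) return false;
  }
  return true;
}

// Returns the sampling rate stored in a canonical WAV header, or 0 if the
// bytes do not look like such a header. Files with extra chunks before
// "fmt " are not handled by this sketch.
inline uint32_t SampleRateFromWavHeader(const std::vector<uint8_t>& header) {
  if (header.size() < 28) return 0;
  if (!HasTag(header, 0, "RIFF") || !HasTag(header, 8, "WAVE") ||
      !HasTag(header, 12, "fmt ")) {
    return 0;
  }
  // Little-endian uint32 at byte offset 24.
  return static_cast<uint32_t>(header[24]) |
         (static_cast<uint32_t>(header[25]) << 8) |
         (static_cast<uint32_t>(header[26]) << 16) |
         (static_cast<uint32_t>(header[27]) << 24);
}

// Reads the first 44 bytes of `path` and extracts the sampling rate.
// Returns 0 if the file cannot be read or parsed.
inline uint32_t ReadWavSampleRate(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  std::vector<uint8_t> header(44);
  in.read(reinterpret_cast<char*>(header.data()),
          static_cast<std::streamsize>(header.size()));
  header.resize(static_cast<std::size_t>(in.gcount()));
  return SampleRateFromWavHeader(header);
}
```

A caller could compare `ReadWavSampleRate(audio_file)` with the `sr` value passed to `setAudioCofig` and exit early on a mismatch.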