Merged
Commits
29 commits
47d34b8
add vad example
chenqianhe Jan 10, 2023
180a088
Merge branch 'develop' of https://github.com/chenqianhe/FastDeploy in…
chenqianhe Jan 10, 2023
b9d6b25
fix typo
chenqianhe Jan 10, 2023
cf482b6
Merge branch 'develop' into develop
chenqianhe Jan 10, 2023
82f4268
Merge branch 'develop' into develop
DefTruth Jan 11, 2023
af0d4f1
fix typo
chenqianhe Jan 11, 2023
356bbf4
rename file
chenqianhe Jan 11, 2023
8071ed3
Merge branch 'PaddlePaddle:develop' into develop
chenqianhe Jan 11, 2023
62a9c1b
remove model and wav
chenqianhe Jan 11, 2023
35bde94
delete Vad.cc
chenqianhe Jan 11, 2023
ae0ed89
delete Vad.h
chenqianhe Jan 11, 2023
92dfd56
rename and format
chenqianhe Jan 11, 2023
7d733a1
fix max and min
chenqianhe Jan 11, 2023
fdee845
update readme
chenqianhe Jan 11, 2023
230caa3
Merge branch 'develop' into develop
DefTruth Jan 11, 2023
3201b1f
rename var
chenqianhe Jan 12, 2023
fb2d028
Merge branch 'develop' of https://github.com/chenqianhe/FastDeploy in…
chenqianhe Jan 12, 2023
5d9e145
Merge branch 'develop' into develop
DefTruth Jan 12, 2023
326ed7e
format
chenqianhe Jan 12, 2023
253aa3b
Merge branch 'develop' of https://github.com/chenqianhe/FastDeploy in…
chenqianhe Jan 12, 2023
ca99e77
add params
chenqianhe Jan 12, 2023
60cfb3e
update readme
chenqianhe Jan 12, 2023
8cfdbf0
update readme
chenqianhe Jan 12, 2023
23f6391
Merge branch 'develop' into develop
chenqianhe Jan 12, 2023
a4883a3
Merge branch 'develop' into develop
chenqianhe Jan 12, 2023
d52d66c
Merge branch 'develop' into develop
chenqianhe Jan 13, 2023
592905d
Update README.md
chenqianhe Jan 13, 2023
3d123f3
Update README_CN.md
chenqianhe Jan 13, 2023
64f5a86
Merge branch 'develop' into develop
DefTruth Jan 14, 2023
41 changes: 41 additions & 0 deletions examples/audio/silero-vad/README.md
@@ -0,0 +1,41 @@
English | [Simplified Chinese](README_CN.md)

# Silero VAD - pre-trained enterprise-grade Voice Activity Detector

The deployment model comes from [silero-vad](https://github.com/snakers4/silero-vad)

![](https://user-images.githubusercontent.com/36505480/198026365-8da383e0-5398-4a12-b7f8-22c2c0059512.png)

## Key Features

* Stellar accuracy

Silero VAD has excellent results on speech detection tasks.

* Fast

One audio chunk (30+ ms) takes less than 1ms to be processed on a single CPU thread. Using batching or GPU can also improve performance considerably.

* General

Silero VAD was trained on huge corpora that include over 100 languages and it performs well on audios from different domains with various background noise and quality levels.

* Flexible sampling rate

Silero VAD supports 8000 Hz and 16000 Hz sampling rates.

## Download Pre-trained ONNX Model

For developers' testing, the exported VAD model is provided below. Developers can download and use it directly.

| Model | Size | Notes |
| :----------------------------------------------------------- | :---- | :----------------------------------------------------------- |
| [silero-vad](https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad.tgz) | 1.8MB | This model file is sourced from [snakers4/silero-vad](https://github.com/snakers4/silero-vad), MIT License |

## Detailed Deployment Documents

- [C++ deployment](cpp)

## Source

[https://github.com/snakers4/silero-vad](https://github.com/snakers4/silero-vad)
40 changes: 40 additions & 0 deletions examples/audio/silero-vad/README_CN.md
@@ -0,0 +1,40 @@
Simplified Chinese | [English](README.md)

# Silero VAD - Pre-trained Enterprise-grade Voice Activity Detector

The deployment model comes from [silero-vad](https://github.com/snakers4/silero-vad)

![](https://user-images.githubusercontent.com/36505480/198026365-8da383e0-5398-4a12-b7f8-22c2c0059512.png)

## Key Features

* High accuracy

Silero VAD has excellent results on speech detection tasks.

* Fast inference

One audio chunk (30+ ms) takes less than 1 ms to be processed on a single CPU thread.

* Generality

Silero VAD was trained on huge corpora that include over 100 languages, and it performs well on audio from different domains with various background noise and quality levels.

* Flexible sampling rate

Silero VAD supports 8000 Hz and 16000 Hz sampling rates.

## Download Pre-trained ONNX Model

For developers' testing, the exported VAD model is provided below. Developers can download and use it directly.

| Model | Size | Notes |
| :----------------------------------------------------------- | :---- | :----------------------------------------------------------- |
| [silero-vad](https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad.tgz) | 1.8MB | This model file is sourced from [snakers4/silero-vad](https://github.com/snakers4/silero-vad), MIT License |

## Detailed Deployment Documents

- [C++ deployment](cpp)

## Model Source

[https://github.com/snakers4/silero-vad](https://github.com/snakers4/silero-vad)
17 changes: 17 additions & 0 deletions examples/audio/silero-vad/cpp/CMakeLists.txt
@@ -0,0 +1,17 @@
cmake_minimum_required(VERSION 3.23)
project(silero_vad)

set(CMAKE_CXX_STANDARD 11)

# Specify the path of the downloaded and decompressed FastDeploy library
option(FASTDEPLOY_INSTALL_DIR "Path of downloaded fastdeploy sdk.")

include(${FASTDEPLOY_INSTALL_DIR}/FastDeploy.cmake)

# Add FastDeploy dependency headers
include_directories(${FASTDEPLOY_INCS})

add_executable(infer_onnx_silero_vad ${PROJECT_SOURCE_DIR}/infer_onnx_silero_vad.cc wav.h vad.cc vad.h)

# Link against the FastDeploy libraries
target_link_libraries(infer_onnx_silero_vad ${FASTDEPLOY_LIBS})
121 changes: 121 additions & 0 deletions examples/audio/silero-vad/cpp/README.md
@@ -0,0 +1,121 @@
English | [Simplified Chinese](README_CN.md)

# Silero VAD Deployment Example

This directory provides an example in which `infer_onnx_silero_vad` quickly finishes the deployment of the Silero VAD model on CPU/GPU.

Before deployment, two steps require confirmation.

- 1. Software and hardware should meet the requirements. Please refer to [FastDeploy Environment Requirements](../../../../docs/en/build_and_install/download_prebuilt_libraries.md).
- 2. Download the precompiled deployment library and samples code according to your development environment. Refer to [FastDeploy Precompiled Library](../../../../docs/en/build_and_install/download_prebuilt_libraries.md).

Taking VAD inference on Linux as an example, the compilation and test can be completed by executing the following commands in this directory.

```bash
mkdir build
cd build
# Download the FastDeploy precompiled library. Users can choose the appropriate version from the `FastDeploy Precompiled Library` mentioned above
wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-x.x.x.tgz
tar xvf fastdeploy-linux-x64-x.x.x.tgz
cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-x.x.x
make -j

# Download the VAD model file and test audio. After decompression, place the model and the test audio in the same directory as infer_onnx_silero_vad.cc
wget https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad.tgz
wget https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad_sample.wav

# inference
./infer_onnx_silero_vad ../silero_vad.onnx ../silero_vad_sample.wav
```

- The above commands work for Linux or macOS. For how to use the SDK on Windows, refer to:
  - [How to use FastDeploy C++ SDK in Windows](../../../../docs/en/faq/use_sdk_on_windows.md)

## VAD C++ Interface

### Vad Class

```c++
Vad::Vad(const std::string& model_file,
const fastdeploy::RuntimeOption& custom_option = fastdeploy::RuntimeOption())
```

**Parameter**

> * **model_file**(str): Model file path
> * **custom_option**(RuntimeOption): Backend inference configuration. If not specified, the default configuration is used

### setAudioCofig function

**Must be called before the `init` function**

```c++
void Vad::setAudioCofig(int sr, int frame_ms, float threshold, int min_silence_duration_ms, int speech_pad_ms);
```

**Parameter**

> * **sr**(int): Sampling rate of the input audio
> * **frame_ms**(int): Length (in ms) of each detection frame; used to calculate the detection window size
> * **threshold**(float): Probability threshold for judging whether a frame contains speech
> * **min_silence_duration_ms**(int): Threshold (in ms) used to decide whether a gap is silence
> * **speech_pad_ms**(int): Padding (in ms) used to calculate the end time of a speech segment
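
As a sketch (not part of the original example), the snippet below shows how the constructor and `setAudioCofig` fit together before inference. It assumes the standard `fastdeploy::RuntimeOption` setters (`UseCpu()`, `UseOrtBackend()`), and the audio parameters reuse the values from the commented-out call in `infer_onnx_silero_vad.cc`; adjust them to your audio.

```c++
#include "vad.h"  // declares Vad and pulls in fastdeploy::RuntimeOption

int main() {
  // Optional backend configuration passed as the second constructor argument.
  fastdeploy::RuntimeOption option;
  option.UseCpu();         // or option.UseGpu(0) on a GPU-enabled build
  option.UseOrtBackend();  // the model is an ONNX file

  Vad vad("silero_vad.onnx", option);
  // Custom audio config; must be set before init().
  // 16000 Hz, 64 ms frames, 0.5 threshold, 0 ms min silence, 0 ms speech padding.
  vad.setAudioCofig(16000, 64, 0.5f, 0, 0);
  vad.init();
  return 0;
}
```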

### init function

Used to initialize audio-related parameters.

```c++
void Vad::init();
```

### loadAudio function

Load audio.

```c++
void Vad::loadAudio(const std::string& wavPath)
```

**Parameter**

> * **wavPath**(str): Audio file path

### Predict function

Used to run model inference.

```c++
bool Vad::Predict();
```

### getResult function

**Used to obtain inference results**

```c++
std::vector<std::map<std::string, float>> Vad::getResult(
float removeThreshold = 1.6, float expandHeadThreshold = 0.32, float expandTailThreshold = 0,
float mergeThreshold = 0.3);
```

**Parameter**

> * **removeThreshold**(float): Threshold for discarding result segments; recognition results that are too short are discarded according to this threshold
> * **expandHeadThreshold**(float): Offset applied to the start of each segment; the recognized start time may be too close to the voiced part, so the start time is moved earlier accordingly
> * **expandTailThreshold**(float): Offset applied to the end of each segment; the recognized end time may be too close to the voiced part, so the end time is moved later accordingly
> * **mergeThreshold**(float): Result segments that are very close to each other are merged into one speech segment accordingly

**The output format is** `std::vector<std::map<std::string, float>>`

> The output is a list; each element is a speech segment
>
> For each segment, use 'start' to get the start time and 'end' to get the end time
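
Putting the whole interface together, the following condensed sketch mirrors `infer_onnx_silero_vad.cc`, passing the documented `getResult` defaults explicitly so the post-processing thresholds are visible:

```c++
#include <iostream>

#include "vad.h"

int main() {
  Vad vad("silero_vad.onnx");              // default RuntimeOption
  vad.init();                              // initialize audio-related parameters
  vad.loadAudio("silero_vad_sample.wav");  // sampling rate must match the audio config
  vad.Predict();                           // run inference
  // Documented defaults: removeThreshold, expandHeadThreshold,
  // expandTailThreshold, mergeThreshold.
  auto result = vad.getResult(1.6f, 0.32f, 0.0f, 0.3f);
  for (auto& res : result) {
    std::cout << "speak start: " << res["start"] << " s, end: " << res["end"]
              << " s" << std::endl;
  }
  return 0;
}
```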

### Tips

1. The `setAudioCofig` function must be called before the `init` function
2. The sampling rate of the input audio file must be consistent with that set in the code

- [Model Description](../)
- [How to switch the model inference backend engine](../../../../docs/en/faq/how_to_change_backend.md)
119 changes: 119 additions & 0 deletions examples/audio/silero-vad/cpp/README_CN.md
@@ -0,0 +1,119 @@
[English](README.md) | Simplified Chinese
# Silero VAD Deployment Example

This directory provides `infer_onnx_silero_vad` to quickly finish the deployment of the Silero VAD model on CPU/GPU.

Before deployment, confirm the following two steps

- 1. The software and hardware environment meets the requirements. Refer to [FastDeploy Environment Requirements](../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)
- 2. Download the precompiled deployment library and samples code according to your development environment. Refer to [FastDeploy Precompiled Library](../../../../docs/cn/build_and_install/download_prebuilt_libraries.md)

Taking VAD inference on Linux as an example, the compilation and test can be completed by executing the following commands in this directory.

```bash
mkdir build
cd build
# Download the FastDeploy precompiled library. Users can choose the appropriate version from the `FastDeploy Precompiled Library` mentioned above
wget https://bj.bcebos.com/fastdeploy/release/cpp/fastdeploy-linux-x64-x.x.x.tgz
tar xvf fastdeploy-linux-x64-x.x.x.tgz
cmake .. -DFASTDEPLOY_INSTALL_DIR=${PWD}/fastdeploy-linux-x64-x.x.x
make -j

# Download the VAD model file and test audio. After decompression, place the model and the test audio in the same directory as infer_onnx_silero_vad.cc
wget https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad.tgz
wget https://bj.bcebos.com/paddlehub/fastdeploy/silero_vad_sample.wav

# Inference
./infer_onnx_silero_vad ../silero_vad.onnx ../silero_vad_sample.wav
```

The above commands only work for Linux or macOS. For how to use the SDK on Windows, refer to:
- [How to use FastDeploy C++ SDK in Windows](../../../../docs/cn/faq/use_sdk_on_windows.md)

## VAD C++ Interface
### Vad Class

```c++
Vad::Vad(const std::string& model_file,
     const fastdeploy::RuntimeOption& custom_option = fastdeploy::RuntimeOption())
```

**Parameters**

> * **model_file**(str): Model file path
> * **custom_option**(RuntimeOption): Backend inference configuration. If not specified, the default configuration is used

### setAudioCofig function

**Must be called before the `init` function**

```c++
void Vad::setAudioCofig(int sr, int frame_ms, float threshold, int min_silence_duration_ms, int speech_pad_ms);
```

**Parameters**

> * **sr**(int): Sampling rate of the input audio
> * **frame_ms**(int): Length (in ms) of each detection frame; used to calculate the detection window size
> * **threshold**(float): Probability threshold for judging whether a frame contains speech
> * **min_silence_duration_ms**(int): Threshold (in ms) used to decide whether a gap is silence
> * **speech_pad_ms**(int): Padding (in ms) used to calculate the end time of a speech segment

### init function

Used to initialize audio-related parameters.

```c++
void Vad::init();
```

### loadAudio function

Load the audio.

```c++
void Vad::loadAudio(const std::string& wavPath)
```

**Parameters**

> * **wavPath**(str): Audio file path

### Predict function

Used to run model inference.

```c++
bool Vad::Predict();
```

### getResult function

**Used to obtain inference results**

```c++
std::vector<std::map<std::string, float>> Vad::getResult(
 float removeThreshold = 1.6, float expandHeadThreshold = 0.32, float expandTailThreshold = 0,
 float mergeThreshold = 0.3);
```

**Parameters**

> * **removeThreshold**(float): Threshold for discarding result segments; recognition results that are too short are discarded according to this threshold
> * **expandHeadThreshold**(float): Offset applied to the start of each segment; the recognized start time may be too close to the voiced part, so the start time is moved earlier accordingly
> * **expandTailThreshold**(float): Offset applied to the end of each segment; the recognized end time may be too close to the voiced part, so the end time is moved later accordingly
> * **mergeThreshold**(float): Result segments that are very close to each other are merged into one speech segment accordingly

**The output format is** `std::vector<std::map<std::string, float>>`

> The output is a list; each element is a speech segment
>
> For each segment, use 'start' to get the start time and 'end' to get the end time

### Tips

1. The `setAudioCofig` function must be called before the `init` function
2. The sampling rate of the input audio file must be consistent with that set in the code

- [Model Description](../)
- [How to switch the model inference backend engine](../../../../docs/cn/faq/how_to_change_backend.md)
29 changes: 29 additions & 0 deletions examples/audio/silero-vad/cpp/infer_onnx_silero_vad.cc
@@ -0,0 +1,29 @@
#include <iostream>

#include "vad.h"

int main(int argc, char* argv[]) {
if (argc < 3) {
std::cout << "Usage: infer_onnx_silero_vad path/to/model path/to/audio "
"run_option, "
"e.g ./infer_onnx_silero_vad silero_vad.onnx sample.wav"
<< std::endl;
return -1;
}

std::string model_file = argv[1];
std::string audio_file = argv[2];

Vad vad(model_file);
// custom config, but must be set before init
// vad.setAudioCofig(16000, 64, 0.5f, 0, 0);
vad.init();
vad.loadAudio(audio_file);
vad.Predict();
std::vector<std::map<std::string, float>> result = vad.getResult();
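  // Each segment exposes its boundaries via the "start" and "end" keys (in seconds).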
for (auto& res : result) {
std::cout << "speak start: " << res["start"] << " s, end: " << res["end"]
<< " s" << std::endl;
}
return 0;
}