Commit 0bc58a7

add the English user guide of the Audio component (#730)
* Update README.md (#1)
* Delete UserGuide-en.md (#2)
* Update README.md (#3)
* Update README.md (#4)
* Update README-en.md (#5)
* Update UserGuide-en.md (#6)
1 parent 83b5a7d commit 0bc58a7

2 files changed

Lines changed: 119 additions & 10 deletions

README-en.md

Lines changed: 9 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -17,9 +17,9 @@
1717
</p>
1818

1919
## Introduction
20-
VisualDL, a visualization analysis tool of PaddlePaddle, provides a variety of charts to show the trends of parameters, and visualizes model structures, data samples, histograms of tensors and high-dimensional data distributions. It enables users to understand the training process and the model structure more clearly and intuitively so as to optimize models efficiently.
20+
VisualDL, a visualization analysis tool of PaddlePaddle, provides a variety of charts to show the trends of parameters, and visualizes model structures, data samples, histograms of tensors, pr curves and high-dimensional data distributions. It enables users to understand the training process and the model structure more clearly and intuitively so as to optimize models efficiently.
2121

22-
VisualDL provides various visualization functions, including tracking metrics in real-time, visualizing the model structure, displaying the data sample, presenting the changes of distributions of tensors, projecting high-dimensional data to a lower dimensional space and more. For specific guidelines of each function, please refer to [**VisualDL User Guide**](./docs/components/UserGuide-en.md). Currently, VisualDL iterates rapidly and new functions will be continously added.
22+
VisualDL provides various visualization functions, including tracking metrics in real-time, visualizing the model structure, displaying the data sample, presenting the changes of distributions of tensors, showing the pr curves, projecting high-dimensional data to a lower dimensional space and more. For specific guidelines of each function, please refer to [**VisualDL User Guide**](./docs/components/UserGuide-en.md). Currently, VisualDL iterates rapidly and new functions will be continuously added.
2323

2424
VisualDL natively supports the use of Python. Developers can retrieve plentiful visualization results by simply adding a few lines of Python code into the model before training.
2525

@@ -221,6 +221,13 @@ Developers can compare with multiple experiments by specifying and uploading the
221221
<img src="https://visualdl.bj.bcebos.com/images/image-eye.gif" width="60%"/>
222222
</p>
223223

224+
### Audio
225+
**Audio** lets developers listen to audio data in real time during the training process, which helps them monitor speech recognition and text-to-speech tasks.
226+
227+
<p align="center">
228+
<img src="https://user-images.githubusercontent.com/48054808/88752564-d22ccf00-d18c-11ea-9711-7b5868986ba7.png" width="85%"/>
229+
</p>
230+
224231
### Graph
225232

226233
**Graph** enables developers to visualize model structures with a single click. Moreover, **Graph** allows developers to explore model attributes, node information, and node inputs and outputs, helping them analyze model structures quickly and understand the direction of data flow easily.

docs/components/UserGuide-en.md

Lines changed: 110 additions & 8 deletions
@@ -6,18 +6,19 @@
66

77
VisualDL is a visualization tool designed for Deep Learning. VisualDL provides a variety of charts to show the trends of parameters. It enables users to understand the training process and model structures of Deep Learning models more clearly and intuitively so as to optimize models efficiently.
88

9-
Currently, VisualDL provides six components: scalar, image, graph, histogram, pr curve and high dimensional. VisualDL iterates rapidly and new functions will be continuously added.
9+
Currently, VisualDL provides seven components: scalar, image, audio, graph, histogram, pr curve and high dimensional. VisualDL iterates rapidly and new functions will be continuously added.
1010

1111

1212

13-
| component name | display chart | function |
13+
| Component Name | Display Chart | Function |
1414
| :----------------------------------------------------------: | :---------------------------: | :----------------------------------------------------------- |
15-
| [ Scalar](#Scalar--Line-Chart) | line chart | Display scalar data such as loss and accuracy dynamically. |
16-
| [Image](#Image--Image-Visualization) | image visualization | Display images, visualizing the input and the output and making it easy to view the changes in the intermediate process. |
17-
| [Graph](#Graph--Network-Structure) | network structure | Visualize network structures, node attributes and data flow, assisting developers to learn and to optimize network structures. |
18-
| [Histogram](#Histogram--Distribution-of-Tensors) | distribution of tensors | Present the changes of distributions of tensors, such as weights/gradients/bias, during the training process. |
15+
| [ Scalar](#Scalar--Line-Chart) | Line Chart | Display scalar data such as loss and accuracy dynamically. |
16+
| [Image](#Image--Image-Visualization) | Image Visualization | Display images, visualizing the input and the output and making it easy to view the changes in the intermediate process. |
17+
| [Audio](#Audio--Audio-Play) | Audio Play | Play the audio during the training process, making it easy to monitor the process of speech recognition and text-to-speech. |
18+
| [Graph](#Graph--Network-Structure) | Network Structure | Visualize network structures, node attributes and data flow, assisting developers to learn and to optimize network structures. |
19+
| [Histogram](#Histogram--Distribution-of-Tensors) | Distribution of Tensors | Present the changes of distributions of tensors, such as weights/gradients/bias, during the training process. |
1920
| [PR Curve](#PR-曲线组件) | Precision & Recall Curve | Display precision-recall curves across training steps, clarifying the tradeoff between precision and recall when comparing models. |
20-
| [High Dimensional](#High-Dimensional--Data-Dimensionality-Reduction) | data dimensionality reduction | Project high-dimensional data into 2D/3D space for embedding visualization, making it convenient to observe the correlation between data. |
21+
| [High Dimensional](#High-Dimensional--Data-Dimensionality-Reduction) | Data Dimensionality Reduction | Project high-dimensional data into 2D/3D space for embedding visualization, making it convenient to observe the correlation between data. |
2122

2223

2324

@@ -159,7 +160,7 @@ Then, open the browser and enter the address: `http://127.0.0.1:8080` to view li
159160

160161

161162

162-
* Developers can find target images by searching corresponded image tags.
163+
* Developers can find target scalar charts by searching for the corresponding tags.
163164

164165
<p align="center">
165166
<img src="https://visualdl.bj.bcebos.com/images/scalar-searchlabel.png" width="90%"/>
@@ -183,6 +184,7 @@ Then, open the browser and enter the address: `http://127.0.0.1:8080` to view li
183184
<p align="center">
184185
<img src="https://visualdl.bj.bcebos.com/images/x-axis.png" width="40%"/>
185186
</p>
187+
186188
* The smoothness of the curve can be adjusted to better show the change of the overall trend.
187189

188190
<p align="center">
@@ -266,6 +268,106 @@ Then, open the browser and enter the address: `http://127.0.0.1:8080`to view:
266268
<img src="https://visualdl.bj.bcebos.com/images/image-eye.gif" width="60%"/>
267269
</p>
268270

271+
## Audio--Audio Play
272+
273+
### Introduction
274+
275+
Audio lets developers listen to audio in real time during the training process, which helps them monitor speech recognition and text-to-speech tasks.
276+
277+
### Record Interface
278+
279+
The interface of Audio is shown as follows:
280+
281+
```python
282+
add_audio(tag, audio_array, step, sample_rate)
283+
```
284+
The interface parameters are described as follows:
285+
| parameter | format | meaning |
286+
| --------- | ------------- | ------------------------------------------------------------ |
287+
| tag | string | Record the name of the audio, e.g. audio/sample. Notice that the name cannot contain `%` |
288+
| audio_array | numpy.ndarray | Audio in ndarray format |
289+
| step | int | Record the training steps |
290+
| sample_rate | int | Sample rate. **Please note that the rate should be the rate of the original audio** |
291+
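As a minimal sketch of two constraints from the parameter table above (a tag must not contain `%`, and `sample_rate` should match the original audio), the helpers below use only Python's standard `wave` module. The names `check_tag`, `read_sample_rate` and `write_silent_wav` are hypothetical illustrations and are not part of VisualDL; the silent WAV file exists only so the sketch can run on its own.

```python
import os
import tempfile
import wave


def check_tag(tag):
    """Hypothetical helper: reject tags containing '%', as the table requires."""
    if "%" in tag:
        raise ValueError("tag must not contain '%': {!r}".format(tag))
    return tag


def read_sample_rate(audio_path):
    """Hypothetical helper: return the sample rate stored in the WAV header."""
    with wave.open(audio_path, "rb") as f:
        return f.getframerate()


def write_silent_wav(audio_path, sample_rate=16000, n_frames=160):
    """Write a short silent 16-bit mono WAV file (test fixture only)."""
    with wave.open(audio_path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        f.setframerate(sample_rate)
        f.writeframes(b"\x00\x00" * n_frames)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.wav")
    write_silent_wav(path, sample_rate=16000)
    tag = check_tag("audio/sample")
    rate = read_sample_rate(path)
    print(tag, rate)
    # With VisualDL installed, these values could then be passed on, e.g.:
    # writer.add_audio(tag=tag, audio_array=..., step=0, sample_rate=rate)
```

Reading the rate from the file header, rather than hardcoding it, is one way to keep `sample_rate` consistent with the source audio when recordings with different rates are logged.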
292+
### Demo
293+
The following shows an example of using Audio to record data, and the script can be found in [Audio Demo](https://github.com/PaddlePaddle/VisualDL/blob/develop/demo/components/audio_test.py).
294+
295+
```python
296+
from visualdl import LogWriter
297+
import numpy as np
298+
import wave
299+
300+
301+
def read_audio_data(audio_path):
302+
    """
303+
    Get audio data.
304+
    """
305+
    CHUNK = 4096
306+
    f = wave.open(audio_path, "rb")
307+
    wavdata = []
308+
    chunk = f.readframes(CHUNK)
309+
    while chunk:
310+
        data = np.frombuffer(chunk, dtype='uint8')
311+
        wavdata.extend(data)
312+
        chunk = f.readframes(CHUNK)
313+
    # 8k sample rate, 16bit frame, 1 channel
314+
    shape = [8000, 2, 1]
315+
    return shape, wavdata
316+
317+
318+
if __name__ == '__main__':
319+
    with LogWriter(logdir="./log") as writer:
320+
        audio_shape, audio_data = read_audio_data("./testing.wav")
321+
        audio_data = np.array(audio_data)
322+
        writer.add_audio(tag="audio_tag",
323+
                         audio_array=audio_data,
324+
                         step=0,
325+
                         sample_rate=8000)
326+
```
327+
After running the above program, developers can launch the panel by:
328+
```shell
329+
visualdl --logdir ./log --port 8080
330+
```
331+
332+
Then, open the browser and enter the address: `http://127.0.0.1:8080` to view:
333+
334+
<p align="center">
335+
<img src="https://user-images.githubusercontent.com/48054808/88753858-eaeab400-d18f-11ea-87c6-46ab7d5a5fd0.png" width="90%"/>
336+
</p>
337+
338+
### Functional Instructions
339+
340+
- Developers can find the target audio by searching for the corresponding tags.
341+
342+
<p align="center">
343+
<img src="https://user-images.githubusercontent.com/48054808/88755034-c6dca200-d192-11ea-8349-1414bcf9d38d.png" width="80%"/>
344+
</p>
345+
346+
- Developers can listen to the audio from different iterations by dragging the Step/iteration slider.
347+
348+
<p align="center">
349+
<img src="https://user-images.githubusercontent.com/48054808/88755220-33f03780-d193-11ea-9b0f-a283d9f3a78a.png" width="40%"/>
350+
</p>
351+
352+
- Play/Pause the audio
353+
354+
<p align="center">
355+
<img src="https://user-images.githubusercontent.com/48054808/88755240-41a5bd00-d193-11ea-9780-7ae7c7792070.png" width="40%"/>
356+
</p>
357+
358+
- Adjust the volume
359+
360+
<p align="center">
361+
<img src="https://user-images.githubusercontent.com/48054808/88755258-53876000-d193-11ea-96b2-9ed698423202.png" width="40%"/>
362+
</p>
363+
364+
- Download the audio
365+
366+
<p align="center">
367+
<img src="https://user-images.githubusercontent.com/48054808/88755377-9a755580-d193-11ea-947e-4275b9d3aa54.png" width="40%"/>
368+
</p>
369+
370+
269371
## Graph--Network Structure
270372

271373
### Introduction
