Commit ed94226

Authored by Yibing Liu
Merge pull request #2423 from kuke/add_ctc_decoder_design_doc
update ctc_beam_search_decoder design doc
2 parents 75a7399 + de1a701 commit ed94226

2 files changed: 14 additions and 1 deletion

@@ -140,7 +140,19 @@ TODO by Assignees
 
 ### Beam Search with CTC and LM
 
-TODO by Assignees
+<div align="center">
+<img src="image/beam_search.png" width=600><br/>
+Figure 2. Algorithm for CTC Beam Search Decoder.
+</div>
+
+- The **Beam Search Decoder** for the DS2 CTC-trained network follows an approach similar to the one in \[[3](#references)\], as shown in Figure 2, with two important modifications to its ambiguous parts:
+    - 1) in the iterative computation of probabilities, the assignment operation is changed to accumulation, because one prefix may come from different paths;
+    - 2) the condition ```if l^+ not in A_prev then``` after the probability computation is dropped, because it is hard to understand and seems unnecessary.
+- An **external scorer** is passed into the decoder to evaluate a candidate prefix during decoding: whenever a white space is appended in English decoding, and whenever any character is appended in Mandarin decoding.
+    - Such an external scorer may consist of a language model, a word count, or any other custom scorer.
+    - The **language model** is built in Task 5, and its parameters should be carefully tuned to achieve the minimum WER/CER (cf. Task 7).
+- This decoder needs to run with **high efficiency**, both for convenient parameter tuning and for practical speech recognition.
+
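The two modifications and the scorer hook described in the list above can be sketched roughly as follows. This is a minimal illustration, not the implementation from this commit: the names `ctc_beam_search_decoder`, `probs_seq`, `vocabulary`, `beam_size` and `ext_scorer` are assumptions, the blank label is assumed to be the last index of every frame, plain probabilities are used instead of log probabilities, and the scorer is triggered only on a white space (the English case; a Mandarin variant would presumably call it for every appended character).

```python
from collections import defaultdict


def ctc_beam_search_decoder(probs_seq, vocabulary, beam_size, ext_scorer=None):
    """Toy CTC prefix beam search; returns [(prefix, probability), ...], best first.

    probs_seq:  per-frame probability lists; the LAST index is assumed to be blank.
    vocabulary: characters for indices 0 .. len(vocabulary) - 1.
    ext_scorer: optional, hypothetical callable(prefix) -> multiplicative weight.
    """
    blank_id = len(vocabulary)
    # prefix -> (prob. of paths ending in blank, prob. of paths ending in non-blank)
    beam = {"": (1.0, 0.0)}
    for probs in probs_seq:
        next_beam = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beam.items():
            for c, p in enumerate(probs):
                if c == blank_id:
                    # A blank leaves the prefix unchanged.
                    next_beam[prefix][0] += p * (p_b + p_nb)
                    continue
                char = vocabulary[c]
                new_prefix = prefix + char
                if prefix and char == prefix[-1]:
                    # Repeated character: paths ending in that character merge
                    # into the same prefix; only blank-ending paths extend it.
                    next_beam[prefix][1] += p * p_nb
                    extended = p * p_b
                else:
                    extended = p * (p_b + p_nb)
                if ext_scorer is not None and char == " ":
                    # English-style trigger: rescore when a space is appended.
                    extended *= ext_scorer(new_prefix)
                # Modification 1: accumulate (+=) instead of assigning, since
                # the same prefix can be reached from different paths.
                next_beam[new_prefix][1] += extended
                # Modification 2: no extra `if l^+ not in A_prev` update here.
        ranked = sorted(next_beam.items(),
                        key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beam = {prefix: (p_b, p_nb) for prefix, (p_b, p_nb) in ranked[:beam_size]}
    return [(prefix, p_b + p_nb) for prefix, (p_b, p_nb) in beam.items()]
```

As a quick check of the accumulation step: with the one-character vocabulary `["a"]` and two frames of `[0.6, 0.4]` (0.4 being blank), the prefix `a` is reached by the paths `aa`, `a_` and `_a`, so its probability is 0.36 + 0.24 + 0.24 = 0.84, while the empty prefix keeps 0.16. With plain assignment, the contribution coming from the empty prefix in the second frame would overwrite the others instead of adding to them.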
 
 ## Future Work
 
@@ -153,3 +165,4 @@ TODO by Assignees
 
 1. Dario Amodei, et al., [Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin](http://proceedings.mlr.press/v48/amodei16.pdf). ICML 2016.
 2. Dario Amodei, et al., [Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin](https://arxiv.org/abs/1512.02595). arXiv:1512.02595.
+3. Awni Y. Hannun, et al., [First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs](https://arxiv.org/abs/1408.2873). arXiv:1408.2873.