doc/design/speech/deep_speech_2.md
Lines changed: 14 additions & 1 deletion
@@ -140,7 +140,19 @@ TODO by Assignees
140
140
141
141
### Beam Search with CTC and LM
142
142
143
-
TODO by Assignees
143
+
<div align="center">
144
+
<img src="image/beam_search.png" width=600><br/>
145
+
Figure 2. Algorithm for CTC Beam Search Decoder.
146
+
</div>
147
+
148
+
- The **Beam Search Decoder** for the DS2 CTC-trained network follows an approach similar to the one in \[[3](#references)\], as shown in Figure 2, with two important modifications to its ambiguous parts:
149
+
-1) in the iterative computation of probabilities, the assignment operation is changed to accumulation, because one prefix may come from different paths;
150
+
-2) the condition ```if l^+ not in A_prev then``` that follows the probability computation is dropped, because it is hard to understand and seems unnecessary.
151
+
- An **external scorer** is passed into the decoder to evaluate a candidate prefix during decoding: whenever a white space is appended in English decoding, and whenever any character is appended in Mandarin decoding.
152
+
- Such an external scorer consists of a language model, a word count, or any other custom scorers.
153
+
- The **language model** is built in Task 5, and its parameters should be carefully tuned to achieve minimum WER/CER (cf. Task 7).
154
+
- This decoder needs to run with **high efficiency**, both for convenient parameter tuning and for speech recognition in practice.
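
The decoding steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the DS2 implementation: the function name `ctc_beam_search`, its parameters, and the choice to apply the scorer as a multiplicative weight on the extended prefix are all hypothetical. It shows the accumulation of probabilities for a prefix reached through different paths, and the scorer hook that fires when a white space is appended (the English case).

```python
from collections import defaultdict


def ctc_beam_search(probs, vocab, beam_size=10, blank=0, scorer=None, space=" "):
    """Hypothetical sketch of CTC prefix beam search.

    probs:  per-frame probability lists, one entry per vocabulary index.
    vocab:  list of single characters; vocab[blank] is the CTC blank.
    scorer: optional callable(prefix) -> weight, assumed here to be a
            multiplicative factor applied when a space is appended.
    Returns a list of (prefix, probability), best first.
    """
    # Each prefix maps to (prob. of paths ending in blank, ending in non-blank).
    beams = {"": (1.0, 0.0)}
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for idx, p in enumerate(frame):
                if idx == blank:
                    # A blank keeps the prefix unchanged; accumulate, do not assign.
                    next_beams[prefix][0] += p * (p_b + p_nb)
                    continue
                ch = vocab[idx]
                if prefix and prefix[-1] == ch:
                    # Repeated character without a blank collapses into the prefix.
                    next_beams[prefix][1] += p * p_nb
                    # Only paths ending in blank can extend with the same character.
                    ext = p * p_b
                else:
                    ext = p * (p_b + p_nb)
                new_prefix = prefix + ch
                if scorer is not None and ch == space:
                    # External scorer evaluates the candidate at a word boundary.
                    ext *= scorer(new_prefix)
                # The same prefix may be reached from different paths: accumulate.
                next_beams[new_prefix][1] += ext
        ranked = sorted(next_beams.items(), key=lambda kv: kv[1][0] + kv[1][1],
                        reverse=True)
        beams = {k: (v[0], v[1]) for k, v in ranked[:beam_size]}
    result = [(k, v[0] + v[1]) for k, v in beams.items()]
    return sorted(result, key=lambda kv: kv[1], reverse=True)


# Two frames, vocabulary: blank, "a", "b".
toy_vocab = ["_", "a", "b"]
toy_probs = [[0.4, 0.6, 0.0], [0.4, 0.6, 0.0]]
toy_result = ctc_beam_search(toy_probs, toy_vocab)
```

In the toy input, the prefix `a` is reached by the alignments `a_`, `_a`, and `aa`, so its score is the sum of the three path probabilities; with assignment instead of accumulation, all but one of these contributions would be lost. For Mandarin, the same hook would be invoked on every appended character rather than only on a space.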
155
+
144
156
145
157
## Future Work
146
158
@@ -153,3 +165,4 @@ TODO by Assignees
153
165
154
166
1. Dario Amodei, et al., [Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin](http://proceedings.mlr.press/v48/amodei16.pdf). ICML 2016.
155
167
2. Dario Amodei, et al., [Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin](https://arxiv.org/abs/1512.02595). arXiv:1512.02595.
168
+
3. Awni Y. Hannun, et al., [First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs](https://arxiv.org/abs/1408.2873). arXiv:1408.2873.