docs/source/asr/asr_customization/neural_rescoring.rst
.. _neural_rescoring:

****************
Neural Rescoring
****************

In the neural rescoring approach, a neural network is used to score candidates. A candidate is a text transcript predicted by the ASR model's decoder. The top K candidates produced by beam search decoding (with a beam width of K) are passed to a neural language model for ranking. The language model assigns a score to each candidate, and this score is usually combined with the scores from beam search decoding to produce the final scores and rankings.

Train Neural Rescorer
=====================

An example script for training such a Transformer language model can be found at `examples/nlp/language_modeling/transformer_lm.py <https://github.com/NVIDIA/NeMo/blob/stable/examples/nlp/language_modeling/transformer_lm.py>`__.
It trains a ``TransformerLMModel``, which can be used as a neural rescorer for an ASR system. For more information on language model training, see the LLM/NLP documentation.
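
The following is an illustrative sketch of a training invocation. The Hydra override keys shown here (e.g., `model.train_ds.file_name`) are assumptions; verify them against the default config that ships with the script before use.

.. code-block::

    # Illustrative only: check the override keys against the script's config.
    python examples/nlp/language_modeling/transformer_lm.py \
        model.train_ds.file_name=<path to the training text file> \
        model.validation_ds.file_name=<path to the validation text file> \
        trainer.devices=1 \
        trainer.max_epochs=10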


You can also use a pretrained language model from the Hugging Face library, such as Transformer-XL or GPT-2, instead of training a model from scratch.
Models like BERT and RoBERTa are not supported by this script because they are trained as masked language models and, as a result, are not efficient or effective for scoring sentences out of the box.
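
As an intuition for why causal LMs fit this task out of the box: scoring a sentence with a model like GPT-2 amounts to summing the log-probability of each token given its left context, which a single forward pass provides. The sketch below (assuming the Hugging Face `transformers` and `torch` packages are installed) computes such a score; it illustrates the idea and is not the rescorer's actual code.

.. code-block:: python

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def sentence_score(text: str) -> float:
        """Sum of token log-probabilities under GPT-2 (higher is better)."""
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            # With labels=ids, the model returns the mean cross-entropy over
            # the predicted tokens; multiply back to get the summed log-prob.
            loss = model(ids, labels=ids).loss
        return -loss.item() * (ids.size(1) - 1)

    print(sentence_score("i have a dream"))   # plausible text scores higher
    print(sentence_score("i have a drean"))

A masked LM such as BERT would instead need one forward pass per masked position to obtain a comparable pseudo-likelihood, which is why this script excludes those models.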


Evaluation
==========

Given a trained ``TransformerLMModel`` `.nemo` file or a pretrained HF model, the script available at
`scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py>`__
can be used to re-score the beams obtained with an ASR model. To use this script, you need a `.tsv` file containing the candidates
produced by the acoustic model and beam search decoding. The candidates can be the result of beam search decoding
alone or of fusion with an N-gram LM. You can generate this file by specifying `--preds_output_folder` for
`scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py>`__.
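
For illustration, with a beam width of 2 and two evaluation utterances, the `.tsv` file contains one tab-separated `candidate text <TAB> score` pair per line; the texts and scores below are made up:

.. code-block::

    i have a dream	-3.27
    i have a drean	-7.91
    feel the need for speed	-2.05
    fill the need for speed	-4.62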

The neural rescorer rescores the beams/candidates using the two parameters `rescorer_alpha` and `rescorer_beta`, as follows:

.. code-block::

    final_score = beam_search_score + rescorer_alpha*neural_rescorer_score + rescorer_beta*seq_length

The parameter `rescorer_alpha` specifies the importance placed on the neural rescorer model, while `rescorer_beta` is a penalty term that accounts for sequence length in the scores. These parameters have similar effects to `beam_alpha` and `beam_beta` in the beam search decoder and N-gram language model.
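
As a concrete illustration of this formula, the following minimal sketch picks the best candidate for one utterance. The candidate texts and scores are made up, and the script itself may compute `seq_length` differently (e.g., in tokens):

.. code-block:: python

    def rescore(candidates, alpha, beta):
        """candidates: list of (text, beam_score, lm_score) tuples."""
        best_text, best_score = None, float("-inf")
        for text, beam_score, lm_score in candidates:
            seq_length = len(text.split())  # word count as the length term
            final = beam_score + alpha * lm_score + beta * seq_length
            if final > best_score:
                best_text, best_score = text, final
        return best_text, best_score

    # The LM strongly prefers the second candidate, so it wins after rescoring.
    cands = [("i have a drean", -1.2, -9.5), ("i have a dream", -1.4, -3.1)]
    print(rescore(cands, alpha=0.5, beta=0.1))  # ('i have a dream', -2.55)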

Use the following steps to evaluate a neural LM:

#. Obtain a `.tsv` file with the beams and their corresponding scores. The scores can come from a regular beam
   search decoder or from fusion with an N-gram LM. For a given beam size `beam_size` and number of evaluation
   examples `num_eval_examples`, the file should contain (`num_eval_examples` x `beam_size`) lines of the
   form `beam_candidate_text \t score`. This file can be generated by `scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py>`__.

#. Rescore the candidates with `scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py>`__.

.. code-block::

    python eval_neural_rescorer.py \
        --lm_model=[path to .nemo file of the LM or the name of a HF pretrained model] \
        --beams_file=[path to beams .tsv file] \
        --beam_size=[size of the beams] \
        --eval_manifest=[path to eval manifest .json file] \
        --batch_size=[batch size used for inference on the LM model] \
        --alpha=[the value for the parameter rescorer_alpha] \
        --beta=[the value for the parameter rescorer_beta] \
        --scores_output_file=[the optional path to store the rescored candidates]

The candidates, along with their new scores, are stored in the file specified by `--scores_output_file`.

The following is the list of arguments for the evaluation script:

+---------------------+--------+------------------+-------------------------------------------------------------------------+
| **Argument**        |**Type**| **Default**      | **Description**                                                         |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| lm_model            | str    | Required         | The path of the '.nemo' file of the neural language model, or the name |
|                     |        |                  | of a Hugging Face pretrained model like 'transfo-xl-wt103' or 'gpt2'.   |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| eval_manifest       | str    | Required         | Path to the evaluation manifest file (.json manifest file).             |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| beams_file          | str    | Required         | Path to beams file (.tsv) containing the candidates and their scores.   |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| beam_size           | int    | Required         | The width of the beams (number of candidates) generated by the decoder. |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| alpha               | float  | None             | The value for the parameter rescorer_alpha. If not passed, a linear     |
|                     |        |                  | search is performed to find the best value.                             |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| beta                | float  | None             | The value for the parameter rescorer_beta. If not passed, a linear      |
|                     |        |                  | search is performed to find the best value.                             |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| batch_size          | int    | 16               | The batch size used to calculate the scores.                            |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| max_seq_length      | int    | 512              | Maximum sequence length (in tokens) for the input.                      |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| scores_output_file  | str    | None             | The optional file to store the rescored beams.                          |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| use_amp             | bool   | ``False``        | Whether to use AMP, if available, to calculate the scores.              |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| device              | str    | cuda             | The device to load the LM model onto for calculating the scores.        |
|                     |        |                  | It can be 'cpu', 'cuda', 'cuda:0', 'cuda:1', etc.                       |
+---------------------+--------+------------------+-------------------------------------------------------------------------+


Hyperparameter Linear Search
----------------------------

The evaluation script also supports a linear search over the parameters `alpha` and `beta`. If either of the two is not
provided, a linear search is performed to find the best value for that parameter. When linear search is used,
`beta` is initially set to zero and the best value for `alpha` is found; then `alpha` is fixed at
that value and another linear search is done to find the best value for `beta`.
If either of these two parameters is already specified, the search for that one is skipped. After each search for a
parameter, a plot of WER% for the different values of the parameter is also shown.
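
A minimal sketch of this coordinate-wise search is shown below. The grids and the `compute_wer` stand-in are illustrative; in practice, `compute_wer` would rescore the validation candidates with the given parameters and measure WER against the references:

.. code-block:: python

    def compute_wer(alpha: float, beta: float) -> float:
        # Toy stand-in with its minimum near alpha=0.6, beta=0.2; the real
        # function would run the rescorer and score against references.
        return (alpha - 0.6) ** 2 + (beta - 0.2) ** 2

    alpha_grid = [i / 10 for i in range(21)]       # 0.0 .. 2.0
    beta_grid = [i / 10 - 1.0 for i in range(21)]  # -1.0 .. 1.0

    # Step 1: fix beta = 0 and sweep alpha.
    best_alpha = min(alpha_grid, key=lambda a: compute_wer(a, 0.0))
    # Step 2: fix the best alpha found and sweep beta.
    best_beta = min(beta_grid, key=lambda b: compute_wer(best_alpha, b))
    print(best_alpha, best_beta)  # approximately 0.6 and 0.2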

It is recommended to first run the linear search for both parameters on a validation set by not providing any values for `--alpha` and `--beta`.
Then inspect the WER curves and decide on the best value for each parameter. Finally, evaluate the chosen values on the test set.
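
For example, a two-pass workflow with the script might look like the following; the file names, beam size, and the final `alpha`/`beta` values are placeholders:

.. code-block::

    # Pass 1: linear search on the validation set (no --alpha/--beta given).
    python eval_neural_rescorer.py \
        --lm_model=my_rescorer.nemo \
        --beams_file=dev_beams.tsv \
        --beam_size=32 \
        --eval_manifest=dev_manifest.json

    # Pass 2: evaluate the chosen values on the test set.
    python eval_neural_rescorer.py \
        --lm_model=my_rescorer.nemo \
        --beams_file=test_beams.tsv \
        --beam_size=32 \
        --eval_manifest=test_manifest.json \
        --alpha=0.6 --beta=0.2 \
        --scores_output_file=test_rescored.tsv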