docs/source/asr/asr_customization/neural_rescoring.rst
.. _neural_rescoring:

****************
Neural Rescoring
****************

In the neural rescoring approach, a neural network is used to score candidates. A candidate is a text transcript predicted by the ASR model's decoder. The top K candidates produced by beam search decoding (with a beam width of K) are passed to a neural language model for ranking. The language model assigns a score to each candidate, and this score is usually combined with the scores from beam search decoding to produce the final scores and rankings.

Train Neural Rescorer
=====================

An example script for training such a Transformer language model can be found at `examples/nlp/language_modeling/transformer_lm.py <https://github.com/NVIDIA/NeMo/blob/stable/examples/nlp/language_modeling/transformer_lm.py>`__.
It trains a ``TransformerLMModel``, which can be used as a neural rescorer for an ASR system. For more information on language model training, see the LLM/NLP documentation.
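
The following is an illustrative sketch of a training invocation. The Hydra override keys shown here (e.g., `model.train_ds.file_name`) are assumptions; verify them against the default config that ships with the script before use.

.. code-block::

    # Illustrative only: check the override keys against the script's config.
    python examples/nlp/language_modeling/transformer_lm.py \
        model.train_ds.file_name=<path to the training text file> \
        model.validation_ds.file_name=<path to the validation text file> \
        trainer.devices=1 \
        trainer.max_epochs=10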


You can also use a pretrained language model from the Hugging Face library, such as Transformer-XL or GPT-2, instead of training a model from scratch.
Models like BERT and RoBERTa are not supported by this script because they are trained as masked language models and, as a result, are not efficient or effective for scoring sentences out of the box.
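
As an intuition for why causal LMs fit this task out of the box: scoring a sentence with a model like GPT-2 amounts to summing the log-probability of each token given its left context, which a single forward pass provides. The sketch below (assuming the Hugging Face `transformers` and `torch` packages are installed) computes such a score; it illustrates the idea and is not the rescorer's actual code.

.. code-block:: python

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def sentence_score(text: str) -> float:
        """Sum of token log-probabilities under GPT-2 (higher is better)."""
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            # With labels=ids, the model returns the mean cross-entropy over
            # the predicted tokens; multiply back to get the summed log-prob.
            loss = model(ids, labels=ids).loss
        return -loss.item() * (ids.size(1) - 1)

    print(sentence_score("i have a dream"))   # plausible text scores higher
    print(sentence_score("i have a drean"))

A masked LM such as BERT would instead need one forward pass per masked position to obtain a comparable pseudo-likelihood, which is why this script excludes those models.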


Evaluation
==========

Given a trained ``TransformerLMModel`` `.nemo` file or a pretrained HF model, the script available at
`scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py>`__
can be used to re-score the beams obtained with an ASR model. To use this script, you need a `.tsv` file containing the candidates
produced by the acoustic model and beam search decoding. The candidates can be the result of beam search decoding
alone or of fusion with an N-gram LM. You can generate this file by specifying `--preds_output_folder` for
`scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py>`__.
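
For illustration, with a beam width of 2 and two evaluation utterances, the `.tsv` file contains one tab-separated `candidate text <TAB> score` pair per line; the texts and scores below are made up:

.. code-block::

    i have a dream	-3.27
    i have a drean	-7.91
    feel the need for speed	-2.05
    fill the need for speed	-4.62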

The neural rescorer rescores the beams/candidates using the two parameters `rescorer_alpha` and `rescorer_beta`, as follows:

.. code-block::

    final_score = beam_search_score + rescorer_alpha*neural_rescorer_score + rescorer_beta*seq_length

The parameter `rescorer_alpha` specifies the importance placed on the neural rescorer model, while `rescorer_beta` is a penalty term that accounts for sequence length in the scores. These parameters have similar effects to `beam_alpha` and `beam_beta` in the beam search decoder and N-gram language model.
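
As a concrete illustration of this formula, the following minimal sketch picks the best candidate for one utterance. The candidate texts and scores are made up, and the script itself may compute `seq_length` differently (e.g., in tokens):

.. code-block:: python

    def rescore(candidates, alpha, beta):
        """candidates: list of (text, beam_score, lm_score) tuples."""
        best_text, best_score = None, float("-inf")
        for text, beam_score, lm_score in candidates:
            seq_length = len(text.split())  # word count as the length term
            final = beam_score + alpha * lm_score + beta * seq_length
            if final > best_score:
                best_text, best_score = text, final
        return best_text, best_score

    # The LM strongly prefers the second candidate, so it wins after rescoring.
    cands = [("i have a drean", -1.2, -9.5), ("i have a dream", -1.4, -3.1)]
    print(rescore(cands, alpha=0.5, beta=0.1))  # ('i have a dream', -2.55)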

Use the following steps to evaluate a neural LM:

#. Obtain a `.tsv` file with the beams and their corresponding scores. The scores can come from a regular beam
   search decoder or from fusion with an N-gram LM. For a given beam size `beam_size` and number of evaluation
   examples `num_eval_examples`, the file should contain (`num_eval_examples` x `beam_size`) lines of the
   form `beam_candidate_text \t score`. This file can be generated by `scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py>`__.

#. Rescore the candidates with `scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/neural_rescorer/eval_neural_rescorer.py>`__.

.. code-block::

    python eval_neural_rescorer.py \
        --lm_model=[path to .nemo file of the LM or the name of a HF pretrained model] \
        --beams_file=[path to beams .tsv file] \
        --beam_size=[size of the beams] \
        --eval_manifest=[path to eval manifest .json file] \
        --batch_size=[batch size used for inference on the LM model] \
        --alpha=[the value for the parameter rescorer_alpha] \
        --beta=[the value for the parameter rescorer_beta] \
        --scores_output_file=[the optional path to store the rescored candidates]

The candidates, along with their new scores, are stored in the file specified by `--scores_output_file`.

The following is the list of arguments for the evaluation script:

+---------------------+--------+------------------+-------------------------------------------------------------------------+
| **Argument**        |**Type**| **Default**      | **Description**                                                         |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| lm_model            | str    | Required         | The path of the '.nemo' file of the neural language model, or the name |
|                     |        |                  | of a Hugging Face pretrained model like 'transfo-xl-wt103' or 'gpt2'.   |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| eval_manifest       | str    | Required         | Path to the evaluation manifest file (.json manifest file).             |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| beams_file          | str    | Required         | Path to beams file (.tsv) containing the candidates and their scores.   |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| beam_size           | int    | Required         | The width of the beams (number of candidates) generated by the decoder. |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| alpha               | float  | None             | The value for the parameter rescorer_alpha. If not passed, a linear     |
|                     |        |                  | search is performed to find the best value.                             |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| beta                | float  | None             | The value for the parameter rescorer_beta. If not passed, a linear      |
|                     |        |                  | search is performed to find the best value.                             |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| batch_size          | int    | 16               | The batch size used to calculate the scores.                            |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| max_seq_length      | int    | 512              | Maximum sequence length (in tokens) for the input.                      |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| scores_output_file  | str    | None             | The optional file to store the rescored beams.                          |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| use_amp             | bool   | ``False``        | Whether to use AMP, if available, to calculate the scores.              |
+---------------------+--------+------------------+-------------------------------------------------------------------------+
| device              | str    | cuda             | The device to load the LM model onto for calculating the scores.        |
|                     |        |                  | It can be 'cpu', 'cuda', 'cuda:0', 'cuda:1', etc.                       |
+---------------------+--------+------------------+-------------------------------------------------------------------------+


Hyperparameter Linear Search
----------------------------

The evaluation script also supports a linear search over the parameters `alpha` and `beta`. If either of the two is not
provided, a linear search is performed to find the best value for that parameter. When linear search is used,
`beta` is initially set to zero and the best value for `alpha` is found; then `alpha` is fixed at
that value and another linear search is done to find the best value for `beta`.
If either of these two parameters is already specified, the search for that one is skipped. After each search for a
parameter, a plot of WER% for the different values of the parameter is also shown.
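
A minimal sketch of this coordinate-wise search is shown below. The grids and the `compute_wer` stand-in are illustrative; in practice, `compute_wer` would rescore the validation candidates with the given parameters and measure WER against the references:

.. code-block:: python

    def compute_wer(alpha: float, beta: float) -> float:
        # Toy stand-in with its minimum near alpha=0.6, beta=0.2; the real
        # function would run the rescorer and score against references.
        return (alpha - 0.6) ** 2 + (beta - 0.2) ** 2

    alpha_grid = [i / 10 for i in range(21)]       # 0.0 .. 2.0
    beta_grid = [i / 10 - 1.0 for i in range(21)]  # -1.0 .. 1.0

    # Step 1: fix beta = 0 and sweep alpha.
    best_alpha = min(alpha_grid, key=lambda a: compute_wer(a, 0.0))
    # Step 2: fix the best alpha found and sweep beta.
    best_beta = min(beta_grid, key=lambda b: compute_wer(best_alpha, b))
    print(best_alpha, best_beta)  # approximately 0.6 and 0.2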

It is recommended to first run the linear search for both parameters on a validation set by not providing any values for `--alpha` and `--beta`.
Then inspect the WER curves and decide on the best value for each parameter. Finally, evaluate the chosen values on the test set.
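
For example, a two-pass workflow with the script might look like the following; the file names, beam size, and the final `alpha`/`beta` values are placeholders:

.. code-block::

    # Pass 1: linear search on the validation set (no --alpha/--beta given).
    python eval_neural_rescorer.py \
        --lm_model=my_rescorer.nemo \
        --beams_file=dev_beams.tsv \
        --beam_size=32 \
        --eval_manifest=dev_manifest.json

    # Pass 2: evaluate the chosen values on the test set.
    python eval_neural_rescorer.py \
        --lm_model=my_rescorer.nemo \
        --beams_file=test_beams.tsv \
        --beam_size=32 \
        --eval_manifest=test_manifest.json \
        --alpha=0.6 --beta=0.2 \
        --scores_output_file=test_rescored.tsv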