Commit 2919fed

Cherry-pick 2.4.1 changelog to r2.4.0 (#14843)
Signed-off-by: Charlie Truong <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent 6489229 commit 2919fed

File tree

1 file changed (+40, -23 lines)

CHANGELOG.md

Lines changed: 40 additions & 23 deletions
@@ -1,39 +1,56 @@
 # Changelog
 
 <!-- Next changelog -->
+## NVIDIA Neural Modules 2.4.1
+
+### Detailed Changelogs:
+
+#### Uncategorized:
+
+<details><summary>Changelog</summary>
+
+- Update package_info.py by @ko3n1g :: PR: #14400
+- Patch to address issue 14392 by @youngeunkwon0405 :: PR: #14398
+- Cherry pick `Fix callbacks in DSV3 script (14350)` into `r2.4.0` by @chtruong814 :: PR: #14370
+- Cherry pick `Change Llama Embedding Tutorial to use SFT by default (14231)` into `r2.4.0` by @chtruong814 :: PR: #14303
+- Cherrypick `calculate_per_token_loss requirement for context parallel` (#14065) (#14282) into `r2.4.0` by @chtruong814 :: PR: #14448
+- Pin nvidia-lm-eval to 25.6.1 by @chtruong814 :: PR: #14470
+
+</details>
+
 ## NVIDIA Neural Modules 2.4.0
 
 ### Highlights
 
 - Collections:
-  - Speech
-    - Batched beam search for transducers (RNN-T and TDT)
-    - RNNT/TDT buffered/streaming inference \+ batched decoding support in cache-aware
-    - add support for CTC batched beam search with GPU-LM
-  - Key fixes
-    - Punctuation Marks in Timestamps
-    - Fix timestamps when cuda graphs enabled
-    - Fix masking of \<pad\> tokens in AED inference
+  - Speech
+    - Batched beam search for transducers (RNN-T and TDT)
+    - RNNT/TDT buffered/streaming inference \+ batched decoding support in cache-aware
+    - add support for CTC batched beam search with GPU-LM
+  - Key fixes
+    - Punctuation Marks in Timestamps
+    - Fix timestamps when cuda graphs enabled
+    - Fix masking of \<pad\> tokens in AED inference
     - TDT streaming inference fix
   - LLM
-    - Qwen 3 235B-A22B Perf Optimized
-    - DeepSeek V3 Perf Optimized
-    - Gemma3 support from Google
-    - Embedding and Reranker models
+    - Qwen 3 235B-A22B Perf Optimized
+    - DeepSeek V3 Perf Optimized
+    - Gemma3 support from Google
+    - Embedding and Reranker models
   - MM
-    - Llama 4
+    - Llama 4
     - AVLM
-- Training performance (speed)
-  - NVL sharp \+ IB sharp for DP/FSDP-communications on H100 and B200
-  - MXFP8 with TP communication overlap
-  - MXFP8 with reduced memory allocation
-  - FP8 sub-channel recipe (128x128 for weight and 1x128 for activation)
-  - cudnn fused attention for MLA (both Hopper and Blackwell)
-  - Advanced custom asymmetric pipelining (for MTP, loss func, and embd)
-  - BF16 optimizer for model memory saving
-  - CUDA graph fix for fine-tuning benchmarks
+- Training performance (speed)
+  - NVL sharp \+ IB sharp for DP/FSDP-communications on H100 and B200
+  - MXFP8 with TP communication overlap
+  - MXFP8 with reduced memory allocation
+  - FP8 sub-channel recipe (128x128 for weight and 1x128 for activation)
+  - cudnn fused attention for MLA (both Hopper and Blackwell)
+  - Advanced custom asymmetric pipelining (for MTP, loss func, and embd)
+  - BF16 optimizer for model memory saving
+  - CUDA graph fix for fine-tuning benchmarks
   - CUDA graph support for LLAMA4
-
+
 
 ### Detailed Changelogs
 
 #### ASR
