Changes from all commits
41 commits
2bb8969
Initial commit
abhinavg4 Sep 30, 2025
e023786
Fix few issues for bridge export (#738)
yaoyu-33 Sep 30, 2025
f9aad1f
chore: Add issue template for model requests (#826)
ko3n1g Sep 30, 2025
a73c1be
ci: Skip if `docs-only` label is attached (#833)
ko3n1g Oct 1, 2025
6db2d13
destroy process group at end of performance script (#772)
ananthsub Oct 1, 2025
ad151f3
ci(fix): pre-flight (#842)
ko3n1g Oct 1, 2025
4bba0e6
[docs] Add canonical lora docs (#821)
ananthsub Oct 2, 2025
7e2eeaa
ci: Bump pre-flight (#854)
ko3n1g Oct 2, 2025
a6cfa88
Gemma model provider + bridge (#394)
ananthsub Oct 2, 2025
1990938
[docs] Packed sequences (#822)
ananthsub Oct 2, 2025
a4912e7
Gemma2 provider + Bridge (#856)
ananthsub Oct 2, 2025
af6bc36
[docs] placeholder page for performance summary (#796)
ananthsub Oct 2, 2025
c149b2e
[checkpoint] save `latest_checkpointed_iteration.txt` for megatron-lm…
ananthsub Oct 3, 2025
bd9465e
fix: exit profiler context (#841)
ananthsub Oct 3, 2025
ad94387
support async saving for CI end to end testing (#804)
ananthsub Oct 3, 2025
ae707eb
ci: Run install check on self-hosted cpu runners (#857)
chtruong814 Oct 3, 2025
a5d7c58
docs: Revert 0.2.0 push (#865)
ko3n1g Oct 3, 2025
5d194b9
Remove model providers for different model sizes (Qwen, Llama) (#607)
yaoyu-33 Oct 3, 2025
96e7b4c
add tests for functor design
ananthsub Sep 26, 2025
4a750dd
improve typing for forward step func and add tests for functors
ananthsub Sep 27, 2025
e0e8611
update tests
ananthsub Sep 27, 2025
7f6ec50
make checks more robust
ananthsub Sep 27, 2025
d6b02c6
docstrings
ananthsub Sep 27, 2025
897da83
docstrings
ananthsub Sep 27, 2025
b7ad487
docstrings
ananthsub Sep 27, 2025
a6ae7a3
fix tests
ananthsub Sep 27, 2025
6883596
inject state once at the beginning of the loops
ananthsub Oct 3, 2025
23e9efc
cleanup
ananthsub Oct 3, 2025
ab4f32d
add tests
ananthsub Oct 3, 2025
ca2a3c5
Add pretraining script for Llama3 8B model with YAML and CLI configur…
abhinavg4 Oct 5, 2025
db1b812
Merge branch 'functor' of https://github.com/ananthsub/Megatron-Bridg…
abhinavg4 Oct 6, 2025
7a701f6
diffusion_energon_datamodule
abhinavg4 Oct 6, 2025
914ff80
Refactor configuration handling and update model parameters
abhinavg4 Oct 6, 2025
0a3ae83
first commit
Oct 30, 2025
5d12fc9
update branch
Oct 30, 2025
8847968
workable code
Oct 30, 2025
09c9488
workable thd
Oct 31, 2025
992f836
clean up, remove all CP for sbhd, CP now is only for thd
Oct 31, 2025
aed722f
add example commands
Oct 31, 2025
b0a90e6
add example commands
Oct 31, 2025
713ab54
commit to use all Wan's components from DFM
Nov 4, 2025
28 changes: 28 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,28 @@
---
name: Bug report
about: Create a report to help us improve the repository or project
title: ""
labels: bug
assignees: ''

---

**Describe the bug**

A clear and concise description of what the bug is.

**Steps/Code to reproduce bug**

Please list *minimal* steps or a code snippet for us to be able to reproduce the bug.

A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports.


**Expected behavior**

A clear and concise description of what you expected to happen.


**Additional context**

Add any other context about the problem here.
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,2 @@
blank_issues_enabled: false

20 changes: 20 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for this project
title: ""
labels: enhancement
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
31 changes: 31 additions & 0 deletions .github/ISSUE_TEMPLATE/model-support-request.md
@@ -0,0 +1,31 @@
---
name: Model Support Request
about: Request conversion support and training recipes for a new model
title: "<Model name> Model Support"
labels: ''
assignees: ''

---

Add support for \<model name\> model:

**Please include a link to the model's HuggingFace repo**
HF repo:

**These checklist items are required for all models in Megatron Bridge**

- [ ] Model providers
- [ ] Model bridge for HF conversion
- [ ] Unit tests (config and bridge)
- [ ] Model conversion functional tests

**For flagship models, these items are also needed**

- [ ] Optimal pretraining recipe
- [ ] Optimal finetuning recipe
- [ ] Recipe unit tests
- [ ] Recipe functional tests
- [ ] End-to-end CI tests

**Additional context**
Add any other context or screenshots about the model request here.
2 changes: 1 addition & 1 deletion .github/workflows/build-docs.yml
@@ -23,7 +23,7 @@ on:

jobs:
pre-flight:
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cicd_preflight.yml@v0.53.0
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cicd_preflight.yml@v0.64.2

build-docs:
needs: [pre-flight]
2 changes: 1 addition & 1 deletion .github/workflows/build-test-publish-wheel.yml
@@ -31,7 +31,7 @@ permissions:

jobs:
pre-flight:
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cicd_preflight.yml@v0.53.0
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cicd_preflight.yml@v0.64.2

build-test-publish-wheel:
needs: [pre-flight]
4 changes: 2 additions & 2 deletions .github/workflows/cicd-main.yml
@@ -10,7 +10,7 @@
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License
# limitations under the License.
name: CICD NeMo
on:
schedule:
@@ -31,7 +31,7 @@ permissions:

jobs:
pre-flight:
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cicd_preflight.yml@v0.53.0
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cicd_preflight.yml@v0.64.2

lint-check:
name: Lint check
2 changes: 1 addition & 1 deletion .github/workflows/copyright-check.yml
@@ -23,7 +23,7 @@ on:

jobs:
pre-flight:
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cicd_preflight.yml@v0.53.0
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cicd_preflight.yml@v0.64.2

copyright-check:
needs: [pre-flight]
7 changes: 3 additions & 4 deletions .github/workflows/install-test.yml
@@ -26,20 +26,19 @@ on:

jobs:
pre-flight:
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cicd_preflight.yml@v0.53.0
uses: NVIDIA-NeMo/FW-CI-templates/.github/workflows/_cicd_preflight.yml@v0.64.2

pip-test-bare-metal:
needs: [pre-flight]
if: |
!(needs.pre-flight.outputs.docs_only == 'true'
|| needs.pre-flight.outputs.is_deployment_workflow == 'true')
runs-on: ${{ matrix.arch }}
name: Pip - Python${{ matrix.python-version }} - ${{ matrix.arch == 'ubuntu-latest' && 'AMD64/Linux' || 'ARM64/Darwin' }} - Bare Metal
runs-on: linux-amd64-cpu16
name: Pip - Python${{ matrix.python-version }} - AMD64/Linux - Bare Metal
container: ubuntu:24.04
strategy:
fail-fast: false
matrix:
arch: ["ubuntu-latest"]
python-version: ["3.10", "3.11", "3.12"]
steps:
- name: Checkout repository
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -27,7 +27,7 @@
project = "Megatron Bridge"
copyright = "2025, NVIDIA Corporation"
author = "NVIDIA Corporation"
release = "0.2.0"
release = "0.1.0"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
2 changes: 2 additions & 0 deletions docs/index.md
@@ -7,6 +7,7 @@
:hidden:

parallelisms.md
performance-summary.md
performance-guide.md
recipe-usage.md
```
@@ -37,6 +38,7 @@ training/attention-optimizations.md
training/activation-recomputation.md
training/cpu-offloading.md
training/peft.md
training/packed-sequences.md
```

```{toctree}
58 changes: 58 additions & 0 deletions docs/performance-summary.md
@@ -0,0 +1,58 @@
# Performance

As part of the NVIDIA NeMo Framework, Megatron Bridge delivers high training throughput for advanced generative AI models by incorporating recent training techniques such as model parallelism and optimized attention mechanisms.

This page provides performance benchmarks for large language models trained with Megatron Bridge across different GPU systems and configurations.

## Nomenclature

- **GBS**: Global Batch Size
- **MBS**: Micro Batch Size
- **FSDP**: Fully Sharded Data Parallel
- FSDP = 1: use FSDP
- FSDP = 0: use DDP (Distributed Data Parallel)
- **TP**: Tensor Parallel Size
- **PP**: Pipeline Parallel Size
- **CP**: Context Parallel Size
- **VP**: Virtual Pipeline Parallel Size
- **EP**: Expert Parallel Size
- **GA**: Number of Gradient Accumulations
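
For intuition, these sizes are related by simple arithmetic: the data-parallel size is the total GPU count divided by the product of TP, PP, and CP, and GA is whatever factor makes DP × MBS × GA equal GBS. The sketch below illustrates this relationship; the function and variable names are illustrative assumptions and do not come from the performance recipes.

```python
# Illustrative sketch of how the sizes above relate to one another.
# Names here (derive_batch_quantities, world_size, ...) are assumptions for
# this example, not identifiers from the Megatron Bridge performance recipes.
def derive_batch_quantities(world_size, tp, pp, cp, gbs, mbs):
    # Data-parallel size: GPUs left over after tensor, pipeline, and context parallelism.
    dp = world_size // (tp * pp * cp)
    # Gradient accumulation steps so that dp * mbs * ga == gbs.
    ga = gbs // (mbs * dp)
    return dp, ga

# Example: 128 GPUs with TP=4, PP=2, CP=1, GBS=512, MBS=2 -> DP=16, GA=16.
print(derive_batch_quantities(128, 4, 2, 1, 512, 2))
```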

## Performance Metrics

Performance is measured using:
- **Tokens/sec/GPU**: Throughput per GPU
- **Model TFLOP/sec/GPU**: Model floating-point operations per second per GPU
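
As a rough sketch of how such metrics can be derived from a measured training step (the formula and names below are illustrative assumptions, not the exact methodology behind the published tables):

```python
# Illustrative conversion from a measured step time to the two reported metrics.
# The argument names and the per-token FLOP estimate are assumptions for this sketch only.
def throughput_metrics(gbs, seq_len, step_time_s, num_gpus, model_flops_per_token):
    tokens_per_sec_per_gpu = gbs * seq_len / (step_time_s * num_gpus)
    model_tflops_per_sec_per_gpu = tokens_per_sec_per_gpu * model_flops_per_token / 1e12
    return tokens_per_sec_per_gpu, model_tflops_per_sec_per_gpu
```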

```{contents}
:local:
:depth: 2
```

## Performance Summary for Large Language Models

Below are performance benchmarks for various large language models, organized by release version. These results were obtained with the performance recipes in [`scripts/performance`](https://github.com/NVIDIA/Megatron-Bridge/tree/main/scripts/performance).

The performance data includes:

- **Pre-training Performance**: Throughput metrics for various model sizes and architectures
- **System Configurations**: Results across different GPU systems (DGX-GB200, DGX-B200, DGX-H100)
- **Precision Options**: Performance comparisons between different precision modes (BF16, FP8, MXFP8)

---

## 25.09 NeMo Container

### Pre-Training Performance

#### System: DGX-GB200

*Performance tables will be added here*

#### System: DGX-B200

*Performance tables will be added here*

#### System: DGX-H100

*Performance tables will be added here*
5 changes: 4 additions & 1 deletion docs/project.json
@@ -1 +1,4 @@
{"name": "megatron-bridge", "version": "0.2.0"}
{
"name": "megatron-bridge",
"version": "0.1.0"
}
Binary file added docs/training/images/canonical_lora.png
Binary file added docs/training/images/performant_lora.png