forked from NVIDIA-NeMo/Megatron-Bridge
Huvu/mcore wan official #2
Open
huvunvidia wants to merge 41 commits into abhinavg4:main from huvunvidia:huvu/mcore_wan_official
* fix cpu init during export Signed-off-by: yaoyu-33 <[email protected]> * export env fix Signed-off-by: yaoyu-33 <[email protected]> * delete_extra_state for TE related during checkpoint loading for export Signed-off-by: yaoyu-33 <[email protected]> * paths fixes Signed-off-by: yaoyu-33 <[email protected]> * add override_provider option for checkpoint loading Signed-off-by: yaoyu-33 <[email protected]> * add unit test for override_provider option Signed-off-by: yaoyu-33 <[email protected]> * remove debug lines Signed-off-by: yaoyu-33 <[email protected]> * lint Signed-off-by: yaoyu-33 <[email protected]> * unit test fix Signed-off-by: yaoyu-33 <[email protected]> --------- Signed-off-by: yaoyu-33 <[email protected]>
* chore: Add issue template for model requests Signed-off-by: oliver könig <[email protected]> * copying over remaining templates Signed-off-by: oliver könig <[email protected]> --------- Signed-off-by: oliver könig <[email protected]>
* ci: Skip if `docs-only` label is attached Signed-off-by: oliver könig <[email protected]> * test Signed-off-by: oliver könig <[email protected]> * test Signed-off-by: oliver könig <[email protected]> * test Signed-off-by: oliver könig <[email protected]> * update Signed-off-by: oliver könig <[email protected]> --------- Signed-off-by: oliver könig <[email protected]>
* cleanup process group at end of performance script Signed-off-by: Ananth Subramaniam <[email protected]> * Update scripts/performance/run_script.py Signed-off-by: Ananth Subramaniam <[email protected]> * destroy pg for other scripts Signed-off-by: Ananth Subramaniam <[email protected]> * update Signed-off-by: Ananth Subramaniam <[email protected]> --------- Signed-off-by: Ananth Subramaniam <[email protected]> Signed-off-by: Ananth Subramaniam <[email protected]>
* ci(fix): pre-flight Signed-off-by: oliver könig <[email protected]> * test Signed-off-by: oliver könig <[email protected]> * test Signed-off-by: oliver könig <[email protected]> * final Signed-off-by: oliver könig <[email protected]> --------- Signed-off-by: oliver könig <[email protected]>
Signed-off-by: Ananth Subramaniam <[email protected]>
Signed-off-by: oliver könig <[email protected]>
* initial gemma commit Signed-off-by: Ananth Subramaniam <[email protected]> * gemma provider Signed-off-by: Ananth Subramaniam <[email protected]> * patch tests Signed-off-by: Ananth Subramaniam <[email protected]> * add gemma bridge + tests Signed-off-by: Ananth Subramaniam <[email protected]> * fix conftest Signed-off-by: Ananth Subramaniam <[email protected]> * reenable msc Signed-off-by: Ananth Subramaniam <[email protected]> * fix gemma test fallback Signed-off-by: Ananth Subramaniam <[email protected]> * try simpler tokenizer Signed-off-by: Ananth Subramaniam <[email protected]> * upload assets Signed-off-by: Ananth Subramaniam <[email protected]> * use pre-downloaded config for model provider test Signed-off-by: Ananth Subramaniam <[email protected]> * lint Signed-off-by: Ananth Subramaniam <[email protected]> * address feedback -s Signed-off-by: Ananth Subramaniam <[email protected]> * rebase Signed-off-by: Ananth Subramaniam <[email protected]> * rebase Signed-off-by: Ananth Subramaniam <[email protected]> * use mcore activations Signed-off-by: Ananth Subramaniam <[email protected]> * update test Signed-off-by: Ananth Subramaniam <[email protected]> * fix mock Signed-off-by: Ananth Subramaniam <[email protected]> * fix conversion script reference Signed-off-by: Ananth Subramaniam <[email protected]> * subclass Signed-off-by: Ananth Subramaniam <[email protected]> * update tests Signed-off-by: Ananth Subramaniam <[email protected]> --------- Signed-off-by: Ananth Subramaniam <[email protected]>
* [docs] packed sequences Signed-off-by: Ananth Subramaniam <[email protected]> * [docs] packed sequences Signed-off-by: Ananth Subramaniam <[email protected]> * address feedback Signed-off-by: Ananth Subramaniam <[email protected]> --------- Signed-off-by: Ananth Subramaniam <[email protected]>
* gemma2 provider and bridge Signed-off-by: Ananth Subramaniam <[email protected]> * gemma2 model provider + bridge Signed-off-by: Ananth Subramaniam <[email protected]> --------- Signed-off-by: Ananth Subramaniam <[email protected]>
* [docs] placeholder page for performance summary Signed-off-by: Ananth Subramaniam <[email protected]> * add sections for releases Signed-off-by: Ananth Subramaniam <[email protected]> * improve description Signed-off-by: Ananth Subramaniam <[email protected]> --------- Signed-off-by: Ananth Subramaniam <[email protected]>
… compatibility (NVIDIA-NeMo#829) * save latest_checkpointed_iteration for compatibility Signed-off-by: Ananth Subramaniam <[email protected]> * fix megatron fsdp test assertion Signed-off-by: Ananth Subramaniam <[email protected]> --------- Signed-off-by: Ananth Subramaniam <[email protected]>
* exit profiler context Signed-off-by: Ananth Subramaniam <[email protected]> * disable vocab size logging in flops calculation Signed-off-by: Ananth Subramaniam <[email protected]> --------- Signed-off-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Ananth Subramaniam <[email protected]>
* Clear disk space before install check Signed-off-by: Charlie Truong <[email protected]> * Revert "Clear disk space before install check" This reverts commit 2c085f5. Signed-off-by: Charlie Truong <[email protected]> * Run bare metal install on self-hosted runners Signed-off-by: Charlie Truong <[email protected]> --------- Signed-off-by: Charlie Truong <[email protected]>
Signed-off-by: oliver könig <[email protected]>
…A-NeMo#607) * update llama and qwen models to use auto bridge and update recipes test as well Signed-off-by: yaoyu-33 <[email protected]> * temporary remove llama4 as it's not fully tested or verified. Signed-off-by: yaoyu-33 <[email protected]> * Revert "temporary remove llama4 as it's not fully tested or verified." This reverts commit 5217084. * temp save Signed-off-by: yaoyu-33 <[email protected]> * temp save Signed-off-by: yaoyu-33 <[email protected]> * Revert "temp save" This reverts commit 0c57e2b. * Revert "temp save" This reverts commit 0748d52. * update qwen's recipes Signed-off-by: yaoyu-33 <[email protected]> * update llama recipes Signed-off-by: yaoyu-33 <[email protected]> * remove some old recipe files Signed-off-by: yaoyu-33 <[email protected]> * update recipe files to match old recipes Signed-off-by: yaoyu-33 <[email protected]> * update recipe file Signed-off-by: yaoyu-33 <[email protected]> * update qwen recipes Signed-off-by: yaoyu-33 <[email protected]> * update llama recipes Signed-off-by: yaoyu-33 <[email protected]> * Update src/megatron/bridge/recipes/qwen/qwen3.py Co-authored-by: Ananth Subramaniam <[email protected]> Signed-off-by: Yu Yao <[email protected]> * Update src/megatron/bridge/recipes/qwen/qwen3.py Co-authored-by: Ananth Subramaniam <[email protected]> Signed-off-by: Yu Yao <[email protected]> * Update src/megatron/bridge/recipes/qwen/qwen3.py Co-authored-by: Ananth Subramaniam <[email protected]> Signed-off-by: Yu Yao <[email protected]> * Update src/megatron/bridge/recipes/llama/llama2.py Co-authored-by: Ananth Subramaniam <[email protected]> Signed-off-by: Yu Yao <[email protected]> * Update src/megatron/bridge/recipes/llama/llama2.py Co-authored-by: Ananth Subramaniam <[email protected]> Signed-off-by: Yu Yao <[email protected]> * recipe naming update Signed-off-by: yaoyu-33 <[email protected]> * update test Signed-off-by: yaoyu-33 <[email protected]> * lint Signed-off-by: yaoyu-33 <[email protected]> * add TypedDict for 
args Signed-off-by: yaoyu-33 <[email protected]> * lint Signed-off-by: yaoyu-33 <[email protected]> * update docstring Signed-off-by: yaoyu-33 <[email protected]> * unit test fix and license fix Signed-off-by: yaoyu-33 <[email protected]> * sync eval_interval and save_interval Signed-off-by: yaoyu-33 <[email protected]> * add comments Signed-off-by: yaoyu-33 <[email protected]> * set TRANSFORMERS_OFFLINE=1 in action.yml Signed-off-by: yaoyu-33 <[email protected]> * fix llama3 8b hf model path Signed-off-by: yaoyu-33 <[email protected]> * replay lr decay iters update on updated recipes Signed-off-by: yaoyu-33 <[email protected]> * Update action.yml Signed-off-by: Yu Yao <[email protected]> * add comments Signed-off-by: yaoyu-33 <[email protected]> * Add guard / mock for the places needs to download hf config in unit test Signed-off-by: yaoyu-33 <[email protected]> * lint Signed-off-by: yaoyu-33 <[email protected]> * add qwen functional test Signed-off-by: yaoyu-33 <[email protected]> * update recipe tests Signed-off-by: yaoyu-33 <[email protected]> * lint Signed-off-by: yaoyu-33 <[email protected]> --------- Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Yu Yao <[email protected]> Co-authored-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Ananth Subramaniam <[email protected]>
Signed-off-by: Ananth Subramaniam <[email protected]>
…ation support

- Introduced `pretrain_DiT_Model.py` for flexible pretraining using Megatron-Bridge.
- Updated `DITForwardStep` class to use `__call__` method for forward steps.
- Modified dataset configuration in `pretrain_config` to utilize `DiffusionDataModule`.
- Adjusted tensor and context parallelism settings in `llama3_8b.py`.

This commit enhances the pretraining capabilities and configuration flexibility for Llama3 models.
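To illustrate the `__call__`-based forward step mentioned in the commit above, here is a minimal, framework-free sketch. It is not the PR's `DITForwardStep`: the class name `ToyForwardStep`, the `loss_scale` parameter, the batch layout, and the mean-squared-error loss are all assumptions made for illustration. It only shows the general pattern of a stateful object that a training loop can call like a plain function.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Batch = Dict[str, List[float]]


class ToyForwardStep:
    """Hypothetical callable forward step; not the PR's DITForwardStep."""

    def __init__(self, loss_scale: float = 1.0) -> None:
        # State that a plain function would have to receive on every call.
        self.loss_scale = loss_scale

    def __call__(
        self,
        data_iterator: Iterator[Batch],
        model: Callable[[Batch], List[float]],
    ) -> Tuple[List[float], Callable[[List[float]], float]]:
        # Pull one batch, run the model, and hand back a loss closure.
        batch = next(data_iterator)
        predictions = model(batch)
        targets = batch["targets"]

        def loss_fn(outputs: List[float]) -> float:
            # Scaled mean squared error (an assumed loss, for illustration).
            squared = [(o - t) ** 2 for o, t in zip(outputs, targets)]
            return self.loss_scale * sum(squared) / len(squared)

        return predictions, loss_fn


# The instance is invoked exactly like a function.
forward_step = ToyForwardStep(loss_scale=0.5)
batches = iter([{"inputs": [1.0, 2.0], "targets": [2.0, 4.0]}])
outputs, loss_fn = forward_step(batches, lambda b: [2 * x for x in b["inputs"]])
```

Because the object carries its own configuration, a training loop can accept either a function or an instance without changing its call site.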
- Commented out sections in `pretrain_DiT_Model.py` related to OmegaConf merging and command-line overrides for clarity.
- Added `backend` configuration in `llama3_8b_pretrain_override_example.yaml`.
- Updated `init_global_step` handling in `EnergonMultiModalDataModule` to simplify initialization.
- Introduced `DiffusionDataModuleConfig` for better dataset configuration management.
- Adjusted model parameters in `llama_provider.py` to set `num_layers` to 2 and added `seq_length` and `vocab_size` attributes in `DiTModelProvider`.
- Refined imports across various modules to ensure consistency and clarity.

This commit enhances the configuration structure and model initialization process, improving maintainability and usability.
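The provider attributes named in the commit above (`num_layers`, `seq_length`, `vocab_size`) can be pictured as fields on a dataclass-style config. The sketch below is hypothetical: `ToyProviderConfig` is not the PR's `DiTModelProvider`, and apart from `num_layers = 2`, which the commit message states, the default values and the validation rule are placeholders chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyProviderConfig:
    """Hypothetical config; field names follow the commit message only."""

    num_layers: int = 2       # the commit message sets num_layers to 2
    seq_length: int = 2048    # placeholder value, not taken from the PR
    vocab_size: int = 32000   # placeholder value, not taken from the PR

    def __post_init__(self) -> None:
        # Fail early on values that cannot describe a usable model.
        for name in ("num_layers", "seq_length", "vocab_size"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


default_cfg = ToyProviderConfig()
# Overrides produce a new object and leave the defaults untouched.
long_context_cfg = replace(default_cfg, seq_length=8192)
```

A small layer count such as 2 is convenient for smoke tests, since the model builds quickly while still exercising the full configuration path.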
No description provided.