Update EVO2 tests according to Hyena arch changes #798
Conversation
fca02bc to
58706fe
Compare
|
/ok to test |
|
/ok to test |
4c5ac7d to
c14f433
Compare
|
LGTM but will let John verify: |
|
/ok to test |
Codecov Report
Attention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files
@@ Coverage Diff @@
## main #798 +/- ##
==========================================
+ Coverage 84.37% 84.42% +0.05%
==========================================
Files 138 138
Lines 8690 8686 -4
==========================================
+ Hits 7332 7333 +1
+ Misses 1358 1353 -5
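The percentages in the diff above follow directly from the hits and lines columns. A minimal sketch of that arithmetic, assuming coverage is simply hits divided by lines (the helper name `coverage_percent` is illustrative and not part of Codecov):

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Return line coverage as a percentage of tracked lines."""
    return 100.0 * hits / lines


# Values copied from the coverage diff: main vs. this PR.
base = coverage_percent(hits=7332, lines=8690)
head = coverage_percent(hits=7333, lines=8686)

# Misses are the lines that were not hit.
base_misses = 8690 - 7332
head_misses = 8686 - 7333

print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f}%)")
# prints "84.37% -> 84.42% (+0.05%)"
```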
|
jstjohn
left a comment
Approved, but see my inline comment about manual verification of tensor parallel correctness. Ideally the same could be done for CP=2, but I am not 100% sure that we have that working in the predict script.
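One way the manual check could look is to run prediction once with tensor parallel size 1 and once with size 2, then compare the outputs within a tolerance. The sketch below is only an illustration in plain Python: the function names, the tolerance, and the sample values are assumptions, and real outputs would come from the predict script rather than hard-coded lists.

```python
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equal-length outputs."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def outputs_match(reference: Sequence[float], candidate: Sequence[float], atol: float = 1e-3) -> bool:
    """True if the candidate run agrees with the reference run within atol."""
    return max_abs_diff(reference, candidate) <= atol


# Hypothetical values standing in for outputs of a TP=1 run and a TP=2 run.
tp1_outputs = [0.1250, -1.5000, 2.2500]
tp2_outputs = [0.1251, -1.4998, 2.2503]

print(outputs_match(tp1_outputs, tp2_outputs))
```

The tolerance is a judgment call: reduced-precision kernels and a different reduction order across ranks usually produce small numerical differences even when the sharding is correct.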
|
Need to bump NeMo to get the changes in NVIDIA-NeMo/NeMo#12988 after it's merged |
|
/ok to test b950c28 |
|
/ok to test 6e23633 |
|
/ok to test a395a8b |
|
/ok to test 5f7f9ed |
|
/ok to test f25057b |
Previously if pre-commit failed, it would cause `run-tests` to be skipped, which would then mean that `verify-tests-status` would give the PR the green light. That's obviously a problem, so we need to make sure all these tests are added to the verify-tests-status check. Ideally we could put some more logic in that if block to check if we're in a merge queue, whether tests were intentionally skipped, etc. --------- Signed-off-by: Peter St. John <[email protected]> Signed-off-by: Farhad Ramezanghorbani <[email protected]>
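The gating problem described in this commit message can be sketched in a few lines. This is not the workflow's actual implementation: `verify_tests_status` here is a hypothetical Python function, the "old" check is an assumption about how a skipped job slipped through, and the result strings mirror the job results GitHub Actions reports (`success`, `failure`, `cancelled`, `skipped`).

```python
from typing import Mapping


def verify_tests_status(job_results: Mapping[str, str]) -> bool:
    """Pass only if every required job finished with 'success'.

    A 'skipped' or 'cancelled' job therefore blocks the PR, just like a failure.
    """
    return all(result == "success" for result in job_results.values())


# Scenario from the description: pre-commit fails, so run-tests is skipped.
results = {"pre-commit": "failure", "run-tests": "skipped"}

# Assumed old behaviour: only an explicit run-tests failure blocked the PR.
old_ok = results["run-tests"] != "failure"
new_ok = verify_tests_status(results)

print(old_ok, new_ok)  # prints "True False"
```

Checks for a merge queue or intentionally skipped tests, as the message suggests, would need extra inputs that this sketch does not model.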
### Description <!-- Provide a detailed description of the changes in this PR --> ### Type of changes <!-- Mark the relevant option with an [x] --> - [ ] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Refactor - [ ] Documentation update - [ ] Other (please describe): ### CI Pipeline Configuration Configure CI behavior by applying the relevant labels: - [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests - [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest - [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing > [!NOTE] > By default, the notebooks validation tests are skipped unless explicitly enabled. #### Authorizing CI Runs We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources. * If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123) * If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit. 
### Usage <!--- How does a user interact with the changed code --> ```python TODO: Add code snippet ``` ### Pre-submit Checklist <!--- Ensure all items are completed before submitting --> - [ ] I have tested these changes locally - [ ] I have updated the documentation accordingly - [ ] I have added/updated tests as needed - [ ] All existing tests pass successfully Signed-off-by: Jonathan Mitchell <[email protected]> Signed-off-by: Farhad Ramezanghorbani <[email protected]>
PEFT checkpointing and inference for esm2. --------- Signed-off-by: Polina Binder <[email protected]> Signed-off-by: polinabinder1 <[email protected]> Signed-off-by: Farhad Ramezanghorbani <[email protected]>
### Description Profiling for LoRA additions to ESM2. --------- Signed-off-by: Polina Binder <[email protected]> Signed-off-by: Farhad Ramezanghorbani <[email protected]>
### Description Fixes a recent CRIT vulnerability in h11 ### Type of changes <!-- Mark the relevant option with an [x] --> - [x] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Refactor - [ ] Documentation update - [ ] Other (please describe): ### CI Pipeline Configuration Configure CI behavior by applying the relevant labels: - [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests - [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest - [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing > [!NOTE] > By default, the notebooks validation tests are skipped unless explicitly enabled. #### Authorizing CI Runs We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources. * If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123) * If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit. 
### Usage <!--- How does a user interact with the changed code --> ```python TODO: Add code snippet ``` ### Pre-submit Checklist <!--- Ensure all items are completed before submitting --> - [ ] I have tested these changes locally - [ ] I have updated the documentation accordingly - [ ] I have added/updated tests as needed - [ ] All existing tests pass successfully Signed-off-by: Timur Rvachov <[email protected]> Signed-off-by: Farhad Ramezanghorbani <[email protected]>
### Description
### For the Geneformer documentation:
1. **Capitalization standardization**:
- Fixed capitalization of "BioNeMo", "Geneformer", "HuggingFace",
"ReLU", "BERT MLM"
- Corrected spelling of "Crohn's disease" (previously "Chron's disease")
- Fixed "children" (previously "chidlren")
2. **Formatting improvements**:
- Properly formatted model version bullet points with nesting
- Added proper headings for property categories
- Fixed displayed values (e.g., ".5M" → "0.5M")
- Standardized formatting of data collection/labeling methods sections
3. **Image captions**:
- Replaced low-quality image captions with descriptive, properly
formatted titles
- Made chart descriptions more professional and consistent
4. **Grammatical improvements**:
- Fixed article usage and punctuation
- Improved sentence structure and clarity
- Fixed section headings capitalization and consistency
5. **Fixed broken notes**:
- Corrected `!! note` to `!!! note` for proper rendering
### For the ESM-2 pretraining documentation:
1. **Grammar and clarity improvements**:
- Fixed article usage ("a ESM-2" → "an ESM-2")
- Fixed formatting of numeric values (e.g., "1." → "1.0")
- Fixed typos ("depreciation" → "deprecation")
- Fixed "trainiing" → "training"
2. **Consistency in terminology**:
- Standardized "BioNeMo" capitalization
- Ensured consistent treatment of "ESM-2" references
3. **Structure and formatting**:
- Improved spacing and paragraph breaks
- Fixed section formatting and readability
### For the training-models documentation:
1. **Capitalization and consistency**:
- Standardized capitalization of model sizes (8M, 650M, 3B)
- Fixed capitalization of "ESM2", "Geneformer", "Python", "YAML"
- Changed "WandB" to "Weights and Biases" consistently
2. **Formatting improvements**:
- Changed code blocks consistently to include language tags
- Added proper spacing and improved paragraph formatting
- Fixed punctuation in lists and note sections
3. **Grammar and clarity**:
- Added missing commas after introductory phrases
- Fixed formatting of lists for better readability
- Made bulleted explanations more consistent
### Type of changes
<!-- Mark the relevant option with an [x] -->
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [x] Documentation update
- [ ] Other (please describe):
### CI Pipeline Configuration
Configure CI behavior by applying the relevant labels:
-
[SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci)
- Skip all continuous integration tests
-
[INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests)
- Execute notebook validation tests in pytest
-
[INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests)
- Execute tests labelled as slow in pytest for extensive testing
> [!NOTE]
> By default, the notebooks validation tests are skipped unless
explicitly enabled.
#### Authorizing CI Runs
We use
[copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation)
to manage authorization of CI
runs on NVIDIA's compute resources.
* If a pull request is opened by a trusted user and contains only
trusted changes, the pull request's code will
automatically be copied to a pull-request/ prefixed branch in the source
repository (e.g. pull-request/123)
* If a pull request is opened by an untrusted user or contains untrusted
changes, an NVIDIA org member must leave an
`/ok to test` comment on the pull request to trigger CI. This will need
to be done for each new commit.
### Usage
<!--- How does a user interact with the changed code -->
```python
TODO: Add code snippet
```
### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->
- [ ] I have tested these changes locally
- [ ] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully
---------
Signed-off-by: Timur Rvachov <[email protected]>
Signed-off-by: Timur Rvachov <[email protected]>
Co-authored-by: lvojtku <[email protected]>
Signed-off-by: Farhad Ramezanghorbani <[email protected]>
### Description <!-- Provide a detailed description of the changes in this PR --> - Fixes the error caused by NVIDIA-NeMo/NeMo#12459 refactoring the definition of `masked_token_loss` and `masked_token_loss_context_parallel` into a single function with a `cp_size` argument that no longer divides the loss by the number of "valid" (i.e. non-masked) tokens. So it returns a CP-reduced loss sum. - Specifically, this breaks one of our golden value tests in `bionemo-llm`: `sub-packages/bionemo-llm/tests/bionemo/llm/model/test_loss.py::test_loss_equivalency_bionemo_vs_pytorch`, and this fixes it with no behavior change to the LLM model `forward()`, i.e. we perform the normalization on valid tokens on our side now. ### Details - Bump NeMo to a version greater than: NVIDIA-NeMo/NeMo#12856 or matching this: #798 - Update: Need to migrate to `inference_context` in NeMo: https://github.com/NVIDIA/NeMo/tree/cye/hyena-gpt-infer-context - Bump Megatron to support new imports in the NeMo bump. Found a commit that bisects the new Megatron inference engine and the new NeMo imports to prevent breakage of our inference tests. - Use a backend version of RoPE for the Amplify Megatron vs. PyTorch/HF parity test to avoid the CP process group requirement. - `MaskedTokenLossReduction.forward()` return API changed. - Added commentary for future devs to understand the code. #### Appendix - NeMo Fork Hotfix Patch: Safe import of a future module in Megatron to avoid upgrading. 
``` get_gpt_heterogeneous_layer_spec, HAVE_GPT_HETEROGENEOUS = safe_import("megatron.core.models.gpt.heterogeneous.heterogeneous_layer_specs") ``` ### Type of changes <!-- Mark the relevant option with an [x] --> - [x] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Refactor - [ ] Documentation update - [ ] Other (please describe): ### Usage / Testing <!--- How does a user interact with the changed code --> - Tested against the commit specified in this PR: #798 ```python cd 3rdparty/NeMo git checkout c998e273f9cd23e36d7348fa27d0c2692efd87c8 pytest -s sub-packages/bionemo-llm/tests/bionemo/llm/model/test_loss.py::test_loss_equivalency_bionemo_vs_pytorch ``` --------- Signed-off-by: Farhad Ramezanghorbani <[email protected]> Signed-off-by: Cory Ye <[email protected]> Signed-off-by: cspades <[email protected]> Signed-off-by: Timur Rvachov <[email protected]> Signed-off-by: Danny <[email protected]> Signed-off-by: Cory Ye <[email protected]> Signed-off-by: nvdreidenbach <[email protected]> Signed-off-by: Peter St. John <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Polina Binder <[email protected]> Signed-off-by: polinabinder1 <[email protected]> Signed-off-by: dorotat <[email protected]> Signed-off-by: Truong Nguyen <[email protected]> Signed-off-by: Jonathan Mitchell <[email protected]> Signed-off-by: Timur Rvachov <[email protected]> Signed-off-by: Steven <[email protected]> Co-authored-by: Farhad Ramezanghorbani <[email protected]> Co-authored-by: Farhad Ramezanghorbani <[email protected]> Co-authored-by: Dorota Toczydlowska <[email protected]> Co-authored-by: Timur Rvachov <[email protected]> Co-authored-by: nvdreidenbach <[email protected]> Co-authored-by: Steven Kothen-Hill <[email protected]> Co-authored-by: Peter St. 
John <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: polinabinder1 <[email protected]> Co-authored-by: Truong Nguyen <[email protected]> Co-authored-by: jomitchellnv <[email protected]> Co-authored-by: lvojtku <[email protected]> Signed-off-by: Farhad Ramezanghorbani <[email protected]>
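The normalization change described in this commit message can be illustrated without any framework code. The sketch below assumes the upstream loss function now returns only the sum of per-token losses over valid (non-masked) tokens, so the caller divides by the valid-token count itself. All function names are hypothetical and are not the NeMo or BioNeMo API.

```python
from typing import Sequence


def masked_loss_sum(per_token_loss: Sequence[float], loss_mask: Sequence[int]) -> float:
    """Sum of losses over valid tokens (mask == 1); no normalization applied."""
    return sum(loss * mask for loss, mask in zip(per_token_loss, loss_mask))


def normalized_masked_loss(per_token_loss: Sequence[float], loss_mask: Sequence[int]) -> float:
    """Mean loss over valid tokens, with the division done on the caller's side."""
    num_valid = sum(loss_mask)
    if num_valid == 0:
        # No valid tokens: return 0.0 rather than dividing by zero (a choice made for this sketch).
        return 0.0
    return masked_loss_sum(per_token_loss, loss_mask) / num_valid


per_token_loss = [2.0, 4.0, 6.0, 8.0]
loss_mask = [1, 0, 1, 0]  # only the first and third tokens count

print(masked_loss_sum(per_token_loss, loss_mask))         # prints 8.0
print(normalized_masked_loss(per_token_loss, loss_mask))  # prints 4.0
```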
271825c to
917c9c4
Compare
|
/ok to test 917c9c4 |
Signed-off-by: Farhad Ramezanghorbani <[email protected]>
|
/ok to test 550e852b3f148a0212bdd5a10a5909a8f947c32 |
@farhadrgh, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/ |
|
/ok to test 1550e85 |
### Description NVIDIA-NeMo/NeMo#12856 introduces code reduction and perf improvements including standardizing input/output shapes for Hyena operators and consequentially reducing rearrangement overhead. This PR updates the EVO2 test to comply with those changes, ### Type of changes <!-- Mark the relevant option with an [x] --> - [ ] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Refactor - [ ] Documentation update - [ ] Other (please describe): ### CI Pipeline Configuration Configure CI behavior by applying the relevant labels: - [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests - [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest - [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing > [!NOTE] > By default, the notebooks validation tests are skipped unless explicitly enabled. #### Authorizing CI Runs We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources. * If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123) * If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit. 
### Usage <!--- How does a user interact with the changed code --> ```python TODO: Add code snippet ``` ### Pre-submit Checklist <!--- Ensure all items are completed before submitting --> - [ ] I have tested these changes locally - [ ] I have updated the documentation accordingly - [ ] I have added/updated tests as needed - [ ] All existing tests pass successfully --------- Signed-off-by: Farhad Ramezanghorbani <[email protected]> Signed-off-by: Cory Ye <[email protected]> Signed-off-by: cspades <[email protected]> Signed-off-by: Timur Rvachov <[email protected]> Signed-off-by: Danny <[email protected]> Signed-off-by: Cory Ye <[email protected]> Signed-off-by: nvdreidenbach <[email protected]> Signed-off-by: Peter St. John <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Polina Binder <[email protected]> Signed-off-by: polinabinder1 <[email protected]> Signed-off-by: dorotat <[email protected]> Signed-off-by: Truong Nguyen <[email protected]> Signed-off-by: Jonathan Mitchell <[email protected]> Signed-off-by: Timur Rvachov <[email protected]> Signed-off-by: Steven <[email protected]> Co-authored-by: Dorota Toczydlowska <[email protected]> Co-authored-by: Cory Ye <[email protected]> Co-authored-by: Timur Rvachov <[email protected]> Co-authored-by: nvdreidenbach <[email protected]> Co-authored-by: Steven Kothen-Hill <[email protected]> Co-authored-by: Peter St. John <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: polinabinder1 <[email protected]> Co-authored-by: Truong Nguyen <[email protected]> Co-authored-by: jomitchellnv <[email protected]> Co-authored-by: lvojtku <[email protected]> Signed-off-by: dorotat <[email protected]>
…d inference. (#861) ### Description <!-- Provide a detailed description of the changes in this PR --> - #798 (#855) depends on a NeMo branch, which has been merged into NeMo `main`: NVIDIA-NeMo/NeMo#13436. Update to point to this trunk commit. ### Details - NeMo ToT reverted the `cp_size` argument for `masked_token_loss` (NVIDIA-NeMo/NeMo#13295), so we do the CP reduction on our side now... - Future Megatron bump will add the `* cp_size` multiplier to the loss, and break our inference unit tests due to `torch.inference_mode()` usage in Megatron. --------- Signed-off-by: Cory Ye <[email protected]>
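Doing the CP reduction "on our side" can be pictured as follows: each context-parallel rank holds a slice of the sequence, so the loss sums and valid-token counts are added across ranks before dividing. The sketch simulates ranks with plain lists. In real code a collective such as `torch.distributed.all_reduce` would perform the summation, and the function name and values below are assumptions made for illustration.

```python
from typing import Sequence


def cp_reduced_loss(rank_loss_sums: Sequence[float], rank_valid_counts: Sequence[int]) -> float:
    """Global mean loss: sum over ranks first, then divide by total valid tokens."""
    total_loss = sum(rank_loss_sums)
    total_valid = sum(rank_valid_counts)
    return total_loss / total_valid if total_valid else 0.0


# Two simulated CP ranks with different numbers of valid tokens.
rank_loss_sums = [9.0, 1.0]
rank_valid_counts = [3, 1]

global_mean = cp_reduced_loss(rank_loss_sums, rank_valid_counts)

# Averaging per-rank means instead gives a different, incorrect value
# whenever ranks hold different numbers of valid tokens.
mean_of_rank_means = sum(
    loss / count for loss, count in zip(rank_loss_sums, rank_valid_counts)
) / len(rank_loss_sums)

print(global_mean, mean_of_rank_means)  # prints "2.5 2.0"
```

Reducing the sums and the counts separately keeps the result independent of how the sequence happens to be split across ranks.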
Description
NVIDIA-NeMo/NeMo#12856 introduces code reduction and performance improvements, including standardizing input/output shapes for Hyena operators and consequently reducing rearrangement overhead. This PR updates the EVO2 tests to comply with those changes.
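Type of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):
CI Pipeline Configuration
Configure CI behavior by applying the relevant labels:
- SKIP_CI - Skip all continuous integration tests
- INCLUDE_NOTEBOOKS_TESTS - Execute notebook validation tests in pytest
- INCLUDE_SLOW_TESTS - Execute tests labelled as slow in pytest for extensive testing
Note
By default, the notebooks validation tests are skipped unless explicitly enabled.
Authorizing CI Runs
We use copy-pr-bot to manage authorization of CI runs on NVIDIA's compute resources.
- If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
- If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This will need to be done for each new commit.
Usage
Pre-submit Checklist
- [ ] I have tested these changes locally
- [ ] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully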