@farhadrgh
Collaborator

Description

NVIDIA-NeMo/NeMo#12856 introduces code reduction and perf improvements, including standardized input/output shapes for Hyena operators and, consequently, less rearrangement overhead. This PR updates the EVO2 test to comply with those changes.

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels:

Note

By default, the notebooks validation tests are skipped unless explicitly enabled.

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.

  • If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
    automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
  • If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
    /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.

Usage

TODO: Add code snippet

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

@copy-pr-bot

copy-pr-bot bot commented Apr 2, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@farhadrgh farhadrgh force-pushed the farhadr/evo2_cleanup branch from fca02bc to 58706fe Compare April 2, 2025 19:20
@farhadrgh
Collaborator Author

/ok to test

1 similar comment
@farhadrgh
Collaborator Author

/ok to test

@farhadrgh farhadrgh force-pushed the farhadr/evo2_cleanup branch from 4c5ac7d to c14f433 Compare April 2, 2025 21:06
@cspades
Member

cspades commented Apr 8, 2025

LGTM but will let John verify:

- features = rearrange(features, "l b d -> b l d").contiguous()
+ features = rearrange(features, "l b d -> b d l").contiguous()
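To make the layout change in this diff concrete: the old pattern returned features as `(b, l, d)`, while the new pattern returns them as `(b, d, l)`. The following is a minimal sketch that uses nested Python lists in place of tensors so it runs without `torch` or `einops`. The helper `permute_lbd` is hypothetical and not part of this PR, and reading `l`, `b`, `d` as sequence length, batch, and hidden dimension is an assumption.

```python
def shape(x):
    """Return the shape of a rectangular 3-level nested list."""
    return (len(x), len(x[0]), len(x[0][0]))


def permute_lbd(x, pattern):
    """Mimic einops.rearrange for the two patterns in the diff (hypothetical helper)."""
    L, B, D = shape(x)
    if pattern == "l b d -> b l d":
        return [[[x[l][b][d] for d in range(D)] for l in range(L)] for b in range(B)]
    if pattern == "l b d -> b d l":
        return [[[x[l][b][d] for l in range(L)] for d in range(D)] for b in range(B)]
    raise ValueError(f"unsupported pattern: {pattern}")


# Toy input with l=4, b=2, d=3; each entry records its own (l, b, d) index.
features = [[[(l, b, d) for d in range(3)] for b in range(2)] for l in range(4)]

old_layout = permute_lbd(features, "l b d -> b l d")
new_layout = permute_lbd(features, "l b d -> b d l")

print(shape(old_layout))  # (2, 4, 3)
print(shape(new_layout))  # (2, 3, 4)
```

Any downstream code that indexed the last axis as the hidden dimension would need to index the second axis instead after this change.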

@farhadrgh
Collaborator Author

/ok to test

@codecov-commenter

codecov-commenter commented Apr 9, 2025

Codecov Report

Attention: Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.

Project coverage is 84.42%. Comparing base (3936231) to head (1550e85).

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...packages/bionemo-llm/src/bionemo/llm/model/loss.py | 75.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #798      +/-   ##
==========================================
+ Coverage   84.37%   84.42%   +0.05%     
==========================================
  Files         138      138              
  Lines        8690     8686       -4     
==========================================
+ Hits         7332     7333       +1     
+ Misses       1358     1353       -5     
| Files with missing lines | Coverage Δ |
|---|---|
| ...kages/bionemo-evo2/src/bionemo/evo2/run/predict.py | 79.59% <100.00%> (ø) |
| ...onemo/geneformer/model/finetune_token_regressor.py | 60.48% <100.00%> (+0.16%) ⬆️ |
| ...packages/bionemo-llm/src/bionemo/llm/model/loss.py | 60.00% <75.00%> (+1.46%) ⬆️ |

... and 1 file with indirect coverage changes

Collaborator

@jstjohn jstjohn left a comment

Approved, but see my inline comment about manual verification of tensor parallel correctness. Ideally the same could be done for CP=2, but I am not 100% sure that we have that working in the predict script.

@farhadrgh
Collaborator Author

Need to bump NeMo to get the changes in NVIDIA-NeMo/NeMo#12988 after it's merged.

@farhadrgh
Collaborator Author

/ok to test b950c28

@farhadrgh farhadrgh enabled auto-merge April 14, 2025 16:23
@farhadrgh
Collaborator Author

6e23633 changes from #807

@farhadrgh
Collaborator Author

/ok to test 6e23633

@farhadrgh
Collaborator Author

/ok to test a395a8b

@farhadrgh
Collaborator Author

/ok to test 5f7f9ed

@farhadrgh
Collaborator Author

/ok to test f25057b

@farhadrgh farhadrgh added this pull request to the merge queue Apr 16, 2025
@pstjohn pstjohn removed this pull request from the merge queue due to a manual request Apr 16, 2025
pstjohn and others added 8 commits May 5, 2025 09:12
Previously if pre-commit failed, it would cause `run-tests` to be
skipped, which would then mean that `verify-tests-status` would give the
PR the green light. That's obviously a problem, so we need to make sure
all these tests are added to the verify-tests-status check. Ideally we
could put some more logic in that if block to check if we're in a merge
queue, whether tests were intentionally skipped, etc.
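The failure mode described above can be sketched in a few lines. This is an illustrative model, not the repository's workflow code: the function name `verify_status` is hypothetical, the result strings mirror the GitHub Actions job results (`success`, `failure`, `cancelled`, `skipped`), and both the "old" check and the rule that every `skipped` job counts as a failure are assumptions about the behavior being fixed.

```python
def verify_status(job_results, required_jobs):
    """Return True only if every required job finished with 'success'.

    job_results maps job name -> result string. A missing or 'skipped'
    job counts as a failure, so an upstream failure cannot slip through.
    """
    return all(job_results.get(job) == "success" for job in required_jobs)


# pre-commit failed, so run-tests was skipped.
results = {"pre-commit": "failure", "run-tests": "skipped"}

# Old behavior (assumed): only an explicit run-tests failure blocks the PR.
old_check_passes = results["run-tests"] != "failure"

# New behavior: require success from every job in the list.
new_check_passes = verify_status(results, ["pre-commit", "run-tests"])

print(old_check_passes, new_check_passes)  # True False
```

A stricter rule like this would also block intentionally skipped jobs, which is why the message suggests adding merge-queue and intentional-skip handling later.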

---------

Signed-off-by: Peter St. John <[email protected]>
Signed-off-by: Farhad Ramezanghorbani <[email protected]>
### Description
<!-- Provide a detailed description of the changes in this PR -->

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [ ]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):

### CI Pipeline Configuration
Configure CI behavior by applying the relevant labels:

-
[SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci)
- Skip all continuous integration tests
-
[INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests)
- Execute notebook validation tests in pytest
-
[INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests)
- Execute tests labelled as slow in pytest for extensive testing

> [!NOTE]
> By default, the notebooks validation tests are skipped unless
explicitly enabled.

#### Authorizing CI Runs

We use
[copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation)
to manage authorization of CI
runs on NVIDIA's compute resources.

* If a pull request is opened by a trusted user and contains only
trusted changes, the pull request's code will
automatically be copied to a pull-request/ prefixed branch in the source
repository (e.g. pull-request/123)
* If a pull request is opened by an untrusted user or contains untrusted
changes, an NVIDIA org member must leave an
`/ok to test` comment on the pull request to trigger CI. This will need
to be done for each new commit.

### Usage
<!--- How does a user interact with the changed code -->
```python
TODO: Add code snippet
```

### Pre-submit Checklist
<!--- Ensure all items are completed before submitting -->

 - [ ] I have tested these changes locally
 - [ ] I have updated the documentation accordingly
 - [ ] I have added/updated tests as needed
 - [ ] All existing tests pass successfully

Signed-off-by: Jonathan Mitchell <[email protected]>
Signed-off-by: Farhad Ramezanghorbani <[email protected]>

Signed-off-by: Jonathan Mitchell <[email protected]>
Signed-off-by: Farhad Ramezanghorbani <[email protected]>
PEFT checkpointing and inference for esm2.

---------

Signed-off-by: Polina Binder <[email protected]>
Signed-off-by: polinabinder1 <[email protected]>
Signed-off-by: Farhad Ramezanghorbani <[email protected]>
### Description
Profiling for LoRA additions to ESM2.

---------

Signed-off-by: Polina Binder <[email protected]>
Signed-off-by: Farhad Ramezanghorbani <[email protected]>
### Description
Fixes a recent CRIT vulnerability in h11

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [x]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):


Signed-off-by: Timur Rvachov <[email protected]>
Signed-off-by: Farhad Ramezanghorbani <[email protected]>
### Description

### For the Geneformer documentation:

1. **Capitalization standardization**:
   - Fixed capitalization of "BioNeMo", "Geneformer", "HuggingFace", "ReLU", "BERT MLM"
   - Corrected spelling of "Crohn's disease" (previously "Chron's disease")
   - Fixed "children" (previously "chidlren")

2. **Formatting improvements**:
   - Properly formatted model version bullet points with nesting
   - Added proper headings for property categories
   - Fixed displayed values (e.g., ".5M" → "0.5M")
   - Standardized formatting of data collection/labeling methods sections

3. **Image captions**:
   - Replaced low-quality image captions with descriptive, properly formatted titles
   - Made chart descriptions more professional and consistent

4. **Grammatical improvements**:
   - Fixed article usage and punctuation
   - Improved sentence structure and clarity
   - Fixed section headings capitalization and consistency

5. **Fixed broken notes**:
   - Corrected `!! note` to `!!! note` for proper rendering

### For the ESM-2 pretraining documentation:

1. **Grammar and clarity improvements**:
   - Fixed article usage ("a ESM-2" → "an ESM-2")
   - Fixed formatting of numeric values (e.g., "1." → "1.0")
   - Fixed typos ("depreciation" → "deprecation")
   - Fixed "trainiing" → "training"

2. **Consistency in terminology**:
   - Standardized "BioNeMo" capitalization
   - Ensured consistent treatment of "ESM-2" references

3. **Structure and formatting**:
   - Improved spacing and paragraph breaks
   - Fixed section formatting and readability

### For the training-models documentation:

1. **Capitalization and consistency**:
   - Standardized capitalization of model sizes (8M, 650M, 3B)
   - Fixed capitalization of "ESM2", "Geneformer", "Python", "YAML"
   - Changed "WandB" to "Weights and Biases" consistently

2. **Formatting improvements**:
   - Changed code blocks consistently to include language tags
   - Added proper spacing and improved paragraph formatting
   - Fixed punctuation in lists and note sections

3. **Grammar and clarity**:
   - Added missing commas after introductory phrases
   - Fixed formatting of lists for better readability
   - Made bulleted explanations more consistent


### Type of changes
<!-- Mark the relevant option with an [x] -->

- [ ]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [x]  Documentation update
- [ ]  Other (please describe):


---------

Signed-off-by: Timur Rvachov <[email protected]>
Signed-off-by: Timur Rvachov <[email protected]>
Co-authored-by: lvojtku <[email protected]>
Signed-off-by: Farhad Ramezanghorbani <[email protected]>
### Description
<!-- Provide a detailed description of the changes in this PR -->

- Fixes the error caused by NVIDIA-NeMo/NeMo#12459
refactoring the definition of `masked_token_loss` and
`masked_token_loss_context_parallel` into a single function with a
`cp_size` argument that no longer divides the loss by the number of
"valid" (i.e. non-masked) tokens. So it returns a CP-reduced loss sum.
- Specifically, this breaks one of our golden value tests in
`bionemo-llm`:
`sub-packages/bionemo-llm/tests/bionemo/llm/model/test_loss.py::test_loss_equivalency_bionemo_vs_pytorch`,
and this fixes it with no behavior change to the LLM model `forward()`,
i.e. we perform the normalization on valid tokens on our side now.
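The difference between a loss sum and a loss normalized over valid tokens can be illustrated with a small sketch. This is not the `bionemo-llm` or NeMo implementation: the function names below are hypothetical, and plain Python lists stand in for tensors.

```python
def masked_loss_sum(per_token_loss, loss_mask):
    """Sum the loss over valid (non-masked) tokens; no normalization."""
    return sum(loss * mask for loss, mask in zip(per_token_loss, loss_mask))


def normalized_masked_loss(per_token_loss, loss_mask):
    """Divide the masked loss sum by the number of valid tokens."""
    num_valid = sum(loss_mask)
    if num_valid == 0:
        return 0.0  # Assumed convention for a fully masked batch.
    return masked_loss_sum(per_token_loss, loss_mask) / num_valid


per_token_loss = [2.0, 4.0, 6.0, 8.0]
loss_mask = [1, 0, 1, 0]  # 1 marks a valid token, 0 a masked one.

print(masked_loss_sum(per_token_loss, loss_mask))         # 8.0
print(normalized_masked_loss(per_token_loss, loss_mask))  # 4.0
```

A golden-value test written against the normalized value fails if the upstream function starts returning the sum, which matches the breakage described above.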

### Details

- Bump NeMo to a version greater than:
NVIDIA-NeMo/NeMo#12856 or matching this:
#798
- Update: Need to migrate to `inference_context` in NeMo:
https://github.com/NVIDIA/NeMo/tree/cye/hyena-gpt-infer-context
- Bump Megatron to support new imports in the NeMo bump. Found a commit
that bisects the new Megatron inference engine and the new NeMo imports
to prevent breakage of our inference tests.
- Use a backend version of RoPE for the Amplify Megatron vs. PyTorch/HF
parity test to avoid the CP process group requirement.
- `MaskedTokenLossReduction.forward()` return API changed.
- Added commentary for future devs to understand the code.

#### Appendix

- NeMo Fork Hotfix Patch: Safe import of a future module in Megatron to
avoid upgrading.
```python
get_gpt_heterogeneous_layer_spec, HAVE_GPT_HETEROGENEOUS = safe_import("megatron.core.models.gpt.heterogeneous.heterogeneous_layer_specs")
```
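The general pattern behind such a guarded import can be illustrated with the standard library. The helper `try_import` below is a hypothetical stand-in, not NeMo's `safe_import`; it only shows how returning an object together with an availability flag lets callers gate optional features.

```python
import importlib


def try_import(module_name):
    """Return (module, True) if the import succeeds, else (None, False)."""
    try:
        return importlib.import_module(module_name), True
    except ImportError:
        return None, False


json_mod, have_json = try_import("json")
missing_mod, have_missing = try_import("this_module_does_not_exist_12345")

print(have_json, have_missing)  # True False
```

Callers would then check the flag before touching the optional feature, so an older dependency version does not fail at import time.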

### Type of changes
<!-- Mark the relevant option with an [x] -->

- [x]  Bug fix (non-breaking change which fixes an issue)
- [ ]  New feature (non-breaking change which adds functionality)
- [ ]  Refactor
- [ ]  Documentation update
- [ ]  Other (please describe):

### Usage / Testing
<!--- How does a user interact with the changed code -->

- Tested against the commit specified in this PR:
#798
```shell
# Pin the NeMo submodule to the commit under test, then run the test from the repository root.
git -C 3rdparty/NeMo checkout c998e273f9cd23e36d7348fa27d0c2692efd87c8
pytest -s sub-packages/bionemo-llm/tests/bionemo/llm/model/test_loss.py::test_loss_equivalency_bionemo_vs_pytorch
```

---------

Signed-off-by: Farhad Ramezanghorbani <[email protected]>
Signed-off-by: Cory Ye <[email protected]>
Signed-off-by: cspades <[email protected]>
Signed-off-by: Timur Rvachov <[email protected]>
Signed-off-by: Danny <[email protected]>
Signed-off-by: Cory Ye <[email protected]>
Signed-off-by: nvdreidenbach <[email protected]>
Signed-off-by: Peter St. John <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Polina Binder <[email protected]>
Signed-off-by: polinabinder1 <[email protected]>
Signed-off-by: dorotat <[email protected]>
Signed-off-by: Truong Nguyen <[email protected]>
Signed-off-by: Jonathan Mitchell <[email protected]>
Signed-off-by: Timur Rvachov <[email protected]>
Signed-off-by: Steven <[email protected]>
Co-authored-by: Farhad Ramezanghorbani <[email protected]>
Co-authored-by: Farhad Ramezanghorbani <[email protected]>
Co-authored-by: Dorota Toczydlowska <[email protected]>
Co-authored-by: Timur Rvachov <[email protected]>
Co-authored-by: nvdreidenbach <[email protected]>
Co-authored-by: Steven Kothen-Hill <[email protected]>
Co-authored-by: Peter St. John <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: polinabinder1 <[email protected]>
Co-authored-by: Truong Nguyen <[email protected]>
Co-authored-by: jomitchellnv <[email protected]>
Co-authored-by: lvojtku <[email protected]>
Signed-off-by: Farhad Ramezanghorbani <[email protected]>
@farhadrgh
Collaborator Author

/ok to test 917c9c4

Signed-off-by: Farhad Ramezanghorbani <[email protected]>
@farhadrgh
Collaborator Author

/ok to test 550e852b3f148a0212bdd5a10a5909a8f947c32

@copy-pr-bot

copy-pr-bot bot commented May 5, 2025

/ok to test 550e852b3f148a0212bdd5a10a5909a8f947c32

@farhadrgh, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

@farhadrgh
Collaborator Author

/ok to test 1550e85

@farhadrgh farhadrgh enabled auto-merge May 5, 2025 16:52
@farhadrgh farhadrgh added this pull request to the merge queue May 5, 2025
Merged via the queue into main with commit 6ab0afc May 5, 2025
10 checks passed
@farhadrgh farhadrgh deleted the farhadr/evo2_cleanup branch May 5, 2025 19:09
dorotat-nv added a commit that referenced this pull request May 6, 2025
### Description

NVIDIA-NeMo/NeMo#12856 introduces code reduction and
perf improvements, including standardized input/output shapes for Hyena
operators and, consequently, less rearrangement overhead. This PR
updates the EVO2 test to comply with those changes.


---------

Signed-off-by: Farhad Ramezanghorbani <[email protected]>
Signed-off-by: Cory Ye <[email protected]>
Signed-off-by: cspades <[email protected]>
Signed-off-by: Timur Rvachov <[email protected]>
Signed-off-by: Danny <[email protected]>
Signed-off-by: Cory Ye <[email protected]>
Signed-off-by: nvdreidenbach <[email protected]>
Signed-off-by: Peter St. John <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Polina Binder <[email protected]>
Signed-off-by: polinabinder1 <[email protected]>
Signed-off-by: dorotat <[email protected]>
Signed-off-by: Truong Nguyen <[email protected]>
Signed-off-by: Jonathan Mitchell <[email protected]>
Signed-off-by: Timur Rvachov <[email protected]>
Signed-off-by: Steven <[email protected]>
Co-authored-by: Dorota Toczydlowska <[email protected]>
Co-authored-by: Cory Ye <[email protected]>
Co-authored-by: Timur Rvachov <[email protected]>
Co-authored-by: nvdreidenbach <[email protected]>
Co-authored-by: Steven Kothen-Hill <[email protected]>
Co-authored-by: Peter St. John <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: polinabinder1 <[email protected]>
Co-authored-by: Truong Nguyen <[email protected]>
Co-authored-by: jomitchellnv <[email protected]>
Co-authored-by: lvojtku <[email protected]>
Signed-off-by: dorotat <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request May 8, 2025
…d inference. (#861)

### Description
<!-- Provide a detailed description of the changes in this PR -->

- #798
(#855) depends on a NeMo
branch, which has been merged into NeMo `main`:
NVIDIA-NeMo/NeMo#13436. Update to point to this trunk
commit.

### Details

- NeMo ToT reverted the `cp_size` argument for `masked_token_loss`
(NVIDIA-NeMo/NeMo#13295), so we do the CP reduction
on our side now...
- Future Megatron bump will add the `* cp_size` multiplier to the loss,
and break our inference unit tests due to `torch.inference_mode()` usage
in Megatron.
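One way to picture doing the CP reduction on the caller's side: each context-parallel rank holds a loss sum and a valid-token count for its shard of the sequence, and the global mean is the ratio of the two totals. The sketch below simulates ranks with a plain list; in real code the totals would come from a collective reduction over the CP group. All names are hypothetical, and this is not the code from this commit.

```python
def cp_reduce_loss(per_rank_stats):
    """Combine (loss_sum, num_valid_tokens) pairs from each CP rank."""
    total_loss = sum(loss for loss, _ in per_rank_stats)
    total_valid = sum(n for _, n in per_rank_stats)
    return total_loss / total_valid if total_valid else 0.0


# Two simulated ranks with unequal numbers of valid tokens.
stats = [(6.0, 3), (6.0, 1)]

global_mean = cp_reduce_loss(stats)
mean_of_means = sum(loss / n for loss, n in stats) / len(stats)

print(global_mean)    # 3.0
print(mean_of_means)  # 4.0
```

The two values differ whenever ranks hold different numbers of valid tokens, which is why the sums are reduced before dividing.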

---------

Signed-off-by: Cory Ye <[email protected]>
camirr-nv pushed a commit that referenced this pull request Jun 26, 2025
Signed-off-by: Ubuntu <[email protected]>
camirr-nv pushed a commit that referenced this pull request Jun 26, 2025
…d inference. (#861)

Signed-off-by: Ubuntu <[email protected]>