Update EVO2 tests according to Hyena arch changes #798

farhadrgh · 2025-04-02T18:33:16Z

Description

NVIDIA-NeMo/NeMo#12856 introduces code reduction and performance improvements, including standardized input/output shapes for Hyena operators and, consequently, less rearrangement overhead. This PR updates the EVO2 tests to comply with those changes.

Type of changes

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels:

SKIP_CI - Skip all continuous integration tests
INCLUDE_NOTEBOOKS_TESTS - Execute notebook validation tests in pytest
INCLUDE_SLOW_TESTS - Execute tests labelled as slow in pytest for extensive testing

Note

By default, the notebooks validation tests are skipped unless explicitly enabled.

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.

* If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
  automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123).
* If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
  /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.

Usage

TODO: Add code snippet

Pre-submit Checklist

- [ ] I have tested these changes locally
- [ ] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully

copy-pr-bot · 2025-04-02T18:33:19Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

farhadrgh · 2025-04-02T19:21:04Z

/ok to test

farhadrgh · 2025-04-02T20:01:58Z

/ok to test

cspades · 2025-04-08T15:14:17Z

LGTM but will let John verify:

- features = rearrange(features, "l b d -> b l d").contiguous()
+ features = rearrange(features, "l b d -> b d l").contiguous()
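
For reference, a minimal sketch of what the changed pattern does to the tensor layout. It uses NumPy's `transpose` as a stand-in for `rearrange` (an assumption made to keep the example dependency-light; the test itself calls `rearrange` and `.contiguous()`), and the sizes below are arbitrary illustrative values, not the ones used in the test.

```python
import numpy as np

# Hypothetical sizes for illustration only: sequence length l, batch b, hidden dim d.
l, b, d = 5, 2, 3
features = np.arange(l * b * d).reshape(l, b, d)

# Old pattern "l b d -> b l d": batch first, sequence second, hidden dim last.
old_layout = np.ascontiguousarray(features.transpose(1, 0, 2))

# New pattern "l b d -> b d l": batch first, hidden dim second, sequence last.
new_layout = np.ascontiguousarray(features.transpose(1, 2, 0))

print(old_layout.shape)  # (2, 5, 3)
print(new_layout.shape)  # (2, 3, 5)

# Both layouts hold the same values; only the order of the last two axes differs.
assert np.array_equal(new_layout, old_layout.transpose(0, 2, 1))
```

The two outputs differ only by a swap of the last two axes, so code consuming `features` has to agree on which of the two layouts it expects.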

farhadrgh · 2025-04-09T20:01:01Z

/ok to test

codecov-commenter · 2025-04-09T20:50:32Z

Codecov Report

Attention: Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.

Project coverage is 84.42%. Comparing base (3936231) to head (1550e85).

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...packages/bionemo-llm/src/bionemo/llm/model/loss.py | 75.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #798      +/-   ##
==========================================
+ Coverage   84.37%   84.42%   +0.05%     
==========================================
  Files         138      138              
  Lines        8690     8686       -4     
==========================================
+ Hits         7332     7333       +1     
+ Misses       1358     1353       -5

| Files with missing lines | Coverage Δ | |
|---|---|---|
| ...kages/bionemo-evo2/src/bionemo/evo2/run/predict.py | `79.59% <100.00%> (ø)` | |
| ...onemo/geneformer/model/finetune_token_regressor.py | `60.48% <100.00%> (+0.16%)` | ⬆️ |
| ...packages/bionemo-llm/src/bionemo/llm/model/loss.py | `60.00% <75.00%> (+1.46%)` | ⬆️ |

... and 1 file with indirect coverage changes

jstjohn

Approved, but see my inline comment about manual verification of tensor parallel correctness. Ideally the same could be done for CP=2, but I am not 100% sure that we have that working in the predict script.

sub-packages/bionemo-evo2/tests/bionemo/evo2/test_hyena_operators.py

farhadrgh · 2025-04-11T17:20:53Z

Need to bump NeMo to get the changes in NVIDIA-NeMo/NeMo#12988 after it's merged

farhadrgh · 2025-04-14T16:21:25Z

/ok to test b950c28

farhadrgh · 2025-04-14T17:48:10Z

6e23633 changes from #807

farhadrgh · 2025-04-14T17:48:40Z

/ok to test 6e23633

farhadrgh · 2025-04-15T14:20:01Z

/ok to test a395a8b

farhadrgh · 2025-04-16T15:34:52Z

/ok to test 5f7f9ed

farhadrgh · 2025-04-16T19:41:11Z

/ok to test f25057b

Previously if pre-commit failed, it would cause `run-tests` to be skipped, which would then mean that `verify-tests-status` would give the PR the green light. That's obviously a problem, so we need to make sure all these tests are added to the verify-tests-status check. Ideally we could put some more logic in that if block to check if we're in a merge queue, whether tests were intentionally skipped, etc.

---------

Signed-off-by: Peter St. John <[email protected]>
Signed-off-by: Farhad Ramezanghorbani <[email protected]>
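
The failure mode described in this commit message can be illustrated with a small sketch. This is not the workflow's actual implementation: the function names `naive_status` and `strict_status` are hypothetical, and the result strings are modelled on GitHub Actions job results (`success`, `failure`, `skipped`). It treats every skipped job as a failure, which is a simplification; it does not distinguish intentionally skipped tests or merge-queue runs.

```python
def naive_status(job_results: dict[str, str]) -> bool:
    """Hypothetical old-style gate: only an explicit run-tests failure blocks the PR."""
    return job_results.get("run-tests") != "failure"


def strict_status(job_results: dict[str, str]) -> bool:
    """Hypothetical stricter gate: every listed job must have finished with 'success'.

    A 'skipped' job counts as not passing, so a failing pre-commit job that
    causes run-tests to be skipped can no longer produce a green check.
    """
    return all(result == "success" for result in job_results.values())


# pre-commit fails, so run-tests never runs and is reported as skipped.
results = {"pre-commit": "failure", "run-tests": "skipped"}

print(naive_status(results))   # True  (the PR wrongly gets the green light)
print(strict_status(results))  # False (the PR is blocked)
```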

Signed-off-by: Jonathan Mitchell <[email protected]>
Signed-off-by: Farhad Ramezanghorbani <[email protected]>

PEFT checkpointing and inference for esm2. --------- Signed-off-by: Polina Binder <[email protected]> Signed-off-by: polinabinder1 <[email protected]> Signed-off-by: Farhad Ramezanghorbani <[email protected]>

### Description Profiling for LoRA additions to ESM2. --------- Signed-off-by: Polina Binder <[email protected]> Signed-off-by: Farhad Ramezanghorbani <[email protected]>

### Description

Fixes a recent CRIT vulnerability in h11

### Type of changes

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):

Signed-off-by: Timur Rvachov <[email protected]>
Signed-off-by: Farhad Ramezanghorbani <[email protected]>

### Description

### For the Geneformer documentation:

1. **Capitalization standardization**:
   - Fixed capitalization of "BioNeMo", "Geneformer", "HuggingFace", "ReLU", "BERT MLM"
   - Corrected spelling of "Crohn's disease" (previously "Chron's disease")
   - Fixed "children" (previously "chidlren")
2. **Formatting improvements**:
   - Properly formatted model version bullet points with nesting
   - Added proper headings for property categories
   - Fixed displayed values (e.g., ".5M" → "0.5M")
   - Standardized formatting of data collection/labeling methods sections
3. **Image captions**:
   - Replaced low-quality image captions with descriptive, properly formatted titles
   - Made chart descriptions more professional and consistent
4. **Grammatical improvements**:
   - Fixed article usage and punctuation
   - Improved sentence structure and clarity
   - Fixed section headings capitalization and consistency
5. **Fixed broken notes**:
   - Corrected `!! note` to `!!! note` for proper rendering

### For the ESM-2 pretraining documentation:

1. **Grammar and clarity improvements**:
   - Fixed article usage ("a ESM-2" → "an ESM-2")
   - Fixed formatting of numeric values (e.g., "1." → "1.0")
   - Fixed typos ("depreciation" → "deprecation")
   - Fixed "trainiing" → "training"
2. **Consistency in terminology**:
   - Standardized "BioNeMo" capitalization
   - Ensured consistent treatment of "ESM-2" references
3. **Structure and formatting**:
   - Improved spacing and paragraph breaks
   - Fixed section formatting and readability

### For the training-models documentation:

1. **Capitalization and consistency**:
   - Standardized capitalization of model sizes (8M, 650M, 3B)
   - Fixed capitalization of "ESM2", "Geneformer", "Python", "YAML"
   - Changed "WandB" to "Weights and Biases" consistently
2. **Formatting improvements**:
   - Changed code blocks consistently to include language tags
   - Added proper spacing and improved paragraph formatting
   - Fixed punctuation in lists and note sections
3. **Grammar and clarity**:
   - Added missing commas after introductory phrases
   - Fixed formatting of lists for better readability
   - Made bulleted explanations more consistent

### Type of changes

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [x] Documentation update
- [ ] Other (please describe):

---------

Signed-off-by: Timur Rvachov <[email protected]>
Signed-off-by: Timur Rvachov <[email protected]>
Co-authored-by: lvojtku <[email protected]>
Signed-off-by: Farhad Ramezanghorbani <[email protected]>

### Description

- Fixes the error caused by NVIDIA-NeMo/NeMo#12459 refactoring the definition of `masked_token_loss` and `masked_token_loss_context_parallel` into a single function with a `cp_size` argument that no longer divides the loss by the number of "valid" (i.e. non-masked) tokens, so it returns a CP-reduced loss sum.
- Specifically, this breaks one of our golden value tests in `bionemo-llm`: `sub-packages/bionemo-llm/tests/bionemo/llm/model/test_loss.py::test_loss_equivalency_bionemo_vs_pytorch`, and this fixes it with no behavior change to the LLM model `forward()`, i.e. we perform the normalization on valid tokens on our side now.

### Details

- Bump NeMo to a version greater than: NVIDIA-NeMo/NeMo#12856 or matching this: #798
- Update: Need to migrate to `inference_context` in NeMo: https://github.com/NVIDIA/NeMo/tree/cye/hyena-gpt-infer-context
- Bump Megatron to support new imports in the NeMo bump. Found a commit that bisects the new Megatron inference engine and the new NeMo imports to prevent breakage of our inference tests.
- Use a backend version of RoPE for the Amplify Megatron vs. PyTorch/HF parity test to avoid the CP process group requirement.
- `MaskedTokenLossReduction.forward()` return API changed.
- Added commentary for future devs to understand the code.

#### Appendix

- NeMo Fork Hotfix Patch: Safe import of a future module in Megatron to avoid upgrading.

```
get_gpt_heterogeneous_layer_spec, HAVE_GPT_HETEROGENEOUS = safe_import("megatron.core.models.gpt.heterogeneous.heterogeneous_layer_specs")
```

### Type of changes

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):

### Usage / Testing

- Tested against the commit specified in this PR: #798

```bash
# Pin the NeMo submodule to the commit under test.
cd 3rdparty/NeMo
git checkout c998e273f9cd23e36d7348fa27d0c2692efd87c8
# Return to the repository root (assumed to be the starting directory) before running the test.
cd ../..
pytest -s sub-packages/bionemo-llm/tests/bionemo/llm/model/test_loss.py::test_loss_equivalency_bionemo_vs_pytorch
```

---------

Signed-off-by: Farhad Ramezanghorbani <[email protected]>
Signed-off-by: Cory Ye <[email protected]>
Signed-off-by: cspades <[email protected]>
Signed-off-by: Timur Rvachov <[email protected]>
Signed-off-by: Danny <[email protected]>
Signed-off-by: Cory Ye <[email protected]>
Signed-off-by: nvdreidenbach <[email protected]>
Signed-off-by: Peter St. John <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Polina Binder <[email protected]>
Signed-off-by: polinabinder1 <[email protected]>
Signed-off-by: dorotat <[email protected]>
Signed-off-by: Truong Nguyen <[email protected]>
Signed-off-by: Jonathan Mitchell <[email protected]>
Signed-off-by: Timur Rvachov <[email protected]>
Signed-off-by: Steven <[email protected]>
Co-authored-by: Farhad Ramezanghorbani <[email protected]>
Co-authored-by: Farhad Ramezanghorbani <[email protected]>
Co-authored-by: Dorota Toczydlowska <[email protected]>
Co-authored-by: Timur Rvachov <[email protected]>
Co-authored-by: nvdreidenbach <[email protected]>
Co-authored-by: Steven Kothen-Hill <[email protected]>
Co-authored-by: Peter St. John <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: polinabinder1 <[email protected]>
Co-authored-by: Truong Nguyen <[email protected]>
Co-authored-by: jomitchellnv <[email protected]>
Co-authored-by: lvojtku <[email protected]>
Signed-off-by: Farhad Ramezanghorbani <[email protected]>
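
To make the normalization described in this commit message concrete, here is a minimal sketch in plain Python. It is not the `bionemo-llm` implementation: `masked_loss_sum` and `normalized_loss` are hypothetical helper names, the per-token losses are made-up values, and returning 0 when no token is valid is an assumption of this sketch. It only illustrates dividing a masked loss sum by the number of valid (non-masked) tokens.

```python
def masked_loss_sum(per_token_loss: list[float], loss_mask: list[int]) -> float:
    """Sum the per-token losses over valid tokens only (mask == 1)."""
    return sum(loss * mask for loss, mask in zip(per_token_loss, loss_mask))


def normalized_loss(per_token_loss: list[float], loss_mask: list[int]) -> float:
    """Divide the masked loss sum by the number of valid tokens."""
    num_valid = sum(loss_mask)
    if num_valid == 0:
        return 0.0  # assumption of this sketch: the loss is 0 when nothing is valid
    return masked_loss_sum(per_token_loss, loss_mask) / num_valid


losses = [2.0, 4.0, 6.0, 8.0]  # made-up per-token losses
mask = [1, 0, 1, 0]            # 1 = valid token, 0 = masked-out token

print(masked_loss_sum(losses, mask))  # 8.0 (the un-normalized sum)
print(normalized_loss(losses, mask))  # 4.0 (the mean over the two valid tokens)
```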

farhadrgh · 2025-05-05T16:13:47Z

/ok to test 917c9c4

Signed-off-by: Farhad Ramezanghorbani <[email protected]>

farhadrgh · 2025-05-05T16:51:38Z

/ok to test 550e852b3f148a0212bdd5a10a5909a8f947c32

copy-pr-bot · 2025-05-05T16:51:42Z

/ok to test 550e852b3f148a0212bdd5a10a5909a8f947c32

@farhadrgh, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

farhadrgh · 2025-05-05T16:52:16Z

/ok to test 1550e85

### Description

NVIDIA-NeMo/NeMo#12856 introduces code reduction and performance improvements, including standardized input/output shapes for Hyena operators and, consequently, less rearrangement overhead. This PR updates the EVO2 tests to comply with those changes.

---------

Signed-off-by: Farhad Ramezanghorbani <[email protected]>
Signed-off-by: Cory Ye <[email protected]>
Signed-off-by: cspades <[email protected]>
Signed-off-by: Timur Rvachov <[email protected]>
Signed-off-by: Danny <[email protected]>
Signed-off-by: Cory Ye <[email protected]>
Signed-off-by: nvdreidenbach <[email protected]>
Signed-off-by: Peter St. John <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Polina Binder <[email protected]>
Signed-off-by: polinabinder1 <[email protected]>
Signed-off-by: dorotat <[email protected]>
Signed-off-by: Truong Nguyen <[email protected]>
Signed-off-by: Jonathan Mitchell <[email protected]>
Signed-off-by: Timur Rvachov <[email protected]>
Signed-off-by: Steven <[email protected]>
Co-authored-by: Dorota Toczydlowska <[email protected]>
Co-authored-by: Cory Ye <[email protected]>
Co-authored-by: Timur Rvachov <[email protected]>
Co-authored-by: nvdreidenbach <[email protected]>
Co-authored-by: Steven Kothen-Hill <[email protected]>
Co-authored-by: Peter St. John <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: polinabinder1 <[email protected]>
Co-authored-by: Truong Nguyen <[email protected]>
Co-authored-by: jomitchellnv <[email protected]>
Co-authored-by: lvojtku <[email protected]>
Signed-off-by: dorotat <[email protected]>

…d inference. (#861) ### Description  - #798 (#855) depends on a NeMo branch, which has been merged into NeMo `main`: NVIDIA-NeMo/NeMo#13436. Update to point to this trunk commit. ### Details - NeMo ToT reverted the `cp_size` argument for `masked_token_loss` (NVIDIA-NeMo/NeMo#13295), so we do the CP reduction on our side now... - Future Megatron bump will add the `* cp_size` multiplier to the loss, and break our inference unit tests due to `torch.inference_mode()` usage in Megatron. --------- Signed-off-by: Cory Ye <[email protected]>
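
As a rough illustration of what a context-parallel (CP) loss reduction has to achieve, the sketch below simulates CP ranks with plain Python lists. It is an assumption-laden toy, not the code in this commit: `cp_reduced_mean_loss` is a hypothetical name, the built-in `sum` stands in for a sum all-reduce over the CP group, and the values are made up. The point is that the loss sums and the valid-token counts are reduced separately before dividing.

```python
def cp_reduced_mean_loss(rank_losses: list[list[float]], rank_masks: list[list[int]]) -> float:
    """Mean loss over valid tokens when the sequence is sharded across CP ranks."""
    # Each simulated rank computes a local masked loss sum and a local valid-token count.
    local_sums = [
        sum(loss * mask for loss, mask in zip(losses, masks))
        for losses, masks in zip(rank_losses, rank_masks)
    ]
    local_counts = [sum(masks) for masks in rank_masks]
    # Stand-in for a sum all-reduce across the CP group (assumption of this sketch).
    total_sum = sum(local_sums)
    total_count = sum(local_counts)
    return total_sum / total_count if total_count else 0.0


# Two simulated CP ranks, each holding half of a 4-token sequence.
rank_losses = [[1.0, 4.0], [2.0, 3.0]]
rank_masks = [[1, 0], [1, 1]]

print(cp_reduced_mean_loss(rank_losses, rank_masks))  # 2.0

# Averaging the per-rank means instead would give (1.0 + 2.5) / 2 = 1.75,
# which differs because the ranks hold different numbers of valid tokens.
```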


farhadrgh requested review from cspades, dorotat-nv, jomitchellnv, jstjohn, jwilber, malcolmgreaves, pstjohn, sichu2023, skothenhill-nv and trvachov as code owners April 2, 2025 18:33

farhadrgh force-pushed the farhadr/evo2_cleanup branch from fca02bc to 58706fe April 2, 2025 19:20

farhadrgh force-pushed the farhadr/evo2_cleanup branch from 4c5ac7d to c14f433 April 2, 2025 21:06

cspades approved these changes Apr 9, 2025


jstjohn approved these changes Apr 9, 2025


sub-packages/bionemo-evo2/tests/bionemo/evo2/test_hyena_operators.py

farhadrgh enabled auto-merge April 14, 2025 16:23

farhadrgh added this pull request to the merge queue Apr 16, 2025

pstjohn removed this pull request from the merge queue due to a manual request Apr 16, 2025

pstjohn and others added 8 commits May 5, 2025 09:12

Pbinder/auto resume (#766)

26684c5


Pbinder/esm2 document (#846)

7b4c9b9


farhadrgh force-pushed the farhadr/evo2_cleanup branch from 271825c to 917c9c4 May 5, 2025 16:12

farhadrgh requested review from DejunL, edawson, guoqing-zhou, nvdreidenbach, youhanl-nvidia and zcao0420 as code owners May 5, 2025 16:12

resolve conflicts

1550e85

Signed-off-by: Farhad Ramezanghorbani <[email protected]>

farhadrgh enabled auto-merge May 5, 2025 16:52

farhadrgh added this pull request to the merge queue May 5, 2025

Merged via the queue into main with commit 6ab0afc May 5, 2025
10 checks passed

farhadrgh deleted the farhadr/evo2_cleanup branch May 5, 2025 19:09

cspades mentioned this pull request May 7, 2025

Bump NeMo to use a trunk commit instead of a branch for Evo2 fixes and inference. #861

Merged

12 participants
