Conversation

@cspades (Member) commented May 7, 2025

Description

Details

  • NeMo ToT reverted the `cp_size` argument for `masked_token_loss` (Fix loss compute and reduction NVIDIA-NeMo/NeMo#13295), so we now perform the CP reduction on our side.
    • A future Megatron bump will add the `* cp_size` multiplier to the loss and will break our inference unit tests, due to `torch.inference_mode()` usage in Megatron.
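The CP reduction described above can be sketched as follows. This is a minimal, hypothetical sketch (the helper name `cp_reduce_loss` and its signature are ours, not NeMo's or Megatron's): each context-parallel rank contributes a partial loss over its sequence shard, so once `masked_token_loss` no longer accepts `cp_size`, the caller has to fold the `cp_size` normalization into the reduction itself.

```python
def cp_reduce_loss(loss_sum: float, num_valid_tokens: int, cp_size: int) -> float:
    """Hypothetical sketch of the context-parallel (CP) loss reduction.

    `loss_sum` is the masked-token loss summed across CP ranks,
    `num_valid_tokens` is the count of unmasked tokens, and `cp_size`
    is the number of context-parallel ranks. Because every rank's shard
    contributes to the gathered sum, the token-normalized loss is also
    divided by cp_size.
    """
    loss = loss_sum / max(num_valid_tokens, 1)  # guard against zero valid tokens
    if cp_size > 1:
        loss /= cp_size
    return loss
```

A future Megatron bump that multiplies the loss by `cp_size` internally would cancel this division, which is why the two changes have to stay in sync.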

@jstjohn (Collaborator) left a comment

Ship


codecov-commenter commented May 7, 2025

Codecov Report

❌ Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 84.41%. Comparing base (635a8f4) to head (2db0330).
⚠️ Report is 106 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines                              | Patch % | Lines        |
|-------------------------------------------------------|---------|--------------|
| ...onemo/geneformer/model/finetune_token_regressor.py | 75.00%  | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #861      +/-   ##
==========================================
+ Coverage   84.38%   84.41%   +0.03%     
==========================================
  Files         138      138              
  Lines        8691     8695       +4     
==========================================
+ Hits         7334     7340       +6     
+ Misses       1357     1355       -2     
| Files with missing lines                              | Coverage Δ                    |
|-------------------------------------------------------|-------------------------------|
| ...kages/bionemo-evo2/src/bionemo/evo2/run/predict.py | 79.72% <100.00%> (+0.13%) ⬆️  |
| ...packages/bionemo-llm/src/bionemo/llm/model/loss.py | 59.49% <100.00%> (-0.51%) ⬇️  |
| ...ionemo-llm/src/bionemo/llm/utils/megatron_utils.py | 100.00% <100.00%> (ø)         |
| ...onemo/geneformer/model/finetune_token_regressor.py | 60.97% <75.00%> (+0.49%) ⬆️   |

... and 1 file with indirect coverage changes

@cspades cspades force-pushed the cye/nemo-bump-hyena-infer branch 5 times, most recently from 0fcd3a9 to 91fb3b5 Compare May 7, 2025 21:49
@cspades cspades enabled auto-merge May 7, 2025 22:36
@cspades cspades force-pushed the cye/nemo-bump-hyena-infer branch from 91fb3b5 to 2db0330 Compare May 7, 2025 22:49
@cspades cspades added this pull request to the merge queue May 7, 2025
Merged via the queue into main with commit ac004e8 May 8, 2025
10 checks passed
@cspades cspades deleted the cye/nemo-bump-hyena-infer branch May 8, 2025 01:10
camirr-nv pushed a commit that referenced this pull request Jun 26, 2025
…d inference. (#861)

### Description

- #798 (#855) depends on a NeMo branch, which has been merged into NeMo `main` (NVIDIA-NeMo/NeMo#13436). Update to point to this trunk commit.

### Details

- NeMo ToT reverted the `cp_size` argument for `masked_token_loss` (NVIDIA-NeMo/NeMo#13295), so we now perform the CP reduction on our side.
- A future Megatron bump will add the `* cp_size` multiplier to the loss and will break our inference unit tests, due to `torch.inference_mode()` usage in Megatron.
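
The `torch.inference_mode()` failure mode referenced above can be illustrated with a small, self-contained example (this is not the actual Megatron code path): tensors created under inference mode cannot be saved for backward, so mixing one into an autograd graph raises a `RuntimeError`, which is the class of failure the inference unit tests would hit.

```python
import torch

with torch.inference_mode():
    x = torch.ones(3)  # x is an "inference tensor"

w = torch.ones(3, requires_grad=True)
try:
    # Multiplying an inference tensor into an op that must save it for
    # backward raises a RuntimeError outside of inference mode.
    y = (x * w).sum()
    y.backward()
    raised = False
except RuntimeError:
    raised = True

print(raised)
```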

---------

Signed-off-by: Cory Ye <[email protected]>
Signed-off-by: Ubuntu <[email protected]>