@trvachov
Collaborator

@trvachov trvachov commented Sep 25, 2025

Description

Update pytorch, megatron, nemo, and test.

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels. By default, only basic unit tests are run.

  • ciflow:skip - Skip all CI tests for this PR
  • ciflow:notebooks - Run Jupyter notebooks execution tests for bionemo2
  • ciflow:slow - Run slow single GPU integration tests marked as @pytest.mark.slow for bionemo2
  • ciflow:all - Run all tests (unit tests, slow tests, and notebooks) for bionemo2. This label can be used to enforce running tests for all bionemo2.
  • ciflow:all-recipes - Run tests for all recipes (under bionemo-recipes). This label can be used to enforce running tests for all recipes.

Unit tests marked as @pytest.mark.multi_gpu or @pytest.mark.distributed are not run in the PR pipeline.

For more details, see CONTRIBUTING

Note

By default, only basic unit tests are run. Add the appropriate labels to enable additional test coverage.

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.

  • If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
    automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
  • If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
    /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

Summary by CodeRabbit

  • New Changes

    • Removed the bionemo-geometric sub-package and its dependencies from the project.
  • Documentation

    • Removed references to the geometric sub-package from the developer guide.
  • Chores

    • Updated container base image and revised build/install steps for select dependencies to align with newer CUDA and tooling.
    • Updated security baseline data.
    • Adjusted code ownership mappings.
  • Tests

    • Removed test suites related to the geometric sub-package.

@copy-pr-bot

copy-pr-bot bot commented Sep 25, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Sep 25, 2025

Walkthrough

The PR removes the bionemo-geometric sub-package (code, tests, config, docs, and ownership), updates project configs to drop references, and revises the Dockerfile to new base image and build flows for causal-conv1d, Mamba, and bitsandbytes. It also refreshes the secret baseline timestamp and removes a legacy TE patch.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Sub-package removal: bionemo-geometric | sub-packages/bionemo-geometric/... (LICENSE, README.md, VERSION, pyproject.toml, requirements.txt, src/bionemo/geometric/*, tests/bionemo/geometric/*) | Deletes the entire bionemo-geometric package: all source modules (atom/bond/molecule featurizers), init, configs, requirements, and tests; trims license/version references. |
| Config references cleanup | pyproject.toml, sub-packages/bionemo-fw/pyproject.toml, tach.toml, docs/docs/main/developer-guide/SUMMARY.md | Removes bionemo-geometric from dependencies, workspace/source_roots, tooling deps, and developer docs summary. |
| Ownership mapping | CODEOWNERS | Removes codeowner entry for sub-packages/bionemo-geometric. |
| Docker build updates | Dockerfile | Bumps base to PyTorch 25.08, drops TransformerEngine patching, switches causal-conv1d to local build from pinned commit, changes Mamba source/version/build process, adjusts bitsandbytes install for CUDA 13.0, removes pre-copy/install for geometric. |
| TransformerEngine patch artifact | docker_build_patches/te.patch | Deletes patch content (cast.cpp quantize binding) as TE patching is removed. |
| Secrets baseline refresh | .secrets.baseline | Updates generated_at timestamp and a recorded line number for pyproject.toml; no structural changes. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Dev as Docker Build
  participant Img as Base Image (nvcr.io/nvidia/pytorch:25.08-py3)
  participant Git as Git Repos
  participant Pip as pip/uv
  participant Py as Python Build

  Dev->>Img: Start FROM 25.08-py3
  Note over Img: TE patching removed

  Dev->>Git: Clone causal-conv1d@<commit>
  Dev->>Py: python setup.py build_ext --inplace
  Dev->>Py: Build wheel
  Py-->>Pip: wheel file
  Dev->>Pip: pip install wheel (no deps)

  Dev->>Git: Clone trvachov/[email protected]
  Dev->>Py: python setup.py build_ext --inplace
  Dev->>Pip: pip install .

  Note over Pip: bitsandbytes: uninstall via uv, then pip install wheel (no-deps)

  Note over Dev: bionemo-geometric pre-install removed

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I tidied my warren, light and sleek,
Packed up the geom* tricks this week.
New wheels spun fast, convs built tight,
Mambas hiss in CUDA night.
Patchless trails, a cleaner flight—
Thump-thump! my paws approve the site. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title Check | ⚠️ Warning | The title “[WIP] Update base dependencies” includes a work-in-progress marker and is overly generic, failing to clearly summarize the most significant change in the pull request, which involves not only dependency updates but also the removal of an entire sub-package and related code. | Please remove the “[WIP]” prefix and replace it with a concise, descriptive title that highlights the primary changes, for example “Remove bionemo-geometric sub-package and update core dependencies.” |
| Description Check | ⚠️ Warning | The description includes a brief summary and the CI and checklist sections but omits the required Usage section with example code and lacks a detailed explanation of the specific changes made to dependencies, packaging, and removed modules. | Please add the “#### Usage” section with a representative code snippet showing how to exercise the updated dependencies, and expand the Description to detail which dependencies were bumped, which sub-packages were removed, and any breaking changes. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

@trvachov
Collaborator Author

/ok to test f23c0f3

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
Dockerfile (5)

56-57: Fix invalid apt command: apt-get upgrade with package name is wrong

This will fail or be ignored. Use install with only-upgrade.

Apply:

-apt-get upgrade -qyy \
-  rsync
+apt-get install -qy --only-upgrade rsync

112-113: wget is used but never installed (ARM path will fail)

Either install wget here or use curl which is already present.

Example fix (using curl):

-    wget https://github.com/TileDB-Inc/TileDB/releases/download/2.27.2/tiledb-linux-arm64-2.27.2-1757013.tar.gz -O tiledb.tar.gz && \
+    curl -sSL https://github.com/TileDB-Inc/TileDB/releases/download/2.27.2/tiledb-linux-arm64-2.27.2-1757013.tar.gz -o tiledb.tar.gz && \

115-116: Potential build break: xargs may run apt-get remove with no args

If no matching packages exist, xargs can invoke apt-get remove -y with no args and error. Use xargs -r (GNU) or guard the call.

Apply:

-    dpkg -l | awk '/libfmt/ {print $2}' | xargs apt-get remove -y && \
-    dpkg -l | awk '/spdlog/ {print $2}' | xargs apt-get remove -y && \
+    dpkg -l | awk '/libfmt/ {print $2}' | xargs -r apt-get remove -y && \
+    dpkg -l | awk '/spdlog/ {print $2}' | xargs -r apt-get remove -y && \

If non-GNU xargs is possible, replace with conditional checks.
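
One way to write the conditional check mentioned above, as a minimal sketch rather than the Dockerfile's actual code: `run_if_input` is a hypothetical helper that reads item names from stdin and only invokes the given command when at least one item was read, so it does not depend on GNU `xargs -r`.

```shell
# run_if_input CMD [ARGS...]
# Hypothetical helper (not part of the Dockerfile): read whitespace-separated
# items from stdin and append them to CMD. Does nothing when stdin is empty.
run_if_input() {
  items=$(cat)
  # Treat input that is empty or whitespace-only as "nothing to do".
  if [ -z "$(printf '%s' "$items" | tr -d '[:space:]')" ]; then
    return 0
  fi
  # Word splitting is intentional: each item becomes one argument.
  "$@" $items
}

# Demonstration with echo standing in for `apt-get remove -y`.
empty_result=$(printf '' | run_if_input echo removed)
full_result=$(printf 'libfmt9\nlibfmt-dev\n' | run_if_input echo removed)
```

In the Dockerfile this would be used as `dpkg -l | awk '/libfmt/ {print $2}' | run_if_input apt-get remove -y`, assuming the helper is defined earlier in the same RUN step.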


217-225: uv pip uninstall lacks -y and can hang in non-interactive builds

These uninstalls will prompt for confirmation and stall the build.

Apply:

-uv pip uninstall bitsandbytes && uv pip install bitsandbytes==0.46.1
+uv pip uninstall -y bitsandbytes || true
+uv pip install bitsandbytes==0.46.1
@@
-uv pip uninstall sqlitedict zstandard
+uv pip uninstall -y sqlitedict zstandard || true

39-59: Make set -o pipefail portable by switching shell to bash for RUNs

Several heredoc RUNs use set -eo pipefail; /bin/sh (dash) doesn’t support pipefail. Use bash shell per stage.

Apply after each relevant FROM:

 FROM ${BASE_IMAGE} AS bionemo2-base
+SHELL ["/bin/bash", "-lc"]
@@
 FROM ${BASE_IMAGE} AS dev
+SHELL ["/bin/bash", "-lc"]
@@
 FROM dev AS development
+SHELL ["/bin/bash", "-lc"]
@@
 FROM bionemo2-base AS release
+SHELL ["/bin/bash", "-lc"]
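
To illustrate why the option matters, here is a small sketch (not taken from the Dockerfile) that runs the same failing pipeline in bash with and without `pipefail`. The variable names are illustrative only.

```shell
# Without pipefail, a pipeline's exit status is that of its last command,
# so the failure of `false` is hidden by the trailing `true`.
status_default=0
bash -c 'false | true' || status_default=$?

# With pipefail, the pipeline fails if any command in it fails.
status_pipefail=0
bash -c 'set -o pipefail; false | true' || status_pipefail=$?

echo "default=$status_default pipefail=$status_pipefail"
```

The second invocation reports a non-zero status, which is what lets a `set -eo pipefail` RUN step stop on a failed download or filter in the middle of a pipe.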
🧹 Nitpick comments (3)
Dockerfile (3)

200-202: Pin nvidia-resiliency-ext to a commit/tag for reproducible builds

Unpinned clones introduce nondeterminism.

Example:

-git clone https://github.com/NVIDIA/nvidia-resiliency-ext
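+git clone --depth=1 --branch <tag> https://github.com/NVIDIA/nvidia-resiliency-ext

Replace <tag> with a known good tag. Note that git clone --branch accepts branch and tag names but not a commit SHA; to pin to a commit, clone without --depth and --branch and run git checkout <commit> afterwards.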


119-122: Clean up TileDB-SOMA repo after install to keep image small

The cloned repo persists.

Apply:

-    cd TileDB-SOMA/apis/python && \
-    pip install .; \
+    cd TileDB-SOMA/apis/python && \
+    pip install . && \
+    cd / && rm -rf TileDB-SOMA; \

15-21: Update stale comment about Transformer Engine

The comment states “with apex and transformer engine,” but TE patch/integration was removed here.

Refresh the comment to reflect the current contents.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f23c0f3 and 5b79da9.

📒 Files selected for processing (2)
  • Dockerfile (1 hunks)
  • docker_build_patches/te.patch (0 hunks)
💤 Files with no reviewable changes (1)
  • docker_build_patches/te.patch
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (4)
Dockerfile (4)

167-168: Confirm safe removal of bundled ONNX sources

Removing /opt/pytorch/pytorch/third_party/onnx may affect torch.onnx export or tooling expecting these sources.

Please confirm no ONNX export paths in tests/pipelines rely on this directory.


252-266: Verify Python path hardcode (3.12) matches base image

If base moves to a different Python minor, these paths break.

Consider deriving it during the build with python -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])', or keep the paths in sync with BASE_IMAGE updates.
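
A minimal sketch of that derivation, assuming a `python3` executable is on the PATH in the build stage (the variable names `PY_PURELIB` and `PY_VERSION` are illustrative and do not appear in the Dockerfile):

```shell
# Ask the interpreter where pure-Python packages are installed instead of
# hardcoding a path that embeds the Python minor version.
PY_PURELIB=$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')

# The major.minor version, in case a path still has to be assembled by hand.
PY_VERSION=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')

echo "Python ${PY_VERSION} site-packages: ${PY_PURELIB}"
```

Later steps in the same RUN could then refer to `"$PY_PURELIB"` rather than a literal 3.12 path.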


24-33: Rust toolchain layering is inconsistent; validate image tag and need

Base stage uses rust:1.86.0 but sets default to 1.82.0. Ensure rust:1.86.0 exists and this split is intended.

If only 1.82.0 is required, switch base to rust:1.82 to reduce size/variance.


108-122: TILEDB_PATH export is ephemeral; ensure build finds libs

export TILEDB_PATH=... only applies to this RUN. Confirm the SOMA build locates TileDB without needing a persistent env var.

If required, set ENV TILEDB_PATH=/usr/lib/tiledb before the build or pass it inline to the pip build step.
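
The scoping behaviour can be reproduced outside Docker. In this sketch each `sh -c` call stands in for a separate RUN step, which is an approximation of how Docker starts a fresh shell per RUN; the path is the one suggested above.

```shell
# Make sure the variable is not inherited from the calling environment.
unset TILEDB_PATH

# "RUN 1": the export is visible inside the same shell process.
same_run=$(sh -c 'export TILEDB_PATH=/usr/lib/tiledb; echo "$TILEDB_PATH"')

# "RUN 2": a new shell starts without the variable.
next_run=$(sh -c 'echo "${TILEDB_PATH:-unset}"')

# Passing the variable inline scopes it to exactly one command.
inline_run=$(TILEDB_PATH=/usr/lib/tiledb sh -c 'echo "$TILEDB_PATH"')

echo "same RUN: $same_run, next RUN: $next_run, inline: $inline_run"
```

If the build step that needs TileDB runs in the same RUN as the export, the ephemeral variable is sufficient; otherwise an ENV instruction or the inline form is needed.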

@trvachov
Collaborator Author

/ok to test 5b79da9

@codecov-commenter

codecov-commenter commented Sep 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.76%. Comparing base (dd4f626) to head (c00ca7e).
⚠️ Report is 41 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1194      +/-   ##
==========================================
- Coverage   79.93%   79.76%   -0.18%     
==========================================
  Files         160      156       -4     
  Lines       11858    11617     -241     
==========================================
- Hits         9479     9266     -213     
+ Misses       2379     2351      -28     

see 1 file with indirect coverage changes

@trvachov trvachov force-pushed the trvachov/dep-update-25.10 branch from 803412c to c245759 Compare September 28, 2025 17:30
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 803412c and c245759.

📒 Files selected for processing (1)
  • Dockerfile (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (rust)

@trvachov trvachov force-pushed the trvachov/dep-update-25.10 branch 2 times, most recently from c245759 to 6534410 Compare September 28, 2025 19:35
@trvachov trvachov force-pushed the trvachov/dep-update-25.10 branch 2 times, most recently from 209ad56 to bc49d1e Compare October 1, 2025 03:29
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
Dockerfile (1)

226-229: Gate the bitsandbytes install by architecture.

This always pulls the manylinux_2_24_x86_64 wheel, so an ARM64 build (TARGETARCH=arm64) explodes while unpacking the x86_64 artifact. Restore an architecture check and keep/restore the ARM path (e.g., skip or install an ARM-capable build).

-uv pip uninstall bitsandbytes && pip install --no-deps --no-build-isolation https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl
+if [ "$TARGETARCH" = "amd64" ] || [ "$TARGETARCH" = "x86_64" ]; then
+  uv pip uninstall bitsandbytes
+  pip install --no-deps --no-build-isolation \
+    https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl
+elif [ "$TARGETARCH" = "arm64" ]; then
+  uv pip uninstall bitsandbytes || true
+  uv pip install bitsandbytes==0.45.2  # or another ARM-compatible build
+fi
🧹 Nitpick comments (1)
Dockerfile (1)

83-89: Fix the repo cleanup path.

The clone lives in causal-conv1d, but the cleanup step removes causal_conv1d, so the source tree is left behind in the image. Please delete the actual directory you cloned.

-rm -rf causal_conv1d
+rm -rf causal-conv1d
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 209ad56 and bc49d1e.

📒 Files selected for processing (1)
  • Dockerfile (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (rust)

@trvachov trvachov added the ciflow:all Run all tests (unit tests, slow tests, and notebooks) for bionemo2 or enforce running all tests label Oct 1, 2025
@trvachov
Collaborator Author

trvachov commented Oct 1, 2025

/ok to test bc49d1e

@trvachov trvachov force-pushed the trvachov/dep-update-25.10 branch from bc49d1e to c00ca7e Compare October 14, 2025 16:35
@trvachov
Collaborator Author

/ok to test c00ca7e

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
Dockerfile (1)

83-90: Fix cleanup path after causal-conv1d install

We clone causal-conv1d, but the cleanup removes causal_conv1d, so the repo stays in the image. Switch the rm -rf target back to causal-conv1d to keep the layer tidy.
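
One way to keep the clone and cleanup targets from drifting apart is to name the directory once and reuse it. The sketch below is an illustration only: `REPO_DIR` is a hypothetical variable, and `mkdir` stands in for the real `git clone` so the snippet runs without network access.

```shell
# Work in a throwaway directory so the demonstration leaves nothing behind.
workdir=$(mktemp -d)

# Single source of truth for the checkout directory name.
REPO_DIR="causal-conv1d"

# Stand-in for `git clone <url> "$REPO_DIR"`: creates the directory a clone would.
mkdir -p "$workdir/$REPO_DIR"

# ... build and install steps would run here ...

# Cleanup uses the same variable, so it cannot target a misspelled path.
rm -rf "$workdir/$REPO_DIR"

if [ -e "$workdir/$REPO_DIR" ]; then
  cleanup_result="leftover"
else
  cleanup_result="clean"
fi
rmdir "$workdir"
echo "$cleanup_result"
```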

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc49d1e and c00ca7e.

📒 Files selected for processing (1)
  • Dockerfile (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (rust)

Comment on lines +226 to +229
# CUDA 13.0 supported bitsandbytes requires a nightly build, which doesn't seem to work
# with uv because of some trivial package/version name check, so we use uv for uninstall
# and then pip for reinstall
uv pip uninstall bitsandbytes && pip install --no-deps --no-build-isolation https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl
Contributor

⚠️ Potential issue | 🔴 Critical

Restore ARM build compatibility for bitsandbytes install

pip install … manylinux_2_24_x86_64.whl runs unconditionally. On TARGETARCH=arm64 builds the wheel is rejected as incompatible, so the Docker build aborts and we lose our ARM images. Wrap this block in an if [ "$TARGETARCH" = "amd64" ] … fi (with an appropriate else branch—either skip, or keep the previous ARM flow) so non-x86 builds don’t try to consume the x86_64 artifact.

🤖 Prompt for AI Agents
In Dockerfile around lines 226 to 229, the pip install of the x86_64
bitsandbytes wheel runs unconditionally and fails on TARGETARCH=arm64; wrap this
block in a shell conditional that checks if "$TARGETARCH" = "amd64" and only
runs the uv uninstall and pip install for that case, and add an else branch that
either skips the x86 wheel (no-op) or performs the previous ARM-compatible
install flow so ARM builds don't attempt to install the incompatible
manylinux_2_24_x86_64.whl.
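
The branching described above could be sketched with a `case` statement. This is a dry-run illustration, not the Dockerfile's code: `bnb_install_plan` is a hypothetical function that only prints which flow would be taken, and the flow labels are made up for the example. In a real Dockerfile, `TARGETARCH` is only visible inside a stage after an `ARG TARGETARCH` declaration.

```shell
# bnb_install_plan ARCH
# Hypothetical helper: print which bitsandbytes install flow a build for
# ARCH would take. Unknown architectures fall through to "skip".
bnb_install_plan() {
  case "$1" in
    amd64|x86_64)
      # Only here is the manylinux_2_24_x86_64 wheel a valid choice.
      echo "x86_64-wheel"
      ;;
    arm64|aarch64)
      # Keep the previous ARM-compatible flow, or skip the package.
      echo "arm-flow"
      ;;
    *)
      echo "skip"
      ;;
  esac
}

plan_amd64=$(bnb_install_plan amd64)
plan_arm64=$(bnb_install_plan arm64)
plan_other=$(bnb_install_plan riscv64)
echo "amd64=$plan_amd64 arm64=$plan_arm64 other=$plan_other"
```

Replacing each `echo` with the corresponding install commands would give the gated block; the explicit default branch keeps builds for other architectures from silently picking up the x86_64 wheel.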

Labels

ciflow:all Run all tests (unit tests, slow tests, and notebooks) for bionemo2 or enforce running all tests

3 participants