
wheels: build with CUDA 13.0, test against mix of CTK versions, make 'torch-geometric' fully optional for 'cugraph-pyg' #434

Open

jameslamb wants to merge 8 commits into rapidsai:release/26.04 from jameslamb:test-older-ctk

Conversation


@jameslamb jameslamb commented Mar 18, 2026

Description

Fixes #410

Contributes to rapidsai/build-planning#257

  • builds CUDA 13 wheels with the 13.0 CTK

Contributes to rapidsai/build-planning#256

  • updates wheel tests to cover a range of CTK versions (previously we were accidentally testing only the latest 12.x and 13.x)

Makes torch even more optional for wheels (follow-up to #425)

  • removes torch-geometric from cugraph-pyg wheels' runtime dependencies (leaves it for conda)
  • removes ogb and sentence-transformers from cugraph-pyg[test] (they're only used for examples that aren't run in wheels CI)

Notes for Reviewers

How I tested this

Tested the full set of nightly and PR CI jobs for wheels, saw them all pass: https://github.com/rapidsai/cugraph-gnn/actions/runs/23348691254

This should fix #410 😁

@jameslamb jameslamb requested review from a team as code owners March 18, 2026 20:44
@jameslamb jameslamb requested a review from bdice March 18, 2026 20:44
@jameslamb jameslamb added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Mar 18, 2026

- wheel-build-libwholegraph
- wheel-build-pylibwholegraph
- wheel-tests-pylibwholegraph
- wheel-tests-nightly-pylibwholegraph
jameslamb (Member Author) commented:

TODO before merging: remove all this

We cover a different mix of environments in nightlies, and this project's dependency tree is very sensitive to that mix, so I want to be sure we've accounted for everything.

- wheel-tests-nightly-pylibwholegraph
- wheel-build-cugraph-pyg
- wheel-tests-cugraph-pyg
- wheel-tests-nightly-cugraph-pyg
jameslamb (Member Author) commented:

This is pretty close!

  • ✔️ all PR CI wheel jobs passing
  • ✔️ all nightly pylibwholegraph wheel jobs passing
  • 😬 1 nightly cugraph-pyg wheel job failing
Collecting ucxx-cu12==0.49.*,>=0.0.0a0 (from cugraph-cu12==26.4.*,>=0.0.0a0->cugraph-pyg-cu12==26.4.0a40->cugraph-pyg-cu12==26.4.0a40)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a32/ucxx_cu12-0.49.0a32-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (503 kB)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a31/ucxx_cu12-0.49.0a31-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (503 kB)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a30/ucxx_cu12-0.49.0a30-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (503 kB)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a29/ucxx_cu12-0.49.0a29-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (503 kB)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a28/ucxx_cu12-0.49.0a28-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (515 kB)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a27/ucxx_cu12-0.49.0a27-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (515 kB)
Collecting torch-geometric<2.8,>=2.5 (from cugraph-pyg-cu12==26.4.0a40->cugraph-pyg-cu12==26.4.0a40)
  Downloading http://pip-cache.local.gha-runners.nvidia.com/packages/03/9f/157e913626c1acfb3b19ce000b1a6e4e4fb177c0bc0ea0c67ca5bd714b5a/torch_geometric-2.6.1-py3-none-any.whl.metadata (63 kB)
error: resolution-too-deep

× Dependency resolution exceeded maximum depth
╰─> Pip cannot resolve the current dependencies as the dependency graph is too complex for pip to solve efficiently.

hint: Try adding lower bounds to constrain your dependencies, for example: 'package>=2.0.0' instead of just 'package'.

(wheel-tests-nightly-cugraph-pyg / 12.2.2, 3.11, amd64, ubuntu22.04, v100, earliest-driver, latest-deps)

I'll try to reproduce that locally and see if I can get a better solver error.

jameslamb (Member Author) commented Mar 19, 2026:

I'm able to reproduce this locally.

Code to reproduce (collapsed in the original comment):
docker run \
    --rm \
    --pull always \
    --env RAPIDS_REPOSITORY=rapidsai/cugraph-gnn \
    --env RAPIDS_SHA=13ef184fcfbeab41e096fa643f1ff082a3127ccd \
    --env RAPIDS_REF_NAME=pull-request/434 \
    --env RAPIDS_BUILD_TYPE=pull-request \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it rapidsai/citestwheel:26.04-cuda12.2.2-ubuntu22.04-py3.11 \
    bash


source rapids-init-pip

package_name="cugraph-pyg"

RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"

# Download the libwholegraph, pylibwholegraph, and cugraph-pyg built in the previous step
LIBWHOLEGRAPH_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="libwholegraph_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-github cpp)
PYLIBWHOLEGRAPH_WHEELHOUSE=$(rapids-download-from-github "$(rapids-package-name "wheel_python" pylibwholegraph --stable --cuda "$RAPIDS_CUDA_VERSION")")
CUGRAPH_PYG_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="${package_name}_${RAPIDS_PY_CUDA_SUFFIX}" RAPIDS_PY_WHEEL_PURE="1" rapids-download-wheels-from-github python)

# generate constraints (possibly pinning to oldest support versions of dependencies)
rapids-generate-pip-constraints test_cugraph_pyg "${PIP_CONSTRAINT}"

rapids-generate-pip-constraints torch_only "${PIP_CONSTRAINT}"

rapids-pip-retry install \
  --prefer-binary \
  --constraint "${PIP_CONSTRAINT}" \
  --extra-index-url 'https://pypi.nvidia.com' \
  "${LIBWHOLEGRAPH_WHEELHOUSE}"/*.whl \
  "$(echo "${PYLIBWHOLEGRAPH_WHEELHOUSE}"/pylibwholegraph_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)" \
  "$(echo "${CUGRAPH_PYG_WHEELHOUSE}"/cugraph_pyg_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)[test]"

I think I see what's happening.

  • torch-geometric and ogb require torch
  • ogb requires some nvidia-{project} CTK packages like nvidia-cuda-nvrtc
  • when we don't install a CUDA build of torch, the version of torch in the environment is only constrained by ogb and torch-geometric's requirements, which allow all the way back to torch>=1.6.0

Taken together, you end up in this "resolution-too-deep" situation, where pip tries varying combinations of ogb, torch-geometric, and CPU-only torch. CUDA-suffixed packages make the resolution graph even larger... go back far enough and ogb flips from depending on nvidia-cuda-nvrtc-cu12 to nvidia-cuda-nvrtc-cu11.
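To get a feel for why that blows up, here's a back-of-the-envelope sketch (the per-package candidate counts below are hypothetical, not measured): pip's worst-case backtracking space grows roughly with the product of candidate versions per package, so one nearly-unconstrained package like torch multiplies everything else.

```python
# Hypothetical candidate counts per package. With torch only bounded by
# torch>=1.6.0, its candidate list is long, and the worst-case number of
# version combinations the resolver may explore is the product of the counts.
candidates = {
    "torch": 60,            # everything back to 1.6.0 is a valid candidate
    "torch-geometric": 10,  # constrained to >=2.5,<2.8
    "ogb": 15,              # effectively unconstrained
}

worst_case = 1
for package, count in candidates.items():
    worst_case *= count

print(worst_case)  # 9000 combinations, before CUDA-suffixed packages widen the graph
```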

Unfortunately I think the best long-term fix here is to treat ogb and torch-geometric as fully optional for wheels just as we do torch... keeping them out of wheel metadata and installing them separately (ref: #425). If torch has to be truly optional, then anything that pulls it in needs to be optional too. I'll work on that.

jameslamb (Member Author) commented:

Trying this in 9c60899

jameslamb (Member Author) commented:

Interestingly, still hitting a 'resolution-too-deep' error even without 'torch', 'ogb', or 'torch-geometric' in the solve: https://github.com/rapidsai/cugraph-gnn/actions/runs/23318288336/job/67824788071?pr=434

Will look more into this tomorrow. Maybe it's actually RAPIDS libraries that are causing the conflicts?

A member commented:

This is with pip, right?

If so, maybe it is worth trying with uv. That might give us more insight into the nature of the conflict

jameslamb (Member Author) replied:

Thanks, I'll consider it.

jameslamb (Member Author) commented:

I found the root cause... cugraph-pyg[test] had sentence-transformers in it, which pulls in torch as a required dependency. That took us back down the road of pip considering many different torch versions and other libraries with competing dependencies (including building some from source during backtracking!), which led to these issues.

We really do not want torch in the environment at all unless it's a CUDA build of torch, and that means making sentence-transformers optional just as we did with torch itself in #425.
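A quick way to check which situation an environment ended up in (a sketch; `torch.version.cuda` is `None` on CPU-only builds of torch):

```python
# Sketch: report whether the installed torch (if any) is a CUDA build.
# CPU-only torch wheels ship with torch.version.cuda set to None.
try:
    import torch
    cuda_version = torch.version.cuda  # e.g. "12.1", or None for CPU-only builds
    is_cuda_build = cuda_version is not None
except ImportError:
    is_cuda_build = False  # torch absent entirely, which is also fine for these wheels

print(is_cuda_build)
```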

Pushed that change and it looks like all CI jobs (including all nightly wheels jobs!) are now passing: https://github.com/rapidsai/cugraph-gnn/actions/runs/23348691254/job/67923786458?pr=434

I'll revert the nightly stuff and go ask for a review.

@jameslamb jameslamb changed the title from "build wheels with CUDA 13.0.x, test wheels against mix of CTK versions" to "wheels: build with CUDA 13.0, test against mix of CTK versions, make 'torch-geometric' fully optional for 'cugraph-pyg'" Mar 19, 2026
@jameslamb jameslamb mentioned this pull request Mar 20, 2026
- matrix:
packages:
- sentence-transformers
- sentence-transformers>=3.0.1
jameslamb (Member Author) commented:

Ran into issues in this PR that were like "pip is processing a graph of possibilities that's too large".

I don't think this floor would have helped that (in this specific case, the entire dependency just needed to be skipped), but in general having floors for test-only requirements like this reduces the risk of this type of problem.

This choice is pretty arbitrary... sentence-transformers 3.0.0 came out about 2 years ago (May 2024) and 3.0.1 followed a few days later, so it probably fixed some bug(s).

Chose this just to go from "no floor" to "some floor", and "version from 2 years ago" seemed like a safe choice 🤷🏻
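A toy illustration of why floors help (the version list below is hypothetical): a floor like >=3.0.1 shrinks the candidate set the resolver has to walk through while backtracking, which is exactly what pip's "try adding lower bounds" hint is getting at.

```python
# Hypothetical published versions of a test-only dependency.
versions = ["2.2.2", "2.7.0", "3.0.0", "3.0.1", "3.2.0", "5.0.0"]

def meets_floor(version, floor):
    """Naive numeric comparison; real resolvers use full PEP 440 semantics."""
    return tuple(map(int, version.split("."))) >= tuple(map(int, floor.split(".")))

unfloored = versions                                        # no floor: all 6 candidates
floored = [v for v in versions if meets_floor(v, "3.0.1")]  # with a floor: 3 candidates

print(floored)  # ['3.0.1', '3.2.0', '5.0.0']
```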
