Changes from 4 commits
22 changes: 22 additions & 0 deletions .github/workflows/pr.yaml
@@ -25,8 +25,10 @@ jobs:
- wheel-build-libwholegraph
- wheel-build-pylibwholegraph
- wheel-tests-pylibwholegraph
- wheel-tests-nightly-pylibwholegraph
Member Author:
TODO before merging: remove all this

We cover a different mix of environments in nightlies, and this project's dependency tree is very sensitive to that mix, so we want to be sure we've accounted for everything.

- wheel-build-cugraph-pyg
- wheel-tests-cugraph-pyg
- wheel-tests-nightly-cugraph-pyg
Member Author:
This is pretty close!

  • ✔️ all PR CI wheel jobs passing
  • ✔️ all nightly pylibwholegraph wheel jobs passing
  • 😬 1 nightly cugraph-pyg wheel job failing
Collecting ucxx-cu12==0.49.*,>=0.0.0a0 (from cugraph-cu12==26.4.*,>=0.0.0a0->cugraph-pyg-cu12==26.4.0a40->cugraph-pyg-cu12==26.4.0a40)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a32/ucxx_cu12-0.49.0a32-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (503 kB)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a31/ucxx_cu12-0.49.0a31-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (503 kB)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a30/ucxx_cu12-0.49.0a30-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (503 kB)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a29/ucxx_cu12-0.49.0a29-cp311-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (503 kB)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a28/ucxx_cu12-0.49.0a28-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (515 kB)
  Downloading https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ucxx-cu12/0.49.0a27/ucxx_cu12-0.49.0a27-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (515 kB)
Collecting torch-geometric<2.8,>=2.5 (from cugraph-pyg-cu12==26.4.0a40->cugraph-pyg-cu12==26.4.0a40)
  Downloading http://pip-cache.local.gha-runners.nvidia.com/packages/03/9f/157e913626c1acfb3b19ce000b1a6e4e4fb177c0bc0ea0c67ca5bd714b5a/torch_geometric-2.6.1-py3-none-any.whl.metadata (63 kB)
error: resolution-too-deep

× Dependency resolution exceeded maximum depth
╰─> Pip cannot resolve the current dependencies as the dependency graph is too complex for pip to solve efficiently.

hint: Try adding lower bounds to constrain your dependencies, for example: 'package>=2.0.0' instead of just 'package'.

(wheel-tests-nightly-cugraph-pyg / 12.2.2, 3.11, amd64, ubuntu22.04, v100, earliest-driver, latest-deps)

I'll try to reproduce that locally and see if I can get a better solver error.
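Pip's hint about lower bounds can be made concrete with a small stdlib-only sketch (the version list here is hypothetical, not torch's actual release history): an unbounded requirement leaves every historical release as a backtracking candidate, while a lower bound prunes most of them up front.

```python
# Parse "X.Y.Z" strings into comparable tuples (stdlib only).
def ver(s):
    return tuple(int(part) for part in s.split("."))

# Hypothetical subset of releases the resolver could consider.
releases = ["1.6.0", "1.13.1", "2.0.0", "2.4.0", "2.5.1"]

unbounded = releases                                       # 'torch'
bounded = [v for v in releases if ver(v) >= ver("2.0.0")]  # 'torch>=2.0.0'

print(len(unbounded), len(bounded))  # 5 3
```

With real indexes the effect is the same but much larger: every release the specifier admits is a branch the resolver may explore while backtracking.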

Member Author (@jameslamb, Mar 19, 2026):
I'm able to reproduce this locally

Code to do that:
docker run \
    --rm \
    --pull always \
    --env RAPIDS_REPOSITORY=rapidsai/cugraph-gnn \
    --env RAPIDS_SHA=13ef184fcfbeab41e096fa643f1ff082a3127ccd \
    --env RAPIDS_REF_NAME=pull-request/434 \
    --env RAPIDS_BUILD_TYPE=pull-request \
    -v "$(pwd)":/opt/work \
    -w /opt/work \
    -it rapidsai/citestwheel:26.04-cuda12.2.2-ubuntu22.04-py3.11 \
    bash


source rapids-init-pip

package_name="cugraph-pyg"

RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"

# Download the libwholegraph, pylibwholegraph, and cugraph-pyg built in the previous step
LIBWHOLEGRAPH_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="libwholegraph_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-github cpp)
PYLIBWHOLEGRAPH_WHEELHOUSE=$(rapids-download-from-github "$(rapids-package-name "wheel_python" pylibwholegraph --stable --cuda "$RAPIDS_CUDA_VERSION")")
CUGRAPH_PYG_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="${package_name}_${RAPIDS_PY_CUDA_SUFFIX}" RAPIDS_PY_WHEEL_PURE="1" rapids-download-wheels-from-github python)

# generate constraints (possibly pinning to oldest support versions of dependencies)
rapids-generate-pip-constraints test_cugraph_pyg "${PIP_CONSTRAINT}"

rapids-generate-pip-constraints torch_only "${PIP_CONSTRAINT}"

rapids-pip-retry install \
  --prefer-binary \
  --constraint "${PIP_CONSTRAINT}" \
  --extra-index-url 'https://pypi.nvidia.com' \
  "${LIBWHOLEGRAPH_WHEELHOUSE}"/*.whl \
  "$(echo "${PYLIBWHOLEGRAPH_WHEELHOUSE}"/pylibwholegraph_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)" \
  "$(echo "${CUGRAPH_PYG_WHEELHOUSE}"/cugraph_pyg_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)[test]"

I think I see what's happening.

  • torch-geometric and ogb require torch
  • ogb requires some nvidia-{project} CTK packages like nvidia-cuda-nvrtc
  • when we don't install a CUDA build of torch, the version of torch in the environment is only constrained by ogb and torch-geometric's requirements, which allow all the way back to torch>=1.6.0

Taken together, you end up in this "resolution-too-deep" situation, where pip tries varying combinations of ogb, torch-geometric, and CPU-only torch. CUDA-suffixed packages make the resolution graph even larger... go back far enough and ogb flips from depending on nvidia-cuda-nvrtc-cu12 to nvidia-cuda-nvrtc-cu11.
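A back-of-the-envelope sketch of why this blows up (every count below is made up for illustration; real numbers depend on the state of the package index): each additional unpinned package multiplies the combinations the resolver may have to try while backtracking.

```python
# All candidate counts are hypothetical, for illustration only.
torch_candidates = 40  # releases admitted by 'torch>=1.6.0'
pyg_candidates = 12    # torch-geometric candidates
ogb_candidates = 8     # ogb candidates
cuda_variants = 2      # e.g. -cu11 vs -cu12 suffixed dependency stacks

# In the worst case the resolver walks the whole cross product.
combinations = torch_candidates * pyg_candidates * ogb_candidates * cuda_variants
print(combinations)  # 7680
```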

Unfortunately, I think the best long-term fix here is to treat ogb and torch-geometric as fully optional for wheels, just as we do for torch... keeping them out of wheel metadata and installing them separately (ref: #425). If torch has to be truly optional, then anything that pulls it in needs to be optional too. I'll work on that.
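A rough sketch of how that might look in dependencies.yaml, following the pattern already used for torch (the group name and pins here are hypothetical, not the actual change):

```yaml
# Hypothetical group: conda environments keep ogb/torch-geometric as
# hard dependencies, but no 'requirements'/'pyproject' output is listed,
# so they stay out of wheel metadata and CI installs them explicitly.
depends_on_pyg_and_ogb:
  common:
    - output_types: [conda]
      packages:
        - ogb
        - torch-geometric>=2.5,<2.8
```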

Member Author:
Trying this in 9c60899

Member Author:
Interestingly, we're still hitting a 'resolution-too-deep' error even without 'torch', 'ogb', or 'torch-geometric' in the solve: https://github.com/rapidsai/cugraph-gnn/actions/runs/23318288336/job/67824788071?pr=434

Will look more into this tomorrow. Maybe it's actually RAPIDS libraries that are causing the conflicts?

Member:
This is with pip right?

If so, maybe it is worth trying with uv. That might give us more insight into the nature of the conflict

Member Author:
Thanks, I'll consider it.

Member Author:
I found the root cause... cugraph-pyg[test] had sentence-transformers in it, which pulls in torch as a required dependency. That took us back down the road of pip considering many different torch versions and other libraries with competing dependencies (including building some from source during backtracking!), which led to these issues.

We really do not want torch in the environment at all unless it's a CUDA build of torch, and that means making sentence-transformers optional just as we did with torch itself in #425.
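Sketched in dependencies.yaml terms (illustrative only, not the literal diff):

```yaml
# Hypothetical: sentence-transformers leaves the wheel's [test] extra
# (no 'requirements' output) and stays conda-only, so installing
# 'cugraph-pyg[test]' no longer drags in a CPU-only torch.
test_python:
  common:
    - output_types: [conda]
      packages:
        - sentence-transformers
```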

Pushed that change, and it looks like all CI jobs (including all nightly wheels jobs!) are now passing: https://github.com/rapidsai/cugraph-gnn/actions/runs/23348691254/job/67923786458?pr=434

I'll revert the nightly stuff and go ask for a review.

secrets: inherit
uses: rapidsai/shared-workflows/.github/workflows/pr-builder.yaml@release/26.04
if: always()
@@ -259,6 +261,16 @@ jobs:
build_type: pull-request
script: ci/test_wheel_pylibwholegraph.sh
matrix_filter: map(select(.PY_VER != "3.14")) | map(select(.ARCH == "amd64"))
wheel-tests-nightly-pylibwholegraph:
needs: [wheel-build-pylibwholegraph, changed-files]
secrets: inherit
uses: rapidsai/shared-workflows/.github/workflows/wheels-test.yaml@release/26.04
if: fromJSON(needs.changed-files.outputs.changed_file_groups).test_python_wheels
with:
build_type: pull-request
script: ci/test_wheel_pylibwholegraph.sh
matrix_filter: map(select(.PY_VER != "3.14")) | map(select(.ARCH == "amd64"))
matrix_type: nightly
wheel-build-cugraph-pyg:
needs: checks
secrets: inherit
@@ -281,3 +293,13 @@ jobs:
build_type: pull-request
script: ci/test_wheel_cugraph-pyg.sh
matrix_filter: map(select(.PY_VER != "3.14")) | map(select(.ARCH == "amd64"))
wheel-tests-nightly-cugraph-pyg:
needs: [wheel-build-pylibwholegraph, wheel-build-cugraph-pyg, changed-files]
secrets: inherit
uses: rapidsai/shared-workflows/.github/workflows/wheels-test.yaml@release/26.04
if: fromJSON(needs.changed-files.outputs.changed_file_groups).test_python_wheels
with:
build_type: pull-request
script: ci/test_wheel_cugraph-pyg.sh
matrix_filter: map(select(.PY_VER != "3.14")) | map(select(.ARCH == "amd64"))
matrix_type: nightly
14 changes: 0 additions & 14 deletions ci/test_wheel_cugraph-pyg.sh
@@ -61,20 +61,6 @@ popd
export TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1

if [[ "${torch_downloaded}" == "true" ]]; then
# TODO: remove this when RAPIDS wheels and 'torch' CUDA wheels have compatible package requirements
#
# * https://github.com/rapidsai/cugraph/issues/5443
# * https://github.com/rapidsai/build-planning/issues/257
# * https://github.com/rapidsai/build-planning/issues/255
#
CUDA_MAJOR="${RAPIDS_CUDA_VERSION%%.*}"
CUDA_MINOR=$(echo "${RAPIDS_CUDA_VERSION}" | cut -d'.' -f2)
if [[ "${CUDA_MAJOR}" == "13" ]]; then
pip install \
--upgrade \
"nvidia-nvjitlink>=${CUDA_MAJOR}.${CUDA_MINOR}"
fi

# 'torch' is an optional dependency of 'cugraph-pyg'... confirm that it's actually
# installed here and that we've installed a package with CUDA support.
rapids-logger "Confirming that PyTorch is installed"
14 changes: 0 additions & 14 deletions ci/test_wheel_pylibwholegraph.sh
@@ -51,20 +51,6 @@ rapids-pip-retry install \


if [[ "${torch_downloaded}" == "true" ]]; then
# TODO: remove this when RAPIDS wheels and 'torch' CUDA wheels have compatible package requirements
#
# * https://github.com/rapidsai/cugraph/issues/5443
# * https://github.com/rapidsai/build-planning/issues/257
# * https://github.com/rapidsai/build-planning/issues/255
#
CUDA_MAJOR="${RAPIDS_CUDA_VERSION%%.*}"
CUDA_MINOR=$(echo "${RAPIDS_CUDA_VERSION}" | cut -d'.' -f2)
if [[ "${CUDA_MAJOR}" == "13" ]]; then
pip install \
--upgrade \
"nvidia-nvjitlink>=${CUDA_MAJOR}.${CUDA_MINOR}"
fi

# 'torch' is an optional dependency of 'pylibwholegraph'... confirm that it's actually
# installed here and that we've installed a package with CUDA support.
rapids-logger "Confirming that PyTorch is installed"
55 changes: 47 additions & 8 deletions dependencies.yaml
@@ -193,6 +193,8 @@ dependencies:
- output_types: [conda, requirements]
packages:
- pre-commit
# 'cuda_version' intentionally does not contain fallback entries... we want
# a loud error if an unsupported 'cuda' value is passed
cuda_version:
specific:
- output_types: conda
@@ -201,18 +203,10 @@
cuda: "12.2"
packages:
- cuda-version=12.2
- matrix:
cuda: "12.4"
packages:
- cuda-version=12.4
- matrix:
cuda: "12.5"
packages:
- cuda-version=12.5
- matrix:
cuda: "12.6"
packages:
- cuda-version=12.6
- matrix:
cuda: "12.8"
packages:
@@ -229,6 +223,51 @@
cuda: "13.1"
packages:
- cuda-version=13.1
- output_types: requirements
matrices:
# if use_cuda_wheels=false is provided, do not add dependencies on any CUDA wheels
# (e.g. for DLFW and pip devcontainers)
- matrix:
use_cuda_wheels: "false"
packages:
- matrix:
arch: aarch64
cuda: "12.2"
use_cuda_wheels: "true"
packages:
# some components (like nvidia-cublas-cu12 and nvidia-cuda-nvcc-cu12) didn't have
# aarch64 wheels until CTK 12.3, so allow a slightly looser bound here
- cuda-toolkit>=12.2,<12.4
- matrix:
cuda: "12.2"
use_cuda_wheels: "true"
packages:
- cuda-toolkit==12.2.*
- matrix:
cuda: "12.5"
use_cuda_wheels: "true"
packages:
- cuda-toolkit==12.5.*
- matrix:
cuda: "12.8"
use_cuda_wheels: "true"
packages:
- cuda-toolkit==12.8.*
- matrix:
cuda: "12.9"
use_cuda_wheels: "true"
packages:
- cuda-toolkit==12.9.*
- matrix:
cuda: "13.0"
use_cuda_wheels: "true"
packages:
- cuda-toolkit==13.0.*
- matrix:
cuda: "13.1"
use_cuda_wheels: "true"
packages:
- cuda-toolkit==13.1.*
cuda:
common:
- output_types: [conda]