Support CUDA 12.2 #672
rapids-bot[bot] merged 23 commits into rapidsai:branch-24.04 from jameslamb:test-cuda-12.2
The error seen here doesn't appear to be CUDA 12.2-specific. Reproduced here: #675 (comment). Discussing offline how to resolve it.
Updating the branch to pull in recent CI fixes (#680). Maybe that helps clear things up.
The good news is CUDA 12.2 passes! 🎉

The bad news is that the CUDA 11.8 conda test job appears to be hitting a bunch of test failures. Unfortunately, the job dies around 12% of the way through the test suite, so we don't learn much more about what happened.

I noticed that some CUDA 12 packages are getting installed in the CUDA 11.8 build on CI. Looking at the PR, I see we are making some changes to the CUDA 11.8 environment. Maybe this is related?

Edit: Adding a snippet of the CTK packages below.
```shell
rapids-mamba-retry install \
  --channel "${CPP_CHANNEL}" \
  --channel "${PYTHON_CHANNEL}" \
  "cuda-version=${RAPIDS_CUDA_VERSION%.*}" \
  "libcucim=${RAPIDS_VERSION_NUMBER}" \
  "cucim=${RAPIDS_VERSION_NUMBER}"
```
After discussing offline, we determined that the CUDA 11.8 build was failing because packages were being upgraded to CUDA 12.3 in this step, which was unexpected.

To fix this, I pinned `cuda-version` while installing `libcucim` and `cucim`. That appears to resolve the upgrade issue and lets the tests pass.

That said, we didn't expect to need a `cuda-version` pin here. That may deserve its own investigation (with possible follow-ups here and in other RAPIDS projects).
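For reference, the pin relies on the shell's `${var%pattern}` expansion, which removes the shortest trailing match of `pattern`. So `${RAPIDS_CUDA_VERSION%.*}` drops the patch component and keeps only `{major}.{minor}`. A minimal sketch, assuming `RAPIDS_CUDA_VERSION` holds a full `major.minor.patch` string; the helper name `cuda_pin` and the sample versions are illustrative only:

```shell
#!/usr/bin/env bash
# Sketch: derive the "major.minor" cuda-version pin from a full CUDA version.
# Assumption: the input looks like "11.8.0" or "12.2.2".
cuda_pin() {
  local full_version="$1"
  # ${var%.*} strips the shortest suffix matching ".*", i.e. the ".patch" part
  echo "cuda-version=${full_version%.*}"
}

cuda_pin "11.8.0"   # prints: cuda-version=11.8
cuda_pin "12.2.2"   # prints: cuda-version=12.2
```

Pinning at `major.minor` rather than the full version keeps the constraint at the granularity the `cuda-version` package itself uses in the logs below (e.g. `11.8`, `12.3`).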
Could this be the root cause of what we're seeing here? conda-forge/cupy-feedstock#247 (comment)
Now that `cuda-version` has been added to cupy (conda-forge/cupy-feedstock#249), I think we can try dropping the `cuda-version` pin:
```diff
 rapids-mamba-retry install \
   --channel "${CPP_CHANNEL}" \
   --channel "${PYTHON_CHANNEL}" \
-  "cuda-version=${RAPIDS_CUDA_VERSION%.*}" \
   "libcucim=${RAPIDS_VERSION_NUMBER}" \
   "cucim=${RAPIDS_VERSION_NUMBER}"
```
For posterity: when we saw this issue previously (before adding the `cuda-version` workaround above), `cuda-version=11.8` did appear in the specs of the environment update on CI:
```
Transaction

  Prefix: /opt/conda/envs/test

  Updating specs:

   - gputil[version='>=1.4.0']
   - cuda-version=11.8
   - imagecodecs[version='>=2021.6.8']
   - matplotlib-base
   - openslide-python[version='>=1.3.0']
   - pip
   - pooch[version='>=1.6.0']
   - psutil[version='>=5.8.0']
   - pytest-cov[version='>=2.12.1']
   - pytest-lazy-fixture[version='>=0.6.3']
   - pytest-xdist
   - pytest[version='>=6.2.4']
   - python=3.10
   - tifffile[version='>=2022.7.28']
```
In other words, the solver recognizes that we explicitly requested `cuda-version` with a specific version constraint.

Despite this, later in the same CI log the solver ignores that constraint and updates `cuda-version` anyway:
```
- cuda-version 11.8 h70ddcb2_2 conda-forge Cached
+ cuda-version 12.3 h32bc705_2 conda-forge 21kB
```
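One way to catch this kind of silent upgrade earlier would be to scan the solver's transaction output for the `cuda-version` line and fail if the version being installed differs from the requested pin. A rough sketch, assuming the transaction has been captured as text with lines in the `+ name version build channel size` shape shown above; `check_cuda_version` is a hypothetical helper, not part of the RAPIDS CI scripts:

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail if a solver transaction installs a cuda-version
# other than the expected "major.minor".
# Assumption: install lines look like
#   "+ cuda-version 12.3 h32bc705_2 conda-forge 21kB"
check_cuda_version() {
  local expected="$1"
  local transaction="$2"
  local installed
  # Pick the "+" (install) line for cuda-version and print its version field.
  installed=$(printf '%s\n' "$transaction" \
    | awk '$1 == "+" && $2 == "cuda-version" { print $3 }')
  # No cuda-version change in the transaction means nothing to complain about.
  if [ -n "$installed" ] && [ "$installed" != "$expected" ]; then
    echo "cuda-version changed: expected ${expected}, solver chose ${installed}" >&2
    return 1
  fi
  return 0
}

# Example using the two lines from the CI log above:
log='- cuda-version 11.8 h70ddcb2_2 conda-forge Cached
+ cuda-version 12.3 h32bc705_2 conda-forge 21kB'
check_cuda_version "11.8" "$log" || echo "guard tripped"
```

This only detects the symptom after solving; it does not explain why the solver relaxed the constraint, which is what the discussion above is trying to pin down.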
It looks like we still have this issue, but now with CUDA 12.0. Here is a relevant snippet from CI (again when cupy is installed alongside the PR build of cucim):
```
- cuda-version 12.0 hffde075_2 conda-forge Cached
+ cuda-version 12.3 h32bc705_2 conda-forge 21kB
```
The CUDA 12 problems should be resolved by the fixes discussed here: rapidsai/build-planning#8 (comment)
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
bdice left a comment
CI logs look fine. I will file a follow-up PR so that the libcufile dependencies are only included on x86_64 (this was a pre-existing problem, so I don't want to make it in scope for this PR).
/merge
The promised follow-up PR is here: #699
Follow-up from #672.

This fixes an issue where `libcufile-dev` could be included in aarch64 environments (this path was never exercised in CI, so it wasn't a huge problem). I also removed some duplication in `dependencies.yaml`. The CUDA compilers (for 11 and 12) are now included in the `build` dependency list, and all CUDA libraries are included in the `cuda` dependency list. As before, the CUDA version is constrained by the `cuda_version` dependency list. This is more closely aligned with how cudf's dependency list is structured.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Jake Awe (https://github.com/AyodeAwe)
- https://github.com/jakirkham

URL: #699
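The actual fix lives in the `dependencies.yaml` matrices. As a rough shell illustration of the same rule (request `libcufile-dev` only on x86_64), a hypothetical helper could build the package list from the architecture. The function name `packages_for_arch`, the use of `uname -m`, and the base packages passed in the example are all assumptions for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical illustration: append libcufile-dev only on x86_64.
# Usage: packages_for_arch ARCH [BASE_PACKAGE...]
# Assumption: ARCH is a `uname -m` style string such as x86_64 or aarch64.
packages_for_arch() {
  local arch="$1"
  shift
  local packages=("$@")
  # libcufile is only wanted on x86_64, so skip it everywhere else (e.g. aarch64)
  if [ "$arch" = "x86_64" ]; then
    packages+=("libcufile-dev")
  fi
  echo "${packages[*]}"
}

# Example: compute the list for the current machine.
packages_for_arch "$(uname -m)" libcucim cucim
```

Keying the decision on the architecture in one place mirrors the idea of the follow-up: the platform condition is expressed once instead of being duplicated across dependency lists.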
Follow-up to #672.

For all GitHub Actions configs, replaces uses of the `test-cuda-12.2` branch on `shared-workflows` with `branch-24.04`, now that rapidsai/shared-workflows#166 has been merged.

### Notes for Reviewers

This is part of ongoing work to build and test packages against CUDA 12.2 across all of RAPIDS.

For more details see:

* rapidsai/build-planning#7

*(created with `rapids-reviser`)*

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Ray Douglass (https://github.com/raydouglass)

URL: #702
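These PRs were generated with `rapids-reviser`. For a single repository, the same substitution can be sketched with `sed`, assuming workflow files live under `.github/workflows/` with a `.yaml` extension and reference the branch as an `@<branch>` suffix after `rapidsai/shared-workflows/`. `retarget_shared_workflows` is a hypothetical helper and not how `rapids-reviser` works internally:

```shell
#!/usr/bin/env bash
# Hypothetical helper: point shared-workflows references at a release branch.
# Usage: retarget_shared_workflows REPO_DIR OLD_BRANCH NEW_BRANCH
# Assumption: refs look like "rapidsai/shared-workflows/<path>@<branch>".
retarget_shared_workflows() {
  local repo_dir="$1" old_branch="$2" new_branch="$3"
  local f
  for f in "$repo_dir"/.github/workflows/*.yaml; do
    [ -e "$f" ] || continue  # glob matched nothing; skip the literal pattern
    # Rewrite only shared-workflows refs; use a temp file so this works with
    # both GNU and BSD sed (no -i portability issues).
    sed "s#\(rapidsai/shared-workflows/[^@]*\)@${old_branch}#\1@${new_branch}#g" \
      "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}
```

Running `retarget_shared_workflows . test-cuda-12.2 branch-24.04` from a repository root would perform the change described above. Restricting the match to `rapidsai/shared-workflows/` avoids touching unrelated actions that happen to share a branch name.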
Description
`cuda-version={major}.{minor}` stuff in `dependencies.yaml` that was missed in "refactor CUDA versions in dependencies.yaml" (#671)

Notes for Reviewers
This is part of ongoing work to build and test packages against CUDA 12.2.2 across all of RAPIDS.
For more details see:
Planning a second round of PRs to revert these references back to a proper `branch-24.{nn}` release branch of `shared-workflows` once rapidsai/shared-workflows#166 is merged.

*(created with `rapids-reviser`)*