build wheels with CUDA 13.0.x, test wheels against mix of CTK versions#2971
rapids-bot[bot] merged 19 commits into rapidsai:release/26.04
Conversation
Contributes to rapidsai/build-planning#256

`rapids-generate-pip-constraints` currently special-cases `RAPIDS_DEPENDENCIES="latest"` and skips generating constraints in that case. This will be helpful in rapidsai/build-planning#256, where we want to start constraining `cuda-toolkit` in wheels CI based on the CTK version in the CI image being used.

## Notes for Reviewers

### How I tested this

Looked for projects using this ([GitHub search](https://github.com/search?q=org%3Arapidsai+language%3AShell+%22rapids-generate-pip-constraints%22+AND+NOT+is%3Aarchived+&type=code)) and tested in them. It's just a few:

* [ ] cudf (rapidsai/cudf#21639)
* [ ] cuml (rapidsai/cuml#7853)
* [ ] dask-cuda (rapidsai/dask-cuda#1632)
* [ ] nvforest (rapidsai/nvforest#62)
* [ ] raft (rapidsai/raft#2971)
* [ ] rmm (rapidsai/rmm#2270)

In all of those, wheels CI jobs worked exactly as expected, without needing any code changes or `dependencies.yaml` updates... so this PR is safe to merge any time.

### Is this safe?

It should be (see "How I tested this").

This is only used to add **constraints** (not requirements), so it shouldn't change our ability to catch problems like "forgot to declare a dependency" in CI.

It WILL increase the risk of `[test]` extras being underspecified. For example, if `cuml[test]` has `scikit-learn>=1.3` and the constraints have `scikit-learn>=1.5`, we might never end up testing `scikit-learn>=1.3,<1.5` (unless it's explicitly accounted for in a `dependencies: "oldest"` block).

The other risk is that this creates friction, because constraints passed to `--constraint` cannot contain extras. So e.g. if you want to depend on `xgboost[dask]`, that cannot be in any of the lists generated by `rapids-generate-pip-constraints`. I think we can work around that when we hit those cases, though.

Overall, I think these are acceptable tradeoffs.
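Since the special case is what this change revolves around, here is a minimal sketch of how a constraint generator might special-case `RAPIDS_DEPENDENCIES="latest"`. The function name, file paths, and the `cuda-toolkit` pin are illustrative assumptions, not the real script's contents:

```shell
# Hedged sketch of the RAPIDS_DEPENDENCIES="latest" special case
# (all names here are illustrative, not the real gha-tools script).
generate_constraints() {
    local out="$1"
    if [[ "${RAPIDS_DEPENDENCIES:-}" == "latest" ]]; then
        # the special case: emit no constraints at all
        : > "${out}"
    else
        # otherwise pin packages, e.g. cuda-toolkit to the CI image's CTK
        echo "cuda-toolkit==${RAPIDS_CUDA_VERSION:-13.0}.*" > "${out}"
    fi
}

# with the special case, "latest" produces an empty constraints file:
( export RAPIDS_DEPENDENCIES=latest; generate_constraints /tmp/constraints.txt )
[ ! -s /tmp/constraints.txt ] && echo "no constraints generated"
```

Note that pip rejects extras like `xgboost[dask]` inside files passed via `--constraint`, which is the friction described above: a constraints file may only narrow versions of plain package names.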
Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #247
/ok to test
`conda/recipes/libraft/recipe.yaml` (outdated diff)
```yaml
- if: cuda_major == "13"
  then:
    # always want libnvJitLink >= whatever was built against
    # ref: https://docs.nvidia.com/cuda/nvjitlink/index.html#compatibility
    - ${{ pin_compatible("libnvjitlink", lower_bound="x.x.x", upper_bound="x") }}
```
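For context on the `pin_compatible` line above: in conda recipes it renders a version spec from the version actually resolved in the host environment. This Python sketch is my own approximation of that rendering, not the recipe tooling itself, showing what `lower_bound="x.x.x", upper_bound="x"` would produce for the `libnvjitlink 13.2.51` seen in the build log below:

```python
def pin_compatible(version: str, lower_bound: str = "x.x.x",
                   upper_bound: str = "x") -> str:
    """Approximate the rendered pin: keep as many version components as
    there are 'x' placeholders, and bump the last kept component of the
    upper bound to make it an exclusive cap."""
    parts = version.split(".")
    lower = parts[: lower_bound.count("x")]
    upper = parts[: upper_bound.count("x")]
    upper[-1] = str(int(upper[-1]) + 1)
    return f">={'.'.join(lower)},<{'.'.join(upper)}"

# for the host environment's libnvjitlink 13.2.51:
print(pin_compatible("13.2.51"))  # >=13.2.51,<14
```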
We added nvJitLink to RAFT to work around a bug in CUDA wheel packaging (#2948).
There is no direct usage of nvjitlink in RAFT.
There is no conda dependency needed here at all, conda-forge packaging is already correct. Everything touching conda should be reverted.
Ah ok, I misunderstood that.
libnvjitlink is in the libraft host environment
```
...
│ │ │ libnvjitlink ┆ 13.2.51 ┆ hecca717_0 ┆ conda-forge ┆ 30.22 MiB │
...
```
(recent conda-cpp-build CUDA 13.1 build link)
But I get that there's no direct usage here, and it's not showing up in the runtime dependencies.
I'll revert this.
`dependencies.yaml` (outdated diff)
```yaml
- *cuda_toolkit_any_cu13
- &nvjitlink_cu13 nvidia-nvjitlink>=13.1,<14
- matrix:
    cuda: "13.*"
```
Why do we need three matrices for 13.0, 13.1, and `13.*`?

I think it should be fine to have one `13.*` matrix that requires `nvidia-nvjitlink>=13.0,<14` if we're building with 13.0.
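As a sketch, the single-matrix form being suggested might look like this in `dependencies.yaml` (structure illustrative; the surrounding section and key names are assumptions, not copied from the PR):

```yaml
specific:
  - output_types: [requirements, pyproject]
    matrices:
      - matrix:
          cuda: "13.*"
        packages:
          - nvidia-nvjitlink>=13.0,<14
```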
These groups make the coupling between the CTK version we're building against and the `nvidia-nvjitlink` floor explicit:

- having a catch-all `13.*` because we happen to be building against 13.0 for now increases the risk that we'll accidentally build wheels in the future with a too-low `nvidia-nvjitlink` pin
- not having a `13.1` means that if we switch back to building against CTK 13.1, we'll need new PRs to all the repos to change these floors
Although I guess we already end up with a fallback matrix anyway to populate pyproject.toml (for documentation purposes), so we're already in the position of needing to remember to update this.

I'll take this suggestion and make it a single `13.*` (and do that for all the other PRs); maybe we can find a better, stricter mix in a follow-up.
> having a catch-all `13.*` because we happen to be building against 13.0 for now increases the risk that we'll accidentally build wheels in the future with a too-low `nvidia-nvjitlink` pin
Hopefully this is not an issue for very long if we can get `cuda-toolkit` pinnings loosened.

I think this outcome is clearer -- though it is a bit more "hardcoded", we don't know exactly what future we're pointing towards until we know whether `cuda-toolkit` pinnings will be loosened.
yep fair enough, I'll apply this in the other PRs in this series, thanks
/merge

Merged 048aa19 into rapidsai:release/26.04
…wheels against mix of CTK versions (#1862)

The changes from #1405 introduced linking against nvJitLink. nvJitLink has versioned symbols that are added in each new CTK release, and some of those are exposed in `libcuvs.so`. `libcuvs` wheels are built against the latest CTK supported in RAPIDS (CUDA 13.1.1 as of this writing), so when those wheels are used in environments with older nvJitLink, runtime errors like this can happen:

> libcugraph.so: undefined symbol: __nvJitLinkGetErrorLog_13_1, version libnvJitLink.so.13

For more details, see rapidsai/cugraph#5443. This tries to fix that.

Contributes to rapidsai/build-planning#257

* builds CUDA 13 wheels with the 13.0 CTK
* ensures CUDA 13 wheels ship with a runtime dependency of `nvidia-nvjitlink>={whatever-minor-version-they-were-built-against}`

Contributes to rapidsai/build-planning#256

* updates wheel tests to cover a range of CTK versions (we previously, accidentally, were only testing the latest 12.x and 13.x)

Other changes

* ensures conda packages also take on floors of `libnvjitlink>={whatever-minor-version-they-were-built-against}`

## Notes for Reviewers

### How I tested this

This uses wheels from similar PRs from RAPIDS dependencies, at build and test time:

* rapidsai/raft#2971
* rapidsai/rmm#2270
* rapidsai/ucxx#604

### Other Options

1. avoiding those versioned symbols with a build-time shim (#1855 does this, but hasn't been successful yet)
2. statically linking libnvJitLink (hasn't been successful yet)

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)

URL: #1862
…wheels against mix of CTK versions (#5457)

Fixes #5443

Contributes to rapidsai/build-planning#257

* builds CUDA 13 wheels with the 13.0 CTK
* ensures wheels ship with a runtime dependency of `nvidia-nvjitlink>={whatever-minor-version-they-were-built-against}`

Contributes to rapidsai/build-planning#256

* updates wheel tests to cover a range of CTK versions (we previously, accidentally, were only testing the latest 12.x and 13.x)

Other changes

* ensures conda packages also take on floors of `libnvjitlink>={whatever-version-they-were-built-against}`

## Notes for Reviewers

### How I tested this

This uses wheels from similar PRs from RAPIDS dependencies, at build and test time:

* rapidsai/cudf#21671
* rapidsai/kvikio#942
* rapidsai/raft#2971
* rapidsai/rmm#2270
* rapidsai/ucxx#604

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)

URL: #5457