
Dropping bits in the old CI & Propagating recent changes from cuda-python#683

Merged
leofang merged 10 commits into NVIDIA:main from leofang:move_to_new_ci
Jan 7, 2026

Conversation

@leofang
Member

@leofang leofang commented Dec 19, 2025

Part of #671.

This PR removes the old wheel build/test workflows in favor of the new ones. In particular, this lets us drop the GH-hosted Windows GPU runners (cuda-python-windows-gpu-github) entirely and reduce the CI resources this repo consumes (we currently use more than we should because the old and new CIs run in parallel).

All other test jobs in the old CI, such as the conda and third-party library tests, are then hooked into the new CI as-is. (In the old CI, ci.yaml runs on main and pr.yaml runs per PR, but their contents are almost identical; ci-new.yaml runs both on main and per PR.) Further consolidation would require some nontrivial changes (e.g., commit 0d5e440 showed that we still need to use RAPIDS containers for these jobs, so I reverted it).

Some recent cuda-python CI changes are also propagated here, including the transition from JSON to YAML files. However, the two YAML files test-matrix.yaml and matrix.yaml cannot be merged yet because some workflows still use the latter (e.g., #680).

Below is an LLM-generated summary:


This pull request significantly refactors and consolidates the project's CI/CD workflows. The main focus is on migrating jobs from legacy workflow files (ci.yaml, pr.yaml) into the unified ci-new.yaml workflow, modernizing configuration files, and improving maintainability by removing redundancy and unused scripts.

CI/CD Workflow Consolidation and Refactoring:

  • Migrated all major jobs (e.g., pre-commit, matrix computation, conda/wheel builds and tests, third-party integration tests, documentation, and coverage reporting) from the old .github/workflows/ci.yaml and .github/workflows/pr.yaml files into .github/workflows/ci-new.yaml, and then deleted the legacy workflow files. This centralizes and simplifies CI management.

  • Updated the test matrix logic in .github/workflows/test-wheel-linux.yml to use YAML files and yq for matrix generation instead of JSON and jq, aligning with the new workflow structure.

Configuration and Quality-of-Life Improvements:

  • Enhanced .github/dependabot.yml with improved copyright, scheduling, and update grouping, and expanded update types to include major version bumps.

  • Improved CUDA version extraction in .github/workflows/ci-new.yaml to use yq with YAML files instead of jq with JSON, reflecting the project's move to YAML-based configuration.

Cleanup and Maintenance:

  • Removed the unused Windows GPU driver installation script, as it is no longer needed in the new workflow structure.

Bug Fixes and Minor Adjustments:

  • Fixed a bug in the fetch_ctk action to only move the lib directory if it exists, preventing errors on certain platforms.
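
The fetch_ctk fix in the last bullet amounts to guarding a directory move with an existence check. A minimal shell sketch of that pattern follows. The helper name `move_lib_to_lib64` and the idea of passing a CTK root directory as an argument are assumptions made for illustration and are not taken from the actual action; the lib to lib64 direction follows the review summary later in this thread.

```shell
#!/bin/sh
# Hypothetical helper: the function name and the directory layout are
# assumptions for illustration, not the real fetch_ctk action code.
move_lib_to_lib64() {
    ctk_root="$1"
    # Move lib only when it exists. An unconditional mv would fail on
    # platforms whose toolkit layout has no lib directory.
    if [ -d "${ctk_root}/lib" ]; then
        mv "${ctk_root}/lib" "${ctk_root}/lib64"
    fi
}
```

Calling `move_lib_to_lib64 /path/to/ctk` is a no-op that still returns success when `/path/to/ctk/lib` is absent, so a step using it would not abort on such platforms. The sketch does not handle the case where `lib64` already exists, in which `mv` would place `lib` inside it.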

@leofang leofang self-assigned this Dec 19, 2025
@leofang leofang linked an issue Dec 19, 2025 that may be closed by this pull request
@leofang leofang added the 2 - In Progress Currently a work in progress label Dec 19, 2025
@leofang
Member Author

leofang commented Dec 20, 2025

/ok to test 78aad77

@leofang leofang removed the 2 - In Progress Currently a work in progress label Dec 20, 2025
@leofang leofang added the 3 - Ready for Review Ready for review by team label Dec 20, 2025
@leofang leofang marked this pull request as ready for review December 20, 2025 02:48
@leofang
Member Author

leofang commented Dec 20, 2025

This is ready for review, but it cannot be merged without changing the ruleset to use the new Check job status first (I am not an admin of this repo, so I cannot change it):

[Screenshot, 2025-12-19 9:49:30 PM]

@greptile-apps
Contributor

greptile-apps bot commented Dec 20, 2025

Greptile Summary

This PR successfully consolidates the CI infrastructure by migrating all jobs from legacy workflows (ci.yaml, pr.yaml) into the unified ci-new.yaml workflow, enabling the retirement of old Windows GPU runners and reducing overall CI resource usage.

Key changes:

  • Migrated all conda/wheel build jobs, third-party integration tests, documentation, and coverage reporting from old CI to new unified workflow
  • Transitioned configuration files from JSON to YAML format (test-matrix.json → test-matrix.yml, versions.json → versions.yml)
  • Updated matrix generation to use yq instead of jq throughout the workflow files
  • Enhanced Dependabot configuration with schedule timing and major version updates
  • Fixed bug in fetch_ctk action: added directory existence check before moving lib to lib64
  • Improved test script with system info output and CUDA-specific test group installation
  • Cleaned up deprecated files: old workflow files, unused Windows GPU driver scripts, and obsolete test scripts

The consolidation maintains all existing functionality while simplifying CI management and reducing redundancy between parallel CI systems.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • Score reflects thorough CI infrastructure consolidation with clean migration of all jobs from old workflows to new unified workflow, proper JSON-to-YAML configuration migration, necessary cleanup of deprecated files, and includes a bug fix for the CTK fetcher
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/workflows/ci-new.yaml | Consolidated CI workflow now includes all jobs from old ci.yaml and pr.yaml, successfully unifying build/test jobs from new CI with conda/third-party tests from old CI |
| .github/workflows/ci.yaml | Deleted - jobs migrated to ci-new.yaml |
| .github/workflows/pr.yaml | Deleted - jobs migrated to ci-new.yaml |
| .github/actions/fetch_ctk/action.yml | Fixed bug: added check for lib directory existence before moving to lib64, preventing errors on platforms without the directory |
| .github/workflows/test-wheel-linux.yml | Migrated from JSON to YAML for test matrix using yq instead of jq, aligning with new configuration approach |
| .github/workflows/test-wheel-windows.yml | Migrated from JSON to YAML for test matrix using yq instead of jq, aligning with new configuration approach |
| ci/test-matrix.yml | Converted from JSON to YAML format, maintaining same test matrix structure and configurations |
| ci/versions.yml | Converted from JSON to YAML format for CUDA version configuration |
| .github/dependabot.yml | Enhanced with schedule timing, timezone configuration, and expanded update types to include major version bumps |
| ci/tools/run-tests | Improved test script: added system info output, fixed CUDA-specific test group installation, and removed -rxXs pytest flags |

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (1)

  1. .github/workflows/ci-new.yaml, line 346 (link)

    style: leftover commented code from debugging or development

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

23 files reviewed, 1 comment

@leofang leofang linked an issue Dec 24, 2025 that may be closed by this pull request
@leofang leofang requested review from cpcloud and gmarkall January 5, 2026 13:53
@greptile-apps
Contributor

greptile-apps bot commented Jan 6, 2026

Greptile's behavior is changing!

From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

@leofang leofang requested a review from kkraus14 January 6, 2026 14:36
@leofang
Member Author

leofang commented Jan 7, 2026

/ok to test d5b8b63

@leofang
Member Author

leofang commented Jan 7, 2026

@kkraus14 @rparolin @gmarkall this needs to be done before we can merge this PR: #683 (comment). I don't have admin access to this repo, so I can't do it myself 😛

@leofang leofang merged commit 961b22a into NVIDIA:main Jan 7, 2026
100 checks passed
@leofang leofang deleted the move_to_new_ci branch January 7, 2026 20:04
@leofang
Member Author

leofang commented Jan 7, 2026

> this needs to be done before we can merge this PR: #683 (comment).

Keith granted me access to the devop group, and I was able to adjust the ruleset to merge this PR.

@leofang leofang added this to the v0.24.0 milestone Jan 7, 2026
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Jan 12, 2026
- Add arch specific target support (NVIDIA#549)
- chore: disable `locked` flag to bypass prefix-dev/pixi#5256 (NVIDIA#714)
- ci: relock pixi (NVIDIA#712)
- ci: remove redundant conda build in ci (NVIDIA#711)
- chore(deps): bump numba-cuda version and relock pixi (NVIDIA#707)
- Dropping bits in the old CI & Propagating recent changes from cuda-python (NVIDIA#683)
- Fix `test_wheel_deps_wheels.sh` to actually uninstall `nvvm` and `nvrtc` packages for CUDA 13 (NVIDIA#701)
- perf: remove some exception control flow and buffer-exception penalization for arrays (NVIDIA#700)
- perf: let CAI fall through instead of calling from_cuda_array_interface (NVIDIA#694)
- chore: perf lint (NVIDIA#697)
- chore(deps): bump deps in pixi lockfile (NVIDIA#693)
- fix: use freethreading-supported `_PySet_NextItemRef` where possible (NVIDIA#682)
- Support python `3.14` (NVIDIA#599)
- Remove customized address space tracking and address class emission in debug info (NVIDIA#669)
- Drop `experimental` from cuda.core namespace imports (NVIDIA#676)
- Remove dangling references to NUMBA_CUDA_ENABLE_MINOR_VERSION_COMPATIBILITY (NVIDIA#675)
- Use `rapidsai/sccache` in CI (NVIDIA#674)
- chore(dev-deps): remove ipython and pyinstrument (NVIDIA#670)
- Set up a new VM-based CI infrastructure  (NVIDIA#604)
@gmarkall gmarkall mentioned this pull request Jan 12, 2026
gmarkall added a commit that referenced this pull request Jan 12, 2026
- Add arch specific target support (#549)
- chore: disable `locked` flag to bypass prefix-dev/pixi#5256 (#714)
- ci: relock pixi (#712)
- ci: remove redundant conda build in ci (#711)
- chore(deps): bump numba-cuda version and relock pixi (#707)
- Dropping bits in the old CI & Propagating recent changes from cuda-python (#683)
- Fix `test_wheel_deps_wheels.sh` to actually uninstall `nvvm` and `nvrtc` packages for CUDA 13 (#701)
- perf: remove some exception control flow and buffer-exception penalization for arrays (#700)
- perf: let CAI fall through instead of calling from_cuda_array_interface (#694)
- chore: perf lint (#697)
- chore(deps): bump deps in pixi lockfile (#693)
- fix: use freethreading-supported `_PySet_NextItemRef` where possible (#682)
- Support python `3.14` (#599)
- Remove customized address space tracking and address class emission in debug info (#669)
- Drop `experimental` from cuda.core namespace imports (#676)
- Remove dangling references to NUMBA_CUDA_ENABLE_MINOR_VERSION_COMPATIBILITY (#675)
- Use `rapidsai/sccache` in CI (#674)
- chore(dev-deps): remove ipython and pyinstrument (#670)
- Set up a new VM-based CI infrastructure (#604)
Labels

3 - Ready for Review Ready for review by team

Development

Successfully merging this pull request may close these issues.

CI: Use self-hosted Windows GPU runners
CI: Avoid using any container in the build-wheel workflow

4 participants