Forward-merge release/26.02 into main #7720
Merged
rapids-bot[bot] merged 7 commits into rapidsai:main on Jan 27, 2026
Conversation
Closes rapidsai#7176

This PR fixes outliers when `random_state` is given.

# High-level explanation

### Why we had issues in the previous implementation

All threads **read the embedding value** and then write the gradient update out to a **separate buffer**. This ensures determinism because each thread computes the gradient on the same value across different runs (the value of the embedding at that epoch) instead of on nondeterministic values (say, if another thread writes its update into the embedding, we can't be sure whether this thread will read the updated value or the value before the update). In pseudocode:

```
# Existing implementation given random_state
for epoch in epochs:
    # === start kernel launch nnz threads
    grad = compute_grad(embedding[i], embedding[j])
    atomicAdd(out_buff[i], grad)
    atomicAdd(out_buff[j], grad)
    # === end kernel
    embedding += out_buff
    out_buff = 0
```

Although this ensures deterministic behavior, it results in outliers because the gradient should be computed cumulatively, i.e. an update to the i-th vector in the embedding should be taken into account when another thread computes the gradient for the i-th vector. This is already achieved when we don't require determinism: by writing back to the embedding directly, there are more chances of computing the gradient on an updated value.

```
# Existing implementation when we don't care about determinism
for epoch in epochs:
    # === start kernel launch nnz threads
    grad = compute_grad(embedding[i], embedding[j])
    atomicAdd(embedding[i], grad)
    atomicAdd(embedding[j], grad)
    # === end kernel
```

### Fixes in this PR

To keep the optimization deterministic while still letting threads read reasonably up-to-date values, this PR splits a single epoch into more fine-grained chunks.
```
for epoch in epochs:
    for chunk in n_chunks:
        # === start kernel launch nnz threads
        grad = compute_grad(embedding[i], embedding[j])
        atomicAdd(out_buff[i], grad)
        atomicAdd(out_buff[j], grad)
        # === end kernel
        embedding += out_buff
        out_buff = 0
```

Now, after the kernel returns for a chunk, the next chunk of threads starts off with an embedding that includes the updates from the previous chunk. Intuitively, a larger `n_chunks` means more serial behavior, and therefore a closer approximation of the desired sequential implementation. To be more efficient, I added a bitwise flag to efficiently apply sparse updates per chunk.

# Benchmarks

ncomp=2 (as of commit rapidsai@1606616). Green slots indicate the cases where we don't see outliers (i.e. with large n_chunks).

### Amazon food data (5M x 384)

<img width="814" height="194" alt="Screenshot 2025-12-15 at 4 59 21 PM" src="https://github.com/user-attachments/assets/d6934e84-4085-47d6-9b10-da2882098d4a" />

### Amazon Sports data (13M x 384)

<img width="815" height="323" alt="Screenshot 2025-12-15 at 5 00 08 PM" src="https://github.com/user-attachments/assets/98b53584-7375-4d42-9c48-ef4337e6ab13" />

### Appliances (1.8M x 384) and Beauty (640K x 384)

These didn't have outliers in the first place.

<img width="816" height="307" alt="Screenshot 2025-12-15 at 5 01 27 PM" src="https://github.com/user-attachments/assets/40632d9f-7b68-4a72-8a9c-c0ab11eda358" />

# Chosen heuristics and Performance Implications

Increasing `n_chunks` doesn't increase the optimize runtime (this is due to the sparse updates). Thus, I have conservatively chosen `num_chunks = raft::ceildiv(nnz, static_cast<nnz_t>(100000))` based on looking at when the results start to be free from outliers. Our original implementation with random_state (numbers in red in the table above) **takes up about 0.2% of the end-to-end** runtime. Thus, a 2x slowdown in the optimize step doesn't really affect the e2e perf.
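The chunked scheme above can be sketched in plain Python/NumPy. This is illustrative only: the toy attractive-force gradient stands in for cuML's actual UMAP gradients, and the real implementation runs each chunk as a CUDA kernel with atomic adds, but the key property is the same: gradients within a chunk are computed against a frozen snapshot (deterministic), and updates are flushed to the embedding between chunks.

```python
import numpy as np

def optimize_chunked(embedding, pairs, lr, n_chunks, n_epochs):
    """Illustrative chunked optimizer: within a chunk, all gradients read a
    frozen snapshot of the embedding; updates accumulate in a separate
    buffer and are applied between chunks."""
    for _ in range(n_epochs):
        for chunk in np.array_split(np.asarray(pairs), n_chunks):
            out_buff = np.zeros_like(embedding)
            for i, j in chunk:
                # Toy attractive gradient pulling i and j together
                # (stand-in for the real UMAP gradient).
                grad = lr * (embedding[j] - embedding[i])
                out_buff[i] += grad
                out_buff[j] -= grad
            embedding += out_buff  # flush accumulated updates between chunks
    return embedding

rng = np.random.default_rng(42)
emb0 = rng.normal(size=(8, 2))
pairs = [(i, (i + 1) % 8) for i in range(8)]

# Same inputs -> bitwise-identical results, regardless of how many times we run.
a = optimize_chunked(emb0.copy(), pairs, lr=0.1, n_chunks=4, n_epochs=5)
b = optimize_chunked(emb0.copy(), pairs, lr=0.1, n_chunks=4, n_epochs=5)
assert np.array_equal(a, b)

# Fewer chunks = gradients computed on a staler snapshot; more chunks
# approaches the fully sequential update order.
c = optimize_chunked(emb0.copy(), pairs, lr=0.1, n_chunks=1, n_epochs=5)
```

With `n_chunks` equal to the number of pairs, every gradient would see all prior updates, recovering the sequential behavior; `n_chunks = 1` recovers the old fully-buffered (outlier-prone) behavior.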
Authors:
  - Jinsol Park (https://github.com/jinsolp)

Approvers:
  - Victor Lafargue (https://github.com/viclafargue)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#7597
fix(build): build package on merge to `release/*` branch
…ests and do not fail test runs on integration tests (rapidsai#7715)

## Summary

Explicitly install `requests` in the BERTopic integration test and add `continue-on-error: true` to wheel integration tests, to prevent external dependency failures from blocking nightly CI.

## Motivation

Wheel integration tests verify compatibility with external packages (e.g., BERTopic) but should not block CI when those packages have regressions outside our control.

**Current failure:** the BERTopic test fails due to a missing `requests` dependency in `sentence-transformers==5.2.1` (released today, 2026-01-26):

```python
ModuleNotFoundError: No module named 'requests'
  File "sentence_transformers/util/file_io.py", line 7
```

This is an upstream bug in sentence-transformers, not a cuML issue. See also: huggingface/sentence-transformers#3617

Authors:
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - James Lamb (https://github.com/jameslamb)

URL: rapidsai#7715
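For illustration, a GitHub Actions step with this behavior might look like the following sketch. The job name, step names, and test path here are hypothetical; only `continue-on-error: true` and the explicit `requests` install reflect the change described above.

```yaml
jobs:
  wheel-integration-tests:        # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - name: Install test dependencies
        # Explicitly install `requests` to work around the missing
        # dependency in sentence-transformers 5.2.1.
        run: pip install requests bertopic
      - name: Run BERTopic integration tests
        # Don't fail the whole run on external-package regressions.
        continue-on-error: true
        run: pytest tests/integration/test_bertopic.py   # hypothetical path
```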
Closes rapidsai#4249

The [GPU-accelerated version of Delaunay](https://docs.cupy.dev/en/latest/reference/generated/cupyx.scipy.spatial.Delaunay.html) will be released in CuPy soon.

Authors:
  - Victor Lafargue (https://github.com/viclafargue)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Simon Adorf (https://github.com/csadorf)

URL: rapidsai#7674
dantegd
approved these changes
Jan 26, 2026
dantegd
approved these changes
Jan 26, 2026
Contributor (Author):

/merge nosquash
Contributor:

Commit history integrity check failed: not all commits from original PR #7718 appear to be present individually in this PR's history. This usually happens if commits were squashed during the manual resolution process. Please ensure all original commits are preserved individually. You can fix this and try the
Force-pushed 0aa51e0 to df6a931 (Compare)
jameslamb
approved these changes
Jan 26, 2026
Contributor (Author):

/merge nosquash
…a-cuda Fallback to numba-cuda with no extra CUDA packages if 'cuda_suffixed' isn't true
Contributor:

Commit history integrity check failed: not all commits from original PR #7718 appear to be present individually in this PR's history. This usually happens if commits were squashed during the manual resolution process. Please ensure all original commits are preserved individually. You can fix this and try the
Force-pushed df6a931 to 0aa9676 (Compare)
Force-pushed 0aa9676 to c6b7cff (Compare)
Closes rapidsai#7648

Authors:
  - Victor Lafargue (https://github.com/viclafargue)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#7616
jameslamb
approved these changes
Jan 27, 2026
Force-pushed c6b7cff to ee46427 (Compare)
Contributor (Author):

/merge nosquash