Skip to content

UMAP fix int32 overflow causing illegal mem access#7587

Merged
rapids-bot[bot] merged 5 commits intorapidsai:mainfrom
aamijar:umap-fix-int32-illegal-mem-access
Dec 11, 2025
Merged

UMAP fix int32 overflow causing illegal mem access#7587
rapids-bot[bot] merged 5 commits intorapidsai:mainfrom
aamijar:umap-fix-int32-illegal-mem-access

Conversation

@aamijar
Copy link
Copy Markdown
Member

@aamijar aamijar commented Dec 10, 2025

Resolves #7517

The optimize_batch_kernel and optimize_batch_kernel_reg use the following condition to check for out of bounds

while (row < nnz)

Inside the loop row += skip_size is used for iteration. However, when row is close to INT32_MAX it can cause an overflow, which leads to the value of row wrapping around to become a negative number. The loop will then be run again since row < nnz still, which will lead to a cuda illegal memory access since row is negative.

To solve this we can check for the overflow case and break from the while loop
if (row > nnz - skip_size) break;

Update: We can use size_t for row and skip_size instead.

See #7517 for a concrete reproducer.

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Dec 10, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@aamijar aamijar added bug Something isn't working non-breaking Non-breaking change labels Dec 10, 2025
@aamijar aamijar changed the title UMAP fix int32 illegal mem access UMAP fix int32 overflow causing illegal mem access Dec 10, 2025
@aamijar aamijar marked this pull request as ready for review December 10, 2025 04:29
@aamijar aamijar requested a review from a team as a code owner December 10, 2025 04:29
@aamijar aamijar requested review from csadorf and tarang-jain and removed request for tarang-jain December 10, 2025 04:29
@jinsolp
Copy link
Copy Markdown
Contributor

jinsolp commented Dec 10, 2025

can we instead use size_t as the type for row and skip_size (rather than using nnz_t which can in int or size_t)?
Or change our nnz_t dispatch algorithm?

@jinsolp jinsolp self-requested a review December 10, 2025 20:36
@aamijar
Copy link
Copy Markdown
Member Author

aamijar commented Dec 10, 2025

can we instead use size_t as the type for row and skip_size (rather than using nnz_t which can in int or size_t)? Or change our nnz_t dispatch algorithm?

Yes, I've updated it to use size_t 👍

Comment thread cpp/src/umap/simpl_set_embed/optimize_batch_kernel.cuh Outdated
Copy link
Copy Markdown
Contributor

@jinsolp jinsolp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM

@aamijar
Copy link
Copy Markdown
Member Author

aamijar commented Dec 11, 2025

/merge

@rapids-bot rapids-bot Bot merged commit bd047bb into rapidsai:main Dec 11, 2025
104 checks passed
mani-builds pushed a commit to mani-builds/cuml that referenced this pull request Jan 11, 2026
Resolves rapidsai#7517

The `optimize_batch_kernel` and `optimize_batch_kernel_reg` use the following condition to check for out of bounds

`while (row < nnz)`

Inside the loop `row += skip_size` is used for iteration. However, when row is close to `INT32_MAX` it can cause an overflow, which leads to the value of row wrapping around to become a negative number. The loop will then be run again since row < nnz still, which will lead to a cuda illegal memory access since row is negative.

To solve this we can check for the overflow case and break from the while loop
`if (row > nnz - skip_size) break;`

Update: We can use `size_t` for row and skip_size instead.

See rapidsai#7517 for a concrete reproducer.

Authors:
  - Anupam (https://github.com/aamijar)

Approvers:
  - Jinsol Park (https://github.com/jinsolp)
  - Victor Lafargue (https://github.com/viclafargue)

URL: rapidsai#7587
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

algo: umap bug Something isn't working CUDA/C++ non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UMAP cuda_illegal_memory_access when nnz close to INT32_MAX

4 participants