Skip to content

IVF-PQ coarse search: fix integer overflow and avoid excessive batch sizes#999

Merged
rapids-bot[bot] merged 12 commits intorapidsai:branch-25.08from
achirkin:fix-coarse-search-size-overflow
Jul 16, 2025
Merged

IVF-PQ coarse search: fix integer overflow and avoid excessive batch sizes#999
rapids-bot[bot] merged 12 commits intorapidsai:branch-25.08from
achirkin:fix-coarse-search-size-overflow

Conversation

@achirkin
Copy link
Copy Markdown
Contributor

@achirkin achirkin commented Jun 10, 2025

When the IVF-PQ batch size and the number of cluster are big enough, the GEMM output matrix size may exceed the uint32_t limits. This manifests as an illegal memory access popping up in further stages of IVF-PQ (due to everything being as async as possible).

This PR fixes the bug by converting the input arguments to 64-bits before multiplying them.

On top of the overflow fix, this PR also limits the batch size when it may hurt performance: (1) within IVF-PQ to avoid out-of-memory error (2) in CAGRA graph build to avoid switching to the large memory resource (which is typically backed by managed memory).

@achirkin achirkin self-assigned this Jun 10, 2025
@achirkin achirkin requested a review from a team as a code owner June 10, 2025 18:21
@achirkin achirkin added the bug Something isn't working label Jun 10, 2025
@achirkin achirkin added the non-breaking Introduces a non-breaking change label Jun 10, 2025
@achirkin achirkin moved this to In Progress in Unstructured Data Processing Jun 10, 2025
@github-actions github-actions Bot added the cpp label Jun 10, 2025
@achirkin achirkin changed the title IVF-PQ coarse search: fix integer overflow IVF-PQ coarse search: fix integer overflow and avoid excessive batch sizes Jun 16, 2025
Copy link
Copy Markdown
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Artem for the fix, I have left a few suggestions below.

Comment thread cpp/src/neighbors/detail/cagra/cagra_build.cuh Outdated
Comment thread cpp/src/neighbors/detail/cagra/cagra_build.cuh Outdated
Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_search.cuh Outdated
Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_search.cuh Outdated
@achirkin achirkin requested a review from tfeher July 16, 2025 12:44
Copy link
Copy Markdown
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Artem for the update LGTM!

@tfeher
Copy link
Copy Markdown
Contributor

tfeher commented Jul 16, 2025

/merge

@rapids-bot rapids-bot Bot merged commit c04ad9b into rapidsai:branch-25.08 Jul 16, 2025
53 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Unstructured Data Processing Jul 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working cpp non-breaking Introduces a non-breaking change

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants