Skip to content

ScaNN: Overlapped gather for AVQ#1286

Merged
rapids-bot[bot] merged 2 commits intorapidsai:branch-25.10from
rmaschal:avq-optimizations
Sep 10, 2025
Merged

ScaNN: Overlapped gather for AVQ#1286
rapids-bot[bot] merged 2 commits intorapidsai:branch-25.10from
rmaschal:avq-optimizations

Conversation

@rmaschal
Copy link
Copy Markdown
Contributor

Adds a class cluster_loader for AVQ that enables overlapping the gather operation and HtoD copy with GPU computation.

There are two scenarios:

  1. dataset on device: This is identical to the previous code, using raft::matrix::gather to perform the gather on device.
  2. dataset on host: cluster_loader allocates to pinned buffers in host for fast (and possibly async) copies of cluster vectors DtoH. The actual gather operation is performed on cpu, into the pinned buffer. Copies can be overlapped with GPU work (namely AVQ update of the previous cluster) if scheduled on a separate stream.

@rmaschal rmaschal requested a review from a team as a code owner August 26, 2025 18:27
@rmaschal rmaschal changed the title Overlapped gather for AVQ ScaNN: Overlapped gather for AVQ Aug 28, 2025
@rmaschal rmaschal force-pushed the avq-optimizations branch 3 times, most recently from 6d3aa3c to e223eb7 Compare September 4, 2025 17:15
@rmaschal rmaschal added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Sep 4, 2025
@rmaschal rmaschal self-assigned this Sep 4, 2025
Copy link
Copy Markdown
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rmaschal for the PR, looks good overall, I just have one suggestion for a cosmetic change.

In general, had to think about why do we need a specific version of gather here. Although we have h2d gather in RAFT (admittedly not properly exposed), here an explicit prefetch method enables the user to schedule additional GPU operations in parallel. That looks good to me.

Comment thread cpp/src/neighbors/scann/detail/scann_avq.cuh Outdated
Comment thread cpp/src/neighbors/scann/detail/scann_avq.cuh Outdated
Comment thread cpp/src/neighbors/scann/detail/scann_avq.cuh Outdated
Comment thread cpp/src/neighbors/scann/detail/scann_build.cuh
Comment thread cpp/src/neighbors/scann/detail/scann_avq.cuh
Comment thread cpp/src/neighbors/scann/detail/scann_avq.cuh
Copy link
Copy Markdown
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, LGTM!

@bkarsin
Copy link
Copy Markdown
Contributor

bkarsin commented Sep 9, 2025

PR looks good to me, thanks for the clarification on my questions.

@bkarsin bkarsin closed this Sep 9, 2025
@bkarsin bkarsin reopened this Sep 9, 2025
Copy link
Copy Markdown
Contributor

@bkarsin bkarsin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR looks good to me, thanks (and very sorry for the mistaken close/open)

@tfeher
Copy link
Copy Markdown
Contributor

tfeher commented Sep 10, 2025

/merge

@rapids-bot rapids-bot Bot merged commit 03d62f6 into rapidsai:branch-25.10 Sep 10, 2025
323 of 327 checks passed
@stic
Copy link
Copy Markdown
Contributor

stic commented Sep 11, 2025

Hi @bkarsin,
The assumption is that there is one cluster or clusters will be of same / similar size? Asking as if I've read it correctly max for buffers will be per biggest cluster?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants