
Improve the performance of CAGRA new vector addition with the default params#569

Merged
rapids-bot[bot] merged 16 commits into rapidsai:branch-25.02 from enp1s0:update-cagra-graph-extend on Jan 23, 2025

Conversation


@enp1s0 enp1s0 commented Jan 14, 2025

This PR updates the default chunk size of the CAGRA graph extension and adds a knob to control the batch size of the CAGRA searches run internally, for better throughput.

The default chunk size was set to 1 in the current implementation because of a potential low-recall problem with large chunk sizes: no edges are made between nodes in the same chunk. However, my investigation shows that this low-recall problem rarely occurs, even with large chunk sizes.

Search performance

The performance was measured after applying the bugfix in #565.

degree = 32

(benchmark figure: extend-ir0.9-degree32)

(I don't know why the performance is unstable on NYTimes.)

degree = 64

(benchmark figure: extend-ir0.9-degree64)

So, in this PR, I increase the default chunk size to the number of new dataset vectors for better throughput. I also expose a knob in the `extend` function to control the search batch size, which balances throughput against memory consumption.

@enp1s0 enp1s0 requested a review from a team as a code owner January 14, 2025 08:48
@enp1s0 enp1s0 self-assigned this Jan 14, 2025
@github-actions github-actions Bot added the cpp label Jan 14, 2025
@enp1s0 enp1s0 added the breaking (Introduces a breaking change) and improvement (Improves an existing functionality) labels and removed the cpp label Jan 14, 2025
@github-actions github-actions Bot added the cpp label Jan 16, 2025

cjnolet commented Jan 16, 2025

/ok to test


@tfeher tfeher left a comment


Thanks @enp1s0 for the PR! It looks good, but we should discuss whether we need a new parameter, or whether we could use the workspace memory resource to control the batch size.

The raft resource handle has a workspace memory resource. This is set up as an rmm::mr::limiting_resource_adaptor. The intended usage of the workspace is to query its size and adjust our internal batching parameters accordingly.

Here is an example of how we can query the available workspace size using get_workspace_free_bytes.

Afterwards, we need to use the workspace memory resource to allocate the temporary buffers:

auto mr = raft::resource::get_workspace_resource(handle);
// Maximum number of query vectors to search at the same time.
const auto max_queries = std::min<uint32_t>(std::max<uint32_t>(n_queries, 1), kMaxQueries);
auto max_batch_size = get_max_batch_size(handle, k, n_probes, max_queries, max_samples);
rmm::device_uvector<float> float_queries(max_queries * dim_ext, stream, mr);

Users who need fine control on temporary allocations can set the workspace allocator explicitly:

raft::resource::set_workspace_to_pool_resource(dev_resources, 2 * 1024 * 1024 * 1024ull);

Note that by default the allocation limit is 1/4 of the total GPU memory, which is potentially much larger than the default for the max_working_device_memory_size_in_megabyte parameter introduced in this PR.


enp1s0 commented Jan 20, 2025

Thank you @tfeher for the review. I changed the code to use get_workspace_free_bytes and removed max_working_device_memory_size_in_megabyte. The original code already used raft::resource::get_workspace_resource, so I only changed the max batch size calculation.


@tfeher tfeher left a comment


Thanks Hiroyuki for the update! I have one more question below.

Comment thread: cpp/src/neighbors/detail/cagra/add_nodes.cuh

@tfeher tfeher left a comment


Thanks @enp1s0 for the updates, the PR looks good to me!


cjnolet commented Jan 23, 2025

/merge

@rapids-bot rapids-bot Bot merged commit 80370a1 into rapidsai:branch-25.02 Jan 23, 2025
@enp1s0 enp1s0 deleted the update-cagra-graph-extend branch January 24, 2025 01:33

Labels

breaking (Introduces a breaking change), cpp, improvement (Improves an existing functionality)

Projects

Status: Done

3 participants