Improved CAGRA build parameter heuristics#1448

Merged
rapids-bot[bot] merged 20 commits into rapidsai:main from achirkin:fea-cagra-hnsw-heuristics on Nov 3, 2025

Contributor

@achirkin achirkin commented Oct 22, 2025

Changes to the build parameter heuristics:

  • Move the code from the HNSW namespace to the CAGRA namespace to avoid depending on the HNSW target
  • Add one more variant of the heuristics: allow generating a smaller graph to better match the performance of the HNSW-generated graph
  • Implement an automatic switch between NN-Descent and IVF-PQ as the graph-build algorithm depending on the dataset size: NN-Descent tends to perform better on smaller-scale datasets

The PR also includes C and Java bindings.
Resolves #1265
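
The size-based switch described in the last bullet can be pictured with a minimal sketch. Everything named here is an assumption made for illustration: the enum, the function `choose_build_algo`, and especially the cutoff value are hypothetical and are not taken from the cuVS sources; the actual heuristic may also consider other inputs such as the dataset dimensionality.

```cpp
#include <cstddef>

// Hypothetical enum; the real cuVS types for selecting a graph-build
// algorithm are not reproduced here.
enum class graph_build_algo { nn_descent, ivf_pq };

// Hypothetical cutoff, chosen only for illustration. The PR does not state
// the threshold it uses.
constexpr std::size_t kHypotheticalRowThreshold = 1000000;

// Pick NN-Descent for smaller datasets and IVF-PQ otherwise, mirroring the
// behaviour described in the PR summary.
constexpr graph_build_algo choose_build_algo(
  std::size_t n_rows, std::size_t threshold = kHypotheticalRowThreshold)
{
  return n_rows < threshold ? graph_build_algo::nn_descent
                            : graph_build_algo::ivf_pq;
}
```

Keeping the threshold as a parameter makes it easy to experiment with different cutoffs when benchmarking both algorithms on a given dataset.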

@achirkin achirkin self-assigned this Oct 22, 2025
@achirkin achirkin requested a review from a team as a code owner October 22, 2025 14:01
@achirkin achirkin added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Oct 22, 2025
@achirkin achirkin moved this from Todo to In Progress in Unstructured Data Processing Oct 22, 2025
Comment thread c/include/cuvs/neighbors/cagra.h Outdated
@achirkin achirkin requested a review from a team as a code owner October 23, 2025 14:20
@achirkin achirkin force-pushed the fea-cagra-hnsw-heuristics branch from ef02185 to 582db6f Compare October 23, 2025 14:24
@rapidsai rapidsai deleted a comment from copy-pr-bot Bot Oct 23, 2025
@achirkin achirkin requested a review from a team as a code owner October 23, 2025 15:19
Member

@KyleFromNVIDIA KyleFromNVIDIA left a comment

Approved trivial CMake changes

Comment thread cpp/src/neighbors/cagra.cpp Outdated
    cuvs::distance::DistanceType metric)
{
  cagra::index_params params;
  params.graph_degree = 2 + M * 2 / 3;
Contributor

The hard-M variant sets graph_degree = 2 * M, so it is surprising to see that the soft variant can reach similar search performance with graph_degree < M. The benchmarks for 768 and 1536 dimensions looked good. Was it also tested on lower-dimensional datasets?

Contributor Author

Yes, I've tested it on DEEP-100M and glove datasets. The hard-M variant actually shows much higher recall and lower throughput for the same search 'ef' parameter (the QPS-recall curve is close to HNSW, but all points on it are 'shifted' towards higher recall and lower throughput).
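
To make the difference between the two variants discussed in this thread concrete, here is a small sketch that evaluates both formulas. The helper names and the integer type are assumptions for illustration; only the expressions `2 + M * 2 / 3` (from the reviewed snippet) and `2 * M` (from the review comment) come from the PR, and the merged code may apply further adjustments.

```cpp
#include <cstdint>

// Hypothetical helper names; only the two formulas come from the review thread.

// "Soft" variant from the reviewed snippet: graph_degree = 2 + M * 2 / 3.
// Integer arithmetic is assumed, so M * 2 / 3 truncates toward zero.
constexpr std::int64_t soft_graph_degree(std::int64_t M) { return 2 + M * 2 / 3; }

// "Hard-M" variant as described by the reviewer: graph_degree = 2 * M.
constexpr std::int64_t hard_graph_degree(std::int64_t M) { return 2 * M; }

// Compile-time checks for a few common HNSW M values.
static_assert(soft_graph_degree(16) == 12, "2 + 32 / 3 = 2 + 10");
static_assert(hard_graph_degree(16) == 32, "2 * 16");
static_assert(soft_graph_degree(32) == 23, "2 + 64 / 3 = 2 + 21");
static_assert(soft_graph_degree(48) == 34, "2 + 96 / 3 = 2 + 32");
```

For M = 16 the soft variant yields a graph degree of 12 against 32 for the hard-M variant, which illustrates the reviewer's observation that the soft variant can end up with graph_degree < M.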

Contributor

@tfeher tfeher left a comment

Thanks Artem for the PR, it looks good to me!

Comment thread c/src/neighbors/cagra.cpp Outdated
Contributor

@mythrocks mythrocks left a comment

Looks good to my eyes. Minor nit regarding javadoc for the new function.

Comment thread java/cuvs-java/src/main/java/com/nvidia/cuvs/spi/CuVSProvider.java
Comment thread java/cuvs-java/src/main/java/com/nvidia/cuvs/CagraIndexParams.java
Comment thread java/cuvs-java/src/main/java22/com/nvidia/cuvs/spi/JDKProvider.java Outdated
Comment thread c/include/cuvs/neighbors/cagra.h Outdated
@achirkin achirkin force-pushed the fea-cagra-hnsw-heuristics branch from e601477 to eeb41ac Compare November 3, 2025 10:56
@achirkin achirkin force-pushed the fea-cagra-hnsw-heuristics branch from eeb41ac to 6904106 Compare November 3, 2025 10:56
Contributor Author

achirkin commented Nov 3, 2025

/merge

@rapids-bot rapids-bot Bot merged commit d8fdd7d into rapidsai:main Nov 3, 2025
162 of 164 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Unstructured Data Processing Nov 3, 2025

Development

Successfully merging this pull request may close these issues.

Implement cuvs::neighbors::hnsw::to_cagra_param function in cuvs/java

8 participants