CosineExpanded Distance Metric for CAGRA #197
rapids-bot[bot] merged 113 commits into rapidsai:branch-25.12
Hi @tarang-jain, a heads up here: #296 does a major refactoring of related code; let's have a look together how we can proceed with this PR once you're back to it, ok?
I have similar performance concerns as the ones we discussed on IVF-PQ; maybe it makes sense to keep the dataset normalized for cosine distance (and reuse the inner-product code path)?
Then we can either normalize the query at the time we copy it to the shared memory (pre-processing) or divide by the query norm at the post-processing/filtering step at the end of the kernel.
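The two options in this comment can be sketched as follows. This is a minimal pure-Python illustration, not cuVS code: it assumes the stored dataset rows are already unit-normalized so the inner-product path can be reused, and it assumes cosine distance is defined as one minus cosine similarity. All function names are hypothetical.

```python
import math

def dot(a, b):
    # Plain inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale v to unit L2 norm (assumes v is not the zero vector).
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def cosine_dist_preprocess(query, unit_row):
    # Option 1: normalize the query once (e.g. when it is copied to
    # shared memory), then reuse the inner-product path unchanged.
    q = normalize(query)
    return 1.0 - dot(q, unit_row)

def cosine_dist_postprocess(query, unit_row):
    # Option 2: run the inner-product path on the raw query and divide
    # by the query norm only at the very end.
    ip = dot(query, unit_row)
    return 1.0 - ip / math.sqrt(dot(query, query))

dataset = [[3.0, 4.0], [1.0, 0.0], [-2.0, 2.0]]
unit_dataset = [normalize(row) for row in dataset]  # stored normalized
query = [2.0, 1.0]

pre = [cosine_dist_preprocess(query, row) for row in unit_dataset]
post = [cosine_dist_postprocess(query, row) for row in unit_dataset]
```

Both variants give the same distances. Because the query norm is a positive constant for a given query, dividing by it does not change the ranking of candidates, so the post-processing variant only has to touch the final results.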
Refactored this PR to divide by the query norm at the very end (post-processing stage).
Has this work been abandoned?
@pmiloslavsky this PR is currently being reworked to do the computation in place, instead of normalizing the vectors up front. The normalization trick works well when users have control over the input vectors, but in the case of CAGRA, the dataset needs to be stored with the graph, and we try not to alter the user’s input on their behalf when we can’t give it right back to them.
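A rough sketch of what "in place" means here, in pure Python rather than cuVS code: the dataset is left exactly as the user supplied it, and the norms enter only inside the distance formula. It assumes cosine distance is one minus cosine similarity; the helper names and the idea of caching the row norms in a separate list are illustrative assumptions.

```python
import math

def dot(a, b):
    # Plain inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def row_norms(dataset):
    # L2 norm of every row, kept in a separate list so the dataset
    # itself is never modified.
    return [math.sqrt(dot(row, row)) for row in dataset]

def cosine_distances(query, dataset, norms):
    # 1 - <q, d> / (||q|| * ||d||) for each row d.
    # Zero-norm vectors are not handled in this sketch.
    qn = math.sqrt(dot(query, query))
    return [1.0 - dot(query, row) / (qn * n)
            for row, n in zip(dataset, norms)]

dataset = [[3.0, 4.0], [1.0, 0.0], [-2.0, 2.0]]
snapshot = [row[:] for row in dataset]  # copy, to check nothing changes

norms = row_norms(dataset)
dists = cosine_distances([2.0, 1.0], dataset, norms)
```

After the call, `dataset` still equals `snapshot`, which is the property the comment is after: the user's vectors are stored with the graph unchanged.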
… into cagra-dist-metric
tfeher left a comment
Hi Tarang, thank you for the updates! As discussed offline, the test size increase is still substantial. I have a few suggestions for how to cut that down.
/ok to test a6c6592
tfeher left a comment
Thanks @tarang-jain for improving the test time. LGTM.
/rerun tests
/rerun failed tests
/merge
88f7f23 into rapidsai:branch-25.12
Currently only IVF-PQ can be used as the graph building algorithm (NN Descent does not support Cosine). As a result, we are limited by IVF-PQ's restriction that data be of float / half type for the Cosine metric. This PR also fixes an in-place data modification that was being done by IVF-PQ.

Opportunities for optimization:
NN Descent to support Cosine and compute dataset norms only once -- during NN Descent. Re-use those for CAGRA.

[UPDATE 08/21/2025]: NN Descent now supports Cosine. This PR allows the initial CAGRA graph to be built by both methods -- IVF_PQ, NN_DESCENT. The IVF_PQ restriction on data types holds, but uint8 and int8 can be supported with NN Descent as the graph building algorithm. ITERATIVE CAGRA SEARCH is currently disabled for Cosine.

[UPDATE 09/23/2025]: This PR also adds Cosine support for IVF_PQ with uint8 / int8 inputs. The above-mentioned restriction with IVF_PQ has been removed. So with this PR CAGRA supports Cosine wholly, for float, uint8 and int8 inputs. ITERATIVE_SEARCH however still has some issues as the graph building method with the Cosine metric and has been disabled.

[UPDATE 09/25/2025]:
Binary size comparison for libcuvs.so (CUDA 12.9, x86):
branch-25.10: 1154.42 MB
This PR: 1160.73 MB

Total CAGRA testing time:
branch-25.10:
```
      Start 10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST
19/37 Test #10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST ...   Passed  825.43 sec
      Start 11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST
20/37 Test #11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST ........   Passed    0.58 sec
      Start 12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST
21/37 Test #12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST ....   Passed  663.97 sec
      Start 13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST
22/37 Test #13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST ....   Passed  397.57 sec
      Start 14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST
23/37 Test #14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST ...   Passed  408.16 sec
```
This PR:
```
      Start 10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST
19/37 Test #10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST ...   Passed 1830.34 sec
      Start 11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST
20/37 Test #11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST ........   Passed    0.45 sec
      Start 12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST
21/37 Test #12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST ....   Passed 1444.14 sec
      Start 13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST
22/37 Test #13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST ....   Passed  973.64 sec
      Start 14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST
23/37 Test #14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST ...   Passed 1010.46 sec
```

[UPDATE 09/30/2025]:
Updates to CAGRA C++ tests according to the latest PR reviews.
New total CAGRA testing time:
branch-25.10:
```
      Start  9: NEIGHBORS_ANN_CAGRA_TEST_BUGS
18/37 Test  #9: NEIGHBORS_ANN_CAGRA_TEST_BUGS ...........   Passed   16.99 sec
      Start 10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST
19/37 Test #10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST ...   Passed  803.64 sec
      Start 11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST
20/37 Test #11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST ........   Passed    0.49 sec
      Start 12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST
21/37 Test #12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST ....   Passed  667.89 sec
      Start 13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST
22/37 Test #13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST ....   Passed  420.49 sec
      Start 14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST
23/37 Test #14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST ...   Passed  429.57 sec
```
This PR:
```
      Start  9: NEIGHBORS_ANN_CAGRA_TEST_BUGS
18/37 Test  #9: NEIGHBORS_ANN_CAGRA_TEST_BUGS ...........   Passed   26.62 sec
      Start 10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST
19/37 Test #10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST ...   Passed  973.23 sec
      Start 11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST
20/37 Test #11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST ........   Passed    0.43 sec
      Start 12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST
21/37 Test #12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST ....   Passed  702.02 sec
      Start 13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST
22/37 Test #13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST ....   Passed  491.65 sec
      Start 14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST
23/37 Test #14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST ...   Passed  541.43 sec
```

Fixes rapidsai#1288
Fixes rapidsai#389

Authors:
- Tarang Jain (https://github.com/tarang-jain)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#197
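To put the numbers above in proportion, the per-suite slowdown and the binary size delta can be computed directly from the reported figures. The snippet below is only arithmetic on the values quoted in this description; the dictionary keys and the `slowdown` helper are illustrative names, not part of cuVS or CTest.

```python
# Wall-clock seconds copied from the ctest logs above (data-type suites only).
baseline_0925 = {"FLOAT": 825.43, "HALF": 663.97, "INT8": 397.57, "UINT8": 408.16}
this_pr_0925 = {"FLOAT": 1830.34, "HALF": 1444.14, "INT8": 973.64, "UINT8": 1010.46}

baseline_0930 = {"FLOAT": 803.64, "HALF": 667.89, "INT8": 420.49, "UINT8": 429.57}
this_pr_0930 = {"FLOAT": 973.23, "HALF": 702.02, "INT8": 491.65, "UINT8": 541.43}

def slowdown(before, after):
    # Ratio of PR time to baseline time for each suite.
    return {name: after[name] / before[name] for name in before}

ratios_0925 = slowdown(baseline_0925, this_pr_0925)
ratios_0930 = slowdown(baseline_0930, this_pr_0930)

# Binary size of libcuvs.so in MB, as reported for CUDA 12.9, x86.
size_before, size_after = 1154.42, 1160.73
size_delta_mb = size_after - size_before
size_delta_pct = 100.0 * size_delta_mb / size_before

for name in ratios_0925:
    print(f"{name}: {ratios_0925[name]:.2f}x -> {ratios_0930[name]:.2f}x")
print(f"libcuvs.so: +{size_delta_mb:.2f} MB ({size_delta_pct:.2f}%)")
```

On these figures the 09/25 run is roughly 2.2x to 2.5x slower than the baseline for each data-type suite, while the 09/30 run is about 1.05x to 1.26x slower; the library grows by 6.31 MB, a little over half a percent.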