CosineExpanded Distance Metric for CAGRA #197

Merged
rapids-bot[bot] merged 113 commits into rapidsai:branch-25.12 from tarang-jain:cagra-dist-metric
Oct 3, 2025
Conversation

@tarang-jain
Contributor

@tarang-jain tarang-jain commented Jun 25, 2024

Currently, only IVF-PQ can be used as the graph building algorithm for the Cosine metric, because NN Descent does not support Cosine. As a result, we are limited by IVF-PQ's restriction that the data be of float / half type for the Cosine metric. This PR also fixes an in-place data modification that was being done by IVF-PQ.
Opportunities for optimization:
NN Descent could support Cosine and compute the dataset norms only once, during NN Descent, so that they can be re-used for CAGRA.

[UPDATE 08/21/2025]: NN Descent now supports Cosine. This PR allows the initial CAGRA graph to be built by both methods: IVF_PQ and NN_DESCENT. The IVF_PQ restriction on data types still holds, but uint8 and int8 can be supported with NN Descent as the graph building algorithm. ITERATIVE CAGRA SEARCH is currently disabled for Cosine.

[UPDATE 09/23/2025]: This PR also adds Cosine support for IVF_PQ with uint8 / int8 inputs, so the above-mentioned restriction with IVF_PQ has been removed. With this PR, CAGRA fully supports Cosine for float, uint8 and int8 inputs. However, ITERATIVE_SEARCH still has some issues as the graph building method with the Cosine metric and has been disabled for that metric.

[UPDATE 09/25/2025]:
Binary size comparison for libcuvs.so (CUDA 12.9, x86):
branch-25.10: 1154.42 MB
This PR: 1160.73 MB

Total CAGRA testing time:
branch-25.10:

Start 10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST
19/37 Test #10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST ...   Passed  825.43 sec
      Start 11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST
20/37 Test #11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST ........   Passed    0.58 sec
      Start 12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST
21/37 Test #12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST ....   Passed  663.97 sec
      Start 13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST
22/37 Test #13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST ....   Passed  397.57 sec
      Start 14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST
23/37 Test #14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST ...   Passed  408.16 sec

This PR:

Start 10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST
19/37 Test #10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST ...   Passed  1830.34 sec
      Start 11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST
20/37 Test #11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST ........   Passed    0.45 sec
      Start 12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST
21/37 Test #12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST ....   Passed  1444.14 sec
      Start 13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST
22/37 Test #13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST ....   Passed  973.64 sec
      Start 14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST
23/37 Test #14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST ...   Passed  1010.46 sec

[UPDATE 09/30/2025]:
Updates to CAGRA C++ tests according to the latest PR reviews.
New total CAGRA testing time:
branch-25.10:

      Start  9: NEIGHBORS_ANN_CAGRA_TEST_BUGS
18/37 Test  #9: NEIGHBORS_ANN_CAGRA_TEST_BUGS ...........   Passed   16.99 sec
      Start 10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST
19/37 Test #10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST ...   Passed  803.64 sec
      Start 11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST
20/37 Test #11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST ........   Passed    0.49 sec
      Start 12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST
21/37 Test #12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST ....   Passed  667.89 sec
      Start 13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST
22/37 Test #13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST ....   Passed  420.49 sec
      Start 14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST
23/37 Test #14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST ...   Passed  429.57 sec

This PR:

      Start  9: NEIGHBORS_ANN_CAGRA_TEST_BUGS
18/37 Test  #9: NEIGHBORS_ANN_CAGRA_TEST_BUGS ...........   Passed   26.62 sec
      Start 10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST
19/37 Test #10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST ...   Passed  973.23 sec
      Start 11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST
20/37 Test #11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST ........   Passed    0.43 sec
      Start 12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST
21/37 Test #12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST ....   Passed  702.02 sec
      Start 13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST
22/37 Test #13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST ....   Passed  491.65 sec
      Start 14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST
23/37 Test #14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST ...   Passed  541.43 sec
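
For reference, the relative overhead implied by the 09/30/2025 numbers can be worked out with a few lines of Python. This is only a back-of-the-envelope sketch: the timings and binary sizes are copied from the logs above, while the dictionary keys and the `relative_increase` helper are illustrative names, not part of cuVS. Summed over the six listed tests it comes out to roughly a 17% increase in CAGRA test time, and the binary size grows by about 0.5%.

```python
# Timings in seconds, copied from the 09/30/2025 ctest logs above.
# The short keys are illustrative labels, not actual test names.
baseline = {
    "TEST_BUGS": 16.99,
    "FLOAT_UINT32": 803.64,
    "HELPERS": 0.49,
    "HALF_UINT32": 667.89,
    "INT8_UINT32": 420.49,
    "UINT8_UINT32": 429.57,
}
this_pr = {
    "TEST_BUGS": 26.62,
    "FLOAT_UINT32": 973.23,
    "HELPERS": 0.43,
    "HALF_UINT32": 702.02,
    "INT8_UINT32": 491.65,
    "UINT8_UINT32": 541.43,
}


def relative_increase(before, after):
    """Fractional change from `before` to `after` (0.10 means +10%)."""
    return (after - before) / before


total_before = sum(baseline.values())
total_after = sum(this_pr.values())

# Per-test and total overhead of this PR relative to branch-25.10.
for name in baseline:
    change = relative_increase(baseline[name], this_pr[name])
    print(f"{name:14s} {change:+.1%}")
print(f"{'TOTAL':14s} {relative_increase(total_before, total_after):+.1%}")

# Binary size of libcuvs.so in MB (CUDA 12.9, x86), from the 09/25/2025 update.
size_increase = relative_increase(1154.42, 1160.73)
print(f"libcuvs.so     {size_increase:+.2%}")
```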

Fixes #1288
Fixes #389

@github-actions github-actions Bot added the cpp label Jun 25, 2024
@cjnolet cjnolet added the improvement and non-breaking labels Jun 26, 2024
Contributor

Hi @tarang-jain, a heads up here: #296 does a major refactoring of related code; let's look together at how we can proceed with this PR once you're back to it, ok?
I have performance concerns similar to the ones we discussed on IVF-PQ; maybe it makes sense to keep the dataset normalized for cosine distance (and reuse the inner-product code path)?
Then we can either normalize the query at the time we copy it to shared memory (pre-processing) or divide by the query norm at the post-processing/filtering step at the end of the kernel.
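
The second option can be sketched in plain Python as follows. This is only an illustration of the idea, not the CAGRA kernel: `l2_norm`, `normalize` and `cosine_topk_postprocess` are hypothetical helper names, the search is a brute-force scan rather than a graph traversal, and all vectors are assumed to be non-zero. Because the query norm is a positive constant for a given query, ranking by the inner product against unit-norm dataset rows gives the same order as ranking by cosine similarity, so the division can wait until the final top-k is known.

```python
import math


def l2_norm(v):
    """Euclidean norm of a vector given as a list of numbers."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    """Return a unit-norm copy of v (assumes v is non-zero)."""
    n = l2_norm(v)
    return [x / n for x in v]


def cosine_topk_postprocess(normalized_dataset, query, k):
    """Brute-force top-k cosine distances using the inner-product path.

    The dataset rows are assumed to be unit-norm already; the raw query is
    used during scoring and its norm is divided out only for the k results.
    """
    scores = [sum(a * b for a, b in zip(row, query)) for row in normalized_dataset]
    # Larger inner product means smaller cosine distance.
    top = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    q_norm = l2_norm(query)
    # Post-processing: divide by the query norm once per returned neighbor.
    return [(i, 1.0 - scores[i] / q_norm) for i in top]


dataset = [normalize(row) for row in [[1, 0], [0, 2], [3, 3]]]
result = cosine_topk_postprocess(dataset, [2, 2], k=2)
print(result)
```

The trade-off is that the stored dataset is a normalized copy, so the original vectors are no longer what the index holds.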

Contributor Author

Refactored this PR to divide by the query norm at the very end (post-processing stage).

@tarang-jain tarang-jain changed the base branch from branch-24.08 to branch-24.10 October 2, 2024 19:42
@tarang-jain tarang-jain changed the base branch from branch-24.10 to branch-24.12 October 17, 2024 16:38
@tarang-jain tarang-jain changed the base branch from branch-24.12 to branch-25.10 August 11, 2025 19:57
@pmiloslavsky

Has this work been abandoned?

@cjnolet
Member

cjnolet commented Aug 13, 2025

@pmiloslavsky this PR is currently being reworked to do the computation in place, instead of normalizing the vectors up front. The normalization trick works well when users have control over the input vectors, but in the case of CAGRA, the dataset needs to be stored with the graph, and we try not to alter the user’s input on their behalf when we can’t give it right back to them.
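
As a rough illustration of the in-place approach (a sketch under assumptions, not the code in this PR): the dataset is left exactly as the user supplied it, the row norms are computed once and kept in a separate array, and the cosine distance is formed from the raw dot product and the two norms. The names `row_norms` and `cosine_distance` are hypothetical, and vectors are assumed to be non-zero.

```python
import math


def row_norms(dataset):
    """L2 norm of every row, computed once and stored beside the dataset."""
    return [math.sqrt(sum(x * x for x in row)) for row in dataset]


def cosine_distance(x, y, x_norm, y_norm):
    """Expanded cosine distance: 1 - <x, y> / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    return 1.0 - dot / (x_norm * y_norm)


# Integer-valued rows stand in for int8 / uint8 inputs.
dataset = [[1, 0], [0, 2], [3, 3]]
original = [row[:] for row in dataset]  # copy kept only to verify nothing changed

norms = row_norms(dataset)
query = [2, 2]
q_norm = math.sqrt(sum(x * x for x in query))

dists = [cosine_distance(row, query, n, q_norm) for row, n in zip(dataset, norms)]
print(dists)

# The user's input is untouched; only the side array of norms was added.
assert dataset == original
```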

@copy-pr-bot

copy-pr-bot Bot commented Aug 13, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Contributor

@tfeher tfeher left a comment

Hi Tarang, thank you for the updates! As discussed offline, the test size increase is still substantial. I have a few suggestions on how to cut that down.

Comment thread cpp/tests/neighbors/ann_cagra.cuh Outdated
Comment thread cpp/tests/neighbors/ann_cagra.cuh Outdated
Comment thread cpp/tests/neighbors/ann_cagra.cuh
Comment thread cpp/tests/neighbors/ann_cagra.cuh Outdated
Comment thread cpp/tests/neighbors/ann_cagra.cuh Outdated
@tarang-jain
Contributor Author

/ok to test a6c6592

Contributor

@tfeher tfeher left a comment

Thanks @tarang-jain for improving the test time. LGTM.

@tarang-jain tarang-jain removed the request for review from a team October 1, 2025 19:27
@tarang-jain
Contributor Author

/rerun tests

@tarang-jain
Contributor Author

/rerun failed tests

@tarang-jain tarang-jain changed the base branch from branch-25.10 to branch-25.12 October 2, 2025 16:59
@tarang-jain tarang-jain removed the request for review from KyleFromNVIDIA October 2, 2025 17:54
@tarang-jain
Contributor Author

/merge

@rapids-bot rapids-bot Bot merged commit 88f7f23 into rapidsai:branch-25.12 Oct 3, 2025
162 of 164 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Unstructured Data Processing Oct 3, 2025
julianmi pushed a commit to julianmi/cuvs that referenced this pull request Oct 6, 2025

Authors:
  - Tarang Jain (https://github.com/tarang-jain)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#197
rmaschal pushed a commit to rmaschal/cuvs that referenced this pull request Oct 6, 2025
enp1s0 added a commit to enp1s0/cuvs that referenced this pull request Oct 22, 2025
Development

Successfully merging this pull request may close these issues.

[FEA] CAGRA to support cosine distance
[FEA] Support int8 and uint8 vector types for IVF-PQ with Cosine metric

6 participants