Skip to content

CAGRA binary Hamming distance support#610

Merged
rapids-bot[bot] merged 52 commits intorapidsai:branch-25.02from
enp1s0:cagra-hamming-distance
Feb 4, 2025
Merged

CAGRA binary Hamming distance support#610
rapids-bot[bot] merged 52 commits intorapidsai:branch-25.02from
enp1s0:cagra-hamming-distance

Conversation

@enp1s0
Copy link
Copy Markdown
Member

@enp1s0 enp1s0 commented Jan 24, 2025

This PR adds the binary Hamming distance support to CAGRA.

dependency: #612

TODO:

  • Add Hamming distance dist_op for CAGRA search
    • Add DistanceType::BinaryHamming (because HammingUnexpanded is not bitwise operation)
  • Support graph build
    • Want to use the iterative graph build method that uses the CAGRA search for graph build (anaruse@5a80659) because otherwise the binary Hamming distance support for IVFPQ or NN Descent is additionally required.
  • Add CI test
    • GT creation
    • Add test cases
  • Test on some benchmark datasets
  • Add preprocessing::quantize::binary

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Jan 24, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@enp1s0 enp1s0 self-assigned this Jan 24, 2025
@enp1s0 enp1s0 added feature request New feature or request non-breaking Introduces a non-breaking change labels Jan 24, 2025
@cjnolet
Copy link
Copy Markdown
Member

cjnolet commented Jan 25, 2025

/ok to test

@cjnolet
Copy link
Copy Markdown
Member

cjnolet commented Feb 1, 2025

/ok to test

@enp1s0
Copy link
Copy Markdown
Member Author

enp1s0 commented Feb 1, 2025

@cjnolet In this PR, I include preprocess::binary::transform, which performs binary quantization. The current algorithm sets a bit to 1 if a corresponding element of the input vector is positive, but is it better to be able to select another algorithm for a future extension?
The current function arguments are:

void transform(raft::resources const& res,
               raft::device_matrix_view<const double, int64_t> dataset,
               raft::device_matrix_view<uint8_t, int64_t> out);

and, for example, is it better to define a param structure and change the function arguments as follows?

enum class binary_quantization_algo_t {set_bit_if_positive};
struct cuvs::cuvs::preprocessing::quantize::binary::params{
    binary_quantization_algo_t algo = binary_quantize_algo_t::set_if_positive;
};
void transform(raft::resources const& res,
               cuvs::cuvs::preprocessing::quantize::binary::params& params,
               raft::device_matrix_view<const double, int64_t> dataset,
               raft::device_matrix_view<uint8_t, int64_t> out);

cjnolet and others added 3 commits February 1, 2025 18:43
This PR is about how CAGRA's search() and optimize() can be used to iteratively create and improve graph index.

Currently, IVFPQ and NND are used to create the initial kNN graph, which is then optimized to create the CAGRA search graph. So, for example, if you want to support a new data type in CAGRA, you need to create an initial kNN graph with that data type, and IVFPQ or NND must also support that new data type. This is a bit of hassle.

This PR is one solution to that problem. With functionality of this PR, once the CAGRA search supports the new data type, it can be used to create a graph index with it.

Authors:
  - Akira Naruse (https://github.com/anaruse)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)
  - Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#612
@cjnolet
Copy link
Copy Markdown
Member

cjnolet commented Feb 2, 2025

/ok to test

Copy link
Copy Markdown
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @enp1s0 for the PR, it looks great, I just have a few smaller comments.

Comment thread cpp/include/cuvs/preprocessing/quantize/binary.hpp
Comment thread cpp/src/preprocessing/quantize/detail/binary.cuh
Comment thread cpp/tests/neighbors/ann_cagra.cuh Outdated
Comment thread cpp/include/cuvs/preprocessing/quantize/binary.hpp Outdated
@enp1s0
Copy link
Copy Markdown
Member Author

enp1s0 commented Feb 3, 2025

@tfeher Thank you for reviewing the code. I fixed it. Can you review it again?

Copy link
Copy Markdown
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @enp1s0 for the updates, the PR looks good to me!

@cjnolet
Copy link
Copy Markdown
Member

cjnolet commented Feb 3, 2025

/ok to test

@cjnolet
Copy link
Copy Markdown
Member

cjnolet commented Feb 4, 2025

/merge

@rapids-bot rapids-bot Bot merged commit 058eef2 into rapidsai:branch-25.02 Feb 4, 2025
@enp1s0 enp1s0 deleted the cagra-hamming-distance branch September 29, 2025 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CMake cpp feature request New feature or request non-breaking Introduces a non-breaking change

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants