
Normalize dataset vectors in the CAGRA InnerProduct tests #2287

Merged
rapids-bot[bot] merged 14 commits into NVIDIA:branch-24.06 from enp1s0:cagra-test-normalize-vectors on May 7, 2024
Conversation


@enp1s0 enp1s0 commented May 2, 2024

Contributor

This PR updates the CAGRA tests to normalize the dataset and query vectors when the metric is InnerProduct. Without normalization, dataset vectors with a large L2 norm tend to appear in the search results for every query. As a result, the search may traverse only part of the graph, leaving the test incomplete.
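
To illustrate the effect described above, here is a minimal host-side sketch. It is not the code from this PR: `normalize_rows` and `top1_inner_product` are hypothetical helpers, the dataset is assumed to be a row-major `float` matrix, and the actual test may well rely on existing raft utilities instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not from the PR): scale each row of a row-major
// n_rows x dim matrix to unit L2 norm, in place. All-zero rows are left as-is.
inline void normalize_rows(std::vector<float>& data, std::size_t n_rows, std::size_t dim)
{
  assert(data.size() == n_rows * dim);
  for (std::size_t i = 0; i < n_rows; ++i) {
    float* row   = data.data() + i * dim;
    float sq_sum = 0.0f;
    for (std::size_t j = 0; j < dim; ++j) { sq_sum += row[j] * row[j]; }
    if (sq_sum == 0.0f) { continue; }  // avoid dividing by zero
    const float inv_norm = 1.0f / std::sqrt(sq_sum);
    for (std::size_t j = 0; j < dim; ++j) { row[j] *= inv_norm; }
  }
}

// Hypothetical helper: brute-force index of the dataset row with the largest
// inner product against `query` (ties resolve to the lowest index).
inline std::size_t top1_inner_product(const std::vector<float>& dataset,
                                      std::size_t n_rows,
                                      std::size_t dim,
                                      const float* query)
{
  assert(dataset.size() == n_rows * dim);
  std::size_t best = 0;
  float best_score = std::numeric_limits<float>::lowest();
  for (std::size_t i = 0; i < n_rows; ++i) {
    float score = 0.0f;
    for (std::size_t j = 0; j < dim; ++j) { score += dataset[i * dim + j] * query[j]; }
    if (score > best_score) {
      best_score = score;
      best       = i;
    }
  }
  return best;
}
```

With the rows `(1, 0)`, `(0, 1)`, and `(10, 10)`, the large-norm third row is the top inner-product match for both `(1, 0.1)` and `(0.1, 1)`. After `normalize_rows`, the two queries instead return the first and second rows respectively, so different parts of the dataset are reached.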

@enp1s0 enp1s0 requested a review from a team as a code owner May 2, 2024 10:22
@enp1s0 enp1s0 self-assigned this May 2, 2024
@github-actions github-actions Bot added the cpp label May 2, 2024
@enp1s0 enp1s0 added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels and removed the cpp label May 2, 2024
@github-actions github-actions Bot added the cpp label May 2, 2024
@enp1s0 enp1s0 mentioned this pull request May 2, 2024
2 tasks
Comment thread cpp/test/neighbors/ann_cagra.cuh Outdated
Comment thread cpp/test/neighbors/ann_cagra.cuh Outdated

@tfeher tfeher left a comment

Contributor


Thanks @enp1s0 for the PR! I agree with Tarang that we should either use existing raft utilities or document in an issue why this is not possible.

Comment thread cpp/test/neighbors/ann_cagra.cuh Outdated

@tfeher tfeher left a comment

Contributor


Thanks @enp1s0 for the updates, LGTM!


tfeher commented May 7, 2024

Contributor

/merge

@rapids-bot rapids-bot Bot merged commit 97e38eb into NVIDIA:branch-24.06 May 7, 2024
abc99lr pushed a commit to abc99lr/raft that referenced this pull request May 10, 2024

Authors:
  - tsuki (https://github.com/enp1s0)

Approvers:
  - Tarang Jain (https://github.com/tarang-jain)
  - Tamas Bela Feher (https://github.com/tfeher)

URL: NVIDIA#2287
loulankxh pushed a commit to loulankxh/raft that referenced this pull request Oct 14, 2025

Labels

cpp, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)


3 participants