Normalize dataset vectors in the CAGRA InnerProduct tests #2287
Merged
rapids-bot[bot] merged 14 commits on May 7, 2024
tarang-jain
requested changes
May 4, 2024
… into cagra-test-normalize-vectors
tarang-jain
approved these changes
May 6, 2024
Contributor
/merge
abc99lr
pushed a commit
to abc99lr/raft
that referenced
this pull request
May 10, 2024
This PR updates the CAGRA test to normalize the dataset and query vectors when the metric is InnerProduct. Without normalization, dataset vectors with a large L2 norm tend to appear in the search results for every query. The search may then traverse only part of the graph, which leaves the test coverage incomplete.

Authors:
- tsuki (https://github.com/enp1s0)

Approvers:
- Tarang Jain (https://github.com/tarang-jain)
- Tamas Bela Feher (https://github.com/tfeher)

URL: NVIDIA#2287
loulankxh
pushed a commit
to loulankxh/raft
that referenced
this pull request
Oct 14, 2025
This PR updates the CAGRA test to normalize the dataset and query vectors when the metric is InnerProduct. Without normalization, dataset vectors with a large L2 norm tend to appear in the search results for every query. The search may then traverse only part of the graph, which leaves the test coverage incomplete.
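To illustrate why normalization matters for the inner-product metric, here is a minimal host-side sketch. It is not the code from this PR: the actual test works on device memory, and the names `normalize_rows` and `top1_inner_product` are hypothetical helpers introduced only for this example. It assumes a row-major `float` matrix and uses a brute-force top-1 search in place of CAGRA.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: scale each row of a row-major (n_rows x dim) matrix
// to unit L2 norm, in place. Zero rows are left unchanged to avoid a
// division by zero.
void normalize_rows(std::vector<float>& data, std::size_t n_rows, std::size_t dim)
{
  for (std::size_t i = 0; i < n_rows; ++i) {
    float sq_sum = 0.0f;
    for (std::size_t j = 0; j < dim; ++j) {
      const float v = data[i * dim + j];
      sq_sum += v * v;
    }
    if (sq_sum == 0.0f) { continue; }
    const float inv_norm = 1.0f / std::sqrt(sq_sum);
    for (std::size_t j = 0; j < dim; ++j) {
      data[i * dim + j] *= inv_norm;
    }
  }
}

// Hypothetical helper: brute-force index of the dataset row with the
// largest inner product against `query` (a stand-in for the ANN search).
std::size_t top1_inner_product(const std::vector<float>& dataset,
                               std::size_t n_rows,
                               std::size_t dim,
                               const float* query)
{
  std::size_t best_idx = 0;
  float best_ip        = 0.0f;
  for (std::size_t i = 0; i < n_rows; ++i) {
    float ip = 0.0f;
    for (std::size_t j = 0; j < dim; ++j) {
      ip += dataset[i * dim + j] * query[j];
    }
    if (i == 0 || ip > best_ip) {
      best_ip  = ip;
      best_idx = i;
    }
  }
  return best_idx;
}
```

For example, with the dataset rows (1, 0), (0, 1), (10, 10) and the queries (1, 0) and (0, 1), the raw inner product selects row 2 for both queries because of its large norm. After calling `normalize_rows` on both the dataset and the queries, the same search returns row 0 and row 1 respectively, so different queries reach different parts of the dataset.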