Skip to content

[BUG] NN Descent has low recall for large datasets #204

@jinsolp

Description

@jinsolp

Description

NN Descent shows low recall (compared to using brute force knn) for large datasets. This makes it difficult to scale up and out and use NN Descent for building knn graphs for other algorithms such as UMAP.

Result for dataset with 1000 features
The recall does not improve after a certain point even with a lot of iterations
Screenshot 2024-06-27 at 1 18 27 PM

Result for dataset with 100 features
Not as bad as the dataset above with 1000 features, but also shows low recall for dataset with smaller number of features too
Screenshot 2024-06-27 at 1 23 55 PM

Reproducing the bug

The experiments above were run on specifically this commit on raft's branch-24.08.

  1. Change the raft/cpp/test/neighbors/ann_nn_descent.cuh file test input to test for larger datasets like below
    Screenshot 2024-06-26 at 6 34 07 PM
    index_params.max_iterations is set to 100 in the same file, and can be changed to test for larger number of iterations.
    The test file random generates data in the setup() function.

  2. build the test

  3. Run the test
    ./cpp/build/gtests/NEIGHBORS_ANN_NN_DESCENT_TEST --gtest_filter=AnnNNDescentTest/AnnNNDescentTestF_U32*

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions