Merged
Changes from 2 commits
7 changes: 7 additions & 0 deletions cpp/include/cuvs/neighbors/common.hpp
Original file line number Diff line number Diff line change
@@ -456,6 +456,11 @@ inline constexpr bool is_vpq_dataset_v = is_vpq_dataset<DatasetT>::value;

namespace filtering {

/**
* @defgroup neighbors_filtering Filtering for ANN Types
* @{
*/

struct base_filter {
virtual ~base_filter() = default;
};
@@ -543,6 +548,8 @@ struct bitset_filter : public base_filter {
const uint32_t sample_ix) const;
};

/** @} */ // end group neighbors_filtering

/**
* If the filtering depends on the index of a sample, then the following
* filter template can be used:
1 change: 1 addition & 0 deletions docs/source/cpp_api/neighbors.rst
@@ -12,6 +12,7 @@ Nearest Neighbors
neighbors_bruteforce.rst
neighbors_cagra.rst
neighbors_dynamic_batching.rst
neighbors_filter.rst
neighbors_hnsw.rst
neighbors_ivf_flat.rst
neighbors_ivf_pq.rst
18 changes: 18 additions & 0 deletions docs/source/cpp_api/neighbors_filter.rst
@@ -0,0 +1,18 @@
Filtering
==========

All nearest neighbors search methods support filtering. Filtering reduces the number of
candidate vectors that have to be considered during a nearest neighbors search.

.. role:: py(code)
:language: c++
:class: highlight

``#include <cuvs/neighbors/common.hpp>``

namespace *cuvs::neighbors*

.. doxygengroup:: neighbors_filtering
:project: cuvs
:members:
:content-only:
75 changes: 75 additions & 0 deletions docs/source/filtering.rst
@@ -0,0 +1,75 @@
.. _filtering:

~~~~~~~~~~~~~~~~~~~~~~~~
Filtering vector indexes
~~~~~~~~~~~~~~~~~~~~~~~~

cuVS supports different types of filtering depending on the vector index being used. The main method, available in all of the vector indexes,
is pre-filtering: the filter is taken into account before a query's closest neighbors are computed, which saves
the computation of distances to filtered-out vectors. Typical use cases include hiding deleted vectors from search results and
restricting a search to the subset of the database a given user is allowed to see.


Bitset
======

A bitset is an array of bits where each bit can have two possible values, `0` and `1`, which in the context of filtering signify whether
a sample should be filtered out or not. `0` means that the corresponding vector will be filtered, and will therefore not be present in the results of the search.
This mechanism is optimized to take as little memory space as possible, and is available through the RAFT library
(check out RAFT's `bitset API documentation <https://docs.rapids.ai/api/raft/stable/cpp_api/core_bitset/>`_). When calling a search function of an ANN index, the
bitset length should match the number of vectors present in the database.

Bitmap
======

A bitmap is based on the same principle as a bitset, but in two dimensions: it allows users to provide a different bitset for each query
being searched. Check out RAFT's `bitmap API documentation <https://docs.rapids.ai/api/raft/stable/cpp_api/core_bitmap/>`_.

Using filters in cuVS
=====================

Example
=======

Using a Bitmap filter on a Brute-force index
--------------------------------------------


Using a Bitset filter on a CAGRA index
--------------------------------------

.. code-block:: c++

#include <cuvs/neighbors/cagra.hpp>
#include <cuvs/core/bitset.hpp>

using namespace cuvs::neighbors;

// `index` is assumed to have been built beforehand (see `cagra::build`);
// `load_queries()`, `get_invalid_indices()`, `n_queries` and `k` are
// placeholders for application code.
cagra::index<float, uint32_t> index = /* ... build index ... */;

cagra::search_params search_params;
raft::device_resources res;
raft::device_matrix_view<const float, int64_t> queries = load_queries();
auto neighbors = raft::make_device_matrix<uint32_t, int64_t>(res, n_queries, k);
auto distances = raft::make_device_matrix<float, int64_t>(res, n_queries, k);

// Load a list of all the samples that will get filtered
std::vector<uint32_t> removed_indices_host = get_invalid_indices();
auto removed_indices_device =
  raft::make_device_vector<uint32_t, uint32_t>(res, removed_indices_host.size());
// Copy this list to device
raft::copy(removed_indices_device.data_handle(), removed_indices_host.data(),
           removed_indices_host.size(), raft::resource::get_cuda_stream(res));

// Create a bitset with the list of samples to filter.
cuvs::core::bitset<uint32_t, uint32_t> removed_indices_bitset(
  res, removed_indices_device.view(), index.size());
// Wrap it in a `bitset_filter` and pass it to the `cagra::search` function call.
auto bitset_filter =
  cuvs::neighbors::filtering::bitset_filter(removed_indices_bitset.view());
cagra::search(res,
              search_params,
              index,
              queries,
              neighbors.view(),
              distances.view(),
              bitset_filter);
4 changes: 3 additions & 1 deletion docs/source/getting_started.rst
@@ -117,4 +117,6 @@ We always welcome patches for new features and bug fixes. Please read our `contr
comparing_indexes.rst
indexes/indexes.rst
api_basics.rst
api_interoperability.rst
api_interoperability.rst
working_with_ann_indexes.rst
filtering.rst
4 changes: 2 additions & 2 deletions docs/source/indexes/bruteforce.rst
@@ -57,6 +57,6 @@ Memory footprint
Index footprint
~~~~~~~~~~~~~~~

Raw vectors: :math:`n_vectors * n_dimensions * precision`
Raw vectors: :math:`n\_vectors * n\_dimensions * precision`

Vector norms (for distances which require them): :math:`n_vectors * precision`
Vector norms (for distances which require them): :math:`n\_vectors * precision`
24 changes: 14 additions & 10 deletions docs/source/indexes/cagra.rst
@@ -108,22 +108,22 @@ IVFPQ or NN-DESCENT can be used to build the graph (additions to the peak memory
Dataset on device (graph on host):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Index memory footprint (device): :math:`n_index_vectors * n_dims * sizeof(T)`
Index memory footprint (device): :math:`n\_index\_vectors * n\_dims * sizeof(T)`

Index memory footprint (host): :math:`graph_degree * n_index_vectors * sizeof(T)``
Index memory footprint (host): :math:`graph\_degree * n\_index\_vectors * sizeof(T)`

Dataset on host (graph on host):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Index memory footprint (host): :math:`n_index_vectors * n_dims * sizeof(T) + graph_degree * n_index_vectors * sizeof(T)`
Index memory footprint (host): :math:`n\_index\_vectors * n\_dims * sizeof(T) + graph\_degree * n\_index\_vectors * sizeof(T)`

Build peak memory usage:
~~~~~~~~~~~~~~~~~~~~~~~~

When built using NN-descent / IVF-PQ, the build process consists of two phases: (1) building an initial (intermediate) graph and then (2) optimizing the graph. Key input parameters are n_vectors, intermediate_graph_degree, graph_degree.
The memory usage in the first phase (building) depends on the chosen method. The biggest allocation is the graph (n_vectors * intermediate_graph_degree), but it is stored in host memory.
Usually, the second phase (optimize) uses the most device memory. The peak memory usage is reached during the pruning step (graph_core.cuh/optimize).
Optimize: formula for peak memory usage (device): :math:`n_vectors * (4 + (sizeof(IdxT) + 1) * intermediate_degree)``
Optimize: formula for peak memory usage (device): :math:`n\_vectors * (4 + (sizeof(IdxT) + 1) * intermediate\_degree)`

Build with out-of-core IVF-PQ peak memory usage:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -134,16 +134,20 @@ IVF-PQ Build:

.. math::

n_vectors / train_set_ratio * dim * sizeof(float) // trainset, may be in managed mem
+ n_vectors / train_set_ratio * sizeof(uint32_t) // labels, may be in managed mem
+ n_clusters * n_dim * sizeof(float) // cluster centers
n\_vectors / train\_set\_ratio * dim * sizeof(float) // trainset, may be in managed mem

+ n\_vectors / train\_set\_ratio * sizeof(uint32_t) // labels, may be in managed mem

+ n\_clusters * n\_dim * sizeof(float) // cluster centers

IVF-PQ Search (max batch size 1024 vectors on device at a time):

.. math::

[n_vectors * (pq_dim * pq_bits / 8 + sizeof(int64_t)) + O(n_clusters)]
+ [batch_size * n_dim * sizeof(float)] + [batch_size * intermediate_degree * sizeof(uint32_t)] +
[batch_size * intermediate_degree * sizeof(float)]
[n\_vectors * (pq\_dim * pq\_bits / 8 + sizeof(int64\_t)) + O(n\_clusters)]

+ [batch\_size * n\_dim * sizeof(float)] + [batch\_size * intermediate\_degree * sizeof(uint32\_t)]

+ [batch\_size * intermediate\_degree * sizeof(float)]


4 changes: 2 additions & 2 deletions docs/source/indexes/ivfflat.rst
@@ -86,7 +86,7 @@ Memory footprint
----------------

Each cluster is padded to at least 32 vectors (but potentially up to 1024). Assuming uniform random distribution of vectors/list, we would have
:math:`cluster\_overhead = (conservative\_memory\_allocation ? 16 : 512 ) * dim * sizeof_{float})`
@cjnolet (Jan 16, 2025): This was intentional here- we want the float to be a subscript for ease of readership (it aligns better with the rest of the formulas since we are using latex all over the place already).
:math:`cluster\_overhead = (conservative\_memory\_allocation ? 16 : 512 ) * dim * sizeof(float)`

Note that each cluster is allocated as a separate allocation. If we use a `cuda_memory_resource`, that would grab memory in 1 MiB chunks, so on average we might have 0.5 MiB overhead per cluster. If we use 10s of thousands of clusters, it becomes essential to use a pool allocator to avoid this overhead.

@@ -110,7 +110,7 @@ Index (device memory):
Peak device memory usage for index build:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:math:`workspace = min(1GB, n\_queries * [(n\_lists + 1 + n\_probes * (k + 1)) * sizeof_{float}) + n\_probes * k * sizeof_{idx}])`
:math:`workspace = min(1GB, n\_queries * [(n\_lists + 1 + n\_probes * (k + 1)) * sizeof(float) + n\_probes * k * sizeof(idx)])`

:math:`index\_size + workspace`

16 changes: 8 additions & 8 deletions docs/source/indexes/ivfpq.rst
@@ -97,22 +97,22 @@ Simple approximate formula: :math:`n\_vectors * (pq\_dim * \frac{pq\_bits}{8} +

The IVF lists end up being represented by a sparse data structure that stores the pointers to each list, an indices array that contains the indexes of each vector in each list, and an array with the encoded (and interleaved) data for each list.

IVF list pointers: :math:`n\_clusters * sizeof_{uint32_t}`
IVF list pointers: :math:`n\_clusters * sizeof(uint32_t)`

Indices: :math:`n\_vectors * sizeof_{idx}``
Indices: :math:`n\_vectors * sizeof(idx)`

Encoded data (interleaved): :math:`n\_vectors * pq\_dim * \frac{pq\_bits}{8}`

Per subspace method: :math:`4 * pq\_dim * pq\_len * 2^pq\_bits`
Per subspace method: :math:`4 * pq\_dim * pq\_len * 2^{pq\_bits}`

Per cluster method: :math:`4 * n\_clusters * pq\_len * 2^pq\_bits`
Per cluster method: :math:`4 * n\_clusters * pq\_len * 2^{pq\_bits}`

Extras: :math:`n\_clusters * (20 + 8 * dim)`

Index (host memory):
~~~~~~~~~~~~~~~~~~~~

When refinement is used with the dataset on host, the original raw vectors are needed: :math:`n\_vectors * dims * sizeof_{Tloat}`
When refinement is used with the dataset on host, the original raw vectors are needed: :math:`n\_vectors * dims * sizeof(float)`

Search peak memory usage (device):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -126,11 +126,11 @@ Build peak memory usage (device):

.. math::

\frac{n\_vectors}{trainset\_ratio * dims * sizeof_{float}}
\frac{n\_vectors}{trainset\_ratio} * dims * sizeof(float)

+ \frac{n\_vectors}{trainset\_ratio * sizeof_{uint32_t}}
+ \frac{n\_vectors}{trainset\_ratio} * sizeof(uint32\_t)

+ n\_clusters * dim * sizeof_{float}
+ n\_clusters * dim * sizeof(float)

Note, if there’s not enough space left in the workspace memory resource, IVF-PQ build automatically switches to the managed memory for the training set and labels.

Expand Down