Merged
Changes from 2 commits
7 changes: 7 additions & 0 deletions cpp/include/cuvs/neighbors/common.hpp
Original file line number Diff line number Diff line change
@@ -456,6 +456,11 @@ inline constexpr bool is_vpq_dataset_v = is_vpq_dataset<DatasetT>::value;

namespace filtering {

/**
* @defgroup neighbors_filtering Filtering for ANN Types
* @{
*/

struct base_filter {
virtual ~base_filter() = default;
};
@@ -543,6 +548,8 @@ struct bitset_filter : public base_filter {
const uint32_t sample_ix) const;
};

/** @} */ // end group neighbors_filtering

/**
* If the filtering depends on the index of a sample, then the following
* filter template can be used:
1 change: 1 addition & 0 deletions docs/source/cpp_api/neighbors.rst
@@ -12,6 +12,7 @@ Nearest Neighbors
neighbors_bruteforce.rst
neighbors_cagra.rst
neighbors_dynamic_batching.rst
neighbors_filter.rst
neighbors_hnsw.rst
neighbors_ivf_flat.rst
neighbors_ivf_pq.rst
18 changes: 18 additions & 0 deletions docs/source/cpp_api/neighbors_filter.rst
@@ -0,0 +1,18 @@
Filtering
==========

All nearest neighbors search methods support filtering. Filtering reduces the number of
candidate vectors that have to be considered during a nearest neighbors search.

.. role:: py(code)
:language: c++
:class: highlight

``#include <cuvs/neighbors/common.hpp>``

namespace *cuvs::neighbors*

.. doxygengroup:: neighbors_filtering
:project: cuvs
:members:
:content-only:
75 changes: 75 additions & 0 deletions docs/source/filtering.rst
@@ -0,0 +1,75 @@
.. _filtering:

~~~~~~~~~~~~~~~~~~~~~~~~
Filtering vector indexes
~~~~~~~~~~~~~~~~~~~~~~~~

cuVS supports different types of filtering depending on the vector index being used. The main method, available in all of the vector indexes,
is pre-filtering: the filter is taken into account before a query's closest neighbors are computed, which saves
the computation of distances to filtered-out vectors. Typical use cases include hiding deleted vectors from search results and
restricting a search to the subset of the database a given user is allowed to see.


Bitset
======

A bitset is an array of bits where each bit can have two possible values, `0` and `1`, which in the context of filtering signify whether
a sample should be filtered out or not. `0` means that the corresponding vector will be filtered, and will therefore not be present in the results of the search.
This mechanism is optimized to take as little memory space as possible, and is available through the RAFT library
(check out RAFT's `bitset API documentation <https://docs.rapids.ai/api/raft/stable/cpp_api/core_bitset/>`_). When calling a search function of an ANN index, the
bitset length should match the number of vectors present in the database.

Bitmap
======

A bitmap is based on the same principle as a bitset, but in two dimensions: it allows users to provide a different bitset for each query
being searched. Check out RAFT's `bitmap API documentation <https://docs.rapids.ai/api/raft/stable/cpp_api/core_bitmap/>`_.

Using filters in cuVS
=====================

Example
=======

Using a Bitmap filter on a Brute-force index
--------------------------------------------


Using a Bitset filter on a CAGRA index
--------------------------------------

.. code-block:: c++

#include <cuvs/neighbors/cagra.hpp>
#include <cuvs/core/bitset.hpp>

using namespace cuvs::neighbors;

// `index` is assumed to have been built beforehand (see `cagra::build`);
// `load_queries()`, `get_invalid_indices()`, `n_queries` and `k` are
// placeholders for application code.
cagra::index<float, uint32_t> index = /* ... build index ... */;

cagra::search_params search_params;
raft::device_resources res;
raft::device_matrix_view<const float, int64_t> queries = load_queries();
auto neighbors = raft::make_device_matrix<uint32_t, int64_t>(res, n_queries, k);
auto distances = raft::make_device_matrix<float, int64_t>(res, n_queries, k);

// Load a list of all the samples that will get filtered
std::vector<uint32_t> removed_indices_host = get_invalid_indices();
auto removed_indices_device =
  raft::make_device_vector<uint32_t, uint32_t>(res, removed_indices_host.size());
// Copy this list to device
raft::copy(removed_indices_device.data_handle(), removed_indices_host.data(),
           removed_indices_host.size(), raft::resource::get_cuda_stream(res));

// Create a bitset with the list of samples to filter.
cuvs::core::bitset<uint32_t, uint32_t> removed_indices_bitset(
  res, removed_indices_device.view(), index.size());
// Wrap it in a `bitset_filter` and pass it to the `cagra::search` function call.
auto bitset_filter =
  cuvs::neighbors::filtering::bitset_filter(removed_indices_bitset.view());
cagra::search(res,
              search_params,
              index,
              queries,
              neighbors.view(),
              distances.view(),
              bitset_filter);
4 changes: 3 additions & 1 deletion docs/source/getting_started.rst
@@ -117,4 +117,6 @@ We always welcome patches for new features and bug fixes. Please read our `contr
comparing_indexes.rst
indexes/indexes.rst
api_basics.rst
api_interoperability.rst
api_interoperability.rst
working_with_ann_indexes.rst
filtering.rst
4 changes: 2 additions & 2 deletions docs/source/indexes/bruteforce.rst
@@ -57,6 +57,6 @@ Memory footprint
Index footprint
~~~~~~~~~~~~~~~

Raw vectors: :math:`n_vectors * n_dimensions * precision`
Raw vectors: :math:`n\_vectors * n\_dimensions * precision`

Vector norms (for distances which require them): :math:`n_vectors * precision`
Vector norms (for distances which require them): :math:`n\_vectors * precision`
24 changes: 14 additions & 10 deletions docs/source/indexes/cagra.rst
@@ -108,22 +108,22 @@ IVFPQ or NN-DESCENT can be used to build the graph (additions to the peak memory
Dataset on device (graph on host):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Index memory footprint (device): :math:`n_index_vectors * n_dims * sizeof(T)`
Index memory footprint (device): :math:`n\_index\_vectors * n\_dims * sizeof(T)`

Index memory footprint (host): :math:`graph_degree * n_index_vectors * sizeof(T)``
Index memory footprint (host): :math:`graph\_degree * n\_index\_vectors * sizeof(T)`

Dataset on host (graph on host):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Index memory footprint (host): :math:`n_index_vectors * n_dims * sizeof(T) + graph_degree * n_index_vectors * sizeof(T)`
Index memory footprint (host): :math:`n\_index\_vectors * n\_dims * sizeof(T) + graph\_degree * n\_index\_vectors * sizeof(T)`

Build peak memory usage:
~~~~~~~~~~~~~~~~~~~~~~~~

When built using NN-descent / IVF-PQ, the build process consists of two phases: (1) building an initial (intermediate) graph and then (2) optimizing the graph. Key input parameters are n_vectors, intermediate_graph_degree, graph_degree.
The memory usage in the first phase (building) depends on the chosen method. The biggest allocation is the graph (n_vectors * intermediate_graph_degree), but it is stored in host memory.
Usually, the second phase (optimize) uses the most device memory. The peak memory usage is reached during the pruning step (graph_core.cuh/optimize).
Optimize: formula for peak memory usage (device): :math:`n_vectors * (4 + (sizeof(IdxT) + 1) * intermediate_degree)``
Optimize: formula for peak memory usage (device): :math:`n\_vectors * (4 + (sizeof(IdxT) + 1) * intermediate\_degree)`

Build with out-of-core IVF-PQ peak memory usage:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -134,16 +134,20 @@ IVF-PQ Build:

.. math::

n_vectors / train_set_ratio * dim * sizeof(float) // trainset, may be in managed mem
+ n_vectors / train_set_ratio * sizeof(uint32_t) // labels, may be in managed mem
+ n_clusters * n_dim * sizeof(float) // cluster centers
n\_vectors / train\_set\_ratio * dim * sizeof(float) // trainset, may be in managed mem

+ n\_vectors / train\_set\_ratio * sizeof(uint32_t) // labels, may be in managed mem

+ n\_clusters * n\_dim * sizeof(float) // cluster centers

IVF-PQ Search (max batch size 1024 vectors on device at a time):

.. math::

[n_vectors * (pq_dim * pq_bits / 8 + sizeof(int64_t)) + O(n_clusters)]
+ [batch_size * n_dim * sizeof(float)] + [batch_size * intermediate_degree * sizeof(uint32_t)] +
[batch_size * intermediate_degree * sizeof(float)]
[n\_vectors * (pq\_dim * pq\_bits / 8 + sizeof(int64\_t)) + O(n\_clusters)]

+ [batch\_size * n\_dim * sizeof(float)] + [batch\_size * intermediate\_degree * sizeof(uint32\_t)]

+ [batch\_size * intermediate\_degree * sizeof(float)]


4 changes: 2 additions & 2 deletions docs/source/indexes/ivfflat.rst
@@ -86,7 +86,7 @@ Memory footprint
----------------

Each cluster is padded to at least 32 vectors (but potentially up to 1024). Assuming uniform random distribution of vectors/list, we would have
:math:`cluster\_overhead = (conservative\_memory\_allocation ? 16 : 512 ) * dim * sizeof_{float})`
@cjnolet (Jan 16, 2025): This was intentional here- we want the float to be a subscript for ease of readership (it aligns better with the rest of the formulas since we are using latex all over the place already).
:math:`cluster\_overhead = (conservative\_memory\_allocation ? 16 : 512 ) * dim * sizeof(float)`

Note that each cluster is allocated as a separate allocation. If we use a `cuda_memory_resource`, that would grab memory in 1 MiB chunks, so on average we might have 0.5 MiB overhead per cluster. If we use 10s of thousands of clusters, it becomes essential to use a pool allocator to avoid this overhead.

@@ -110,7 +110,7 @@ Index (device memory):
Peak device memory usage for index build:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:math:`workspace = min(1GB, n\_queries * [(n\_lists + 1 + n\_probes * (k + 1)) * sizeof_{float}) + n\_probes * k * sizeof_{idx}])`
:math:`workspace = min(1GB, n\_queries * [(n\_lists + 1 + n\_probes * (k + 1)) * sizeof(float) + n\_probes * k * sizeof(idx)])`

:math:`index\_size + workspace`

16 changes: 8 additions & 8 deletions docs/source/indexes/ivfpq.rst
@@ -97,22 +97,22 @@ Simple approximate formula: :math:`n\_vectors * (pq\_dim * \frac{pq\_bits}{8} +

The IVF lists end up being represented by a sparse data structure that stores the pointers to each list, an indices array that contains the indexes of each vector in each list, and an array with the encoded (and interleaved) data for each list.

IVF list pointers: :math:`n\_clusters * sizeof_{uint32_t}`
IVF list pointers: :math:`n\_clusters * sizeof(uint32_t)`

Indices: :math:`n\_vectors * sizeof_{idx}``
Indices: :math:`n\_vectors * sizeof(idx)`

Encoded data (interleaved): :math:`n\_vectors * pq\_dim * \frac{pq\_bits}{8}`

Per subspace method: :math:`4 * pq\_dim * pq\_len * 2^pq\_bits`
Per subspace method: :math:`4 * pq\_dim * pq\_len * 2^{pq\_bits}`

Per cluster method: :math:`4 * n\_clusters * pq\_len * 2^pq\_bits`
Per cluster method: :math:`4 * n\_clusters * pq\_len * 2^{pq\_bits}`

Extras: :math:`n\_clusters * (20 + 8 * dim)`

Index (host memory):
~~~~~~~~~~~~~~~~~~~~

When refinement is used with the dataset on host, the original raw vectors are needed: :math:`n\_vectors * dims * sizeof_{Tloat}`
When refinement is used with the dataset on host, the original raw vectors are needed: :math:`n\_vectors * dims * sizeof(float)`

Search peak memory usage (device):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -126,11 +126,11 @@ Build peak memory usage (device):

.. math::

\frac{n\_vectors}{trainset\_ratio * dims * sizeof_{float}}
\frac{n\_vectors}{trainset\_ratio} * dims * sizeof(float)

+ \frac{n\_vectors}{trainset\_ratio * sizeof_{uint32_t}}
+ \frac{n\_vectors}{trainset\_ratio} * sizeof(uint32\_t)

+ n\_clusters * dim * sizeof_{float}
+ n\_clusters * dim * sizeof(float)

Note, if there’s not enough space left in the workspace memory resource, IVF-PQ build automatically switches to the managed memory for the training set and labels.

Expand Down