Add `NearestNeighbors.radius_neighbors_graph` by jcrist · Pull Request #7811 · rapidsai/cuml

jcrist · 2026-02-17T23:09:34Z

This adds support for NearestNeighbors.radius_neighbors_graph, built on cuvs::neighbors::ball_cover::eps_nn.

This only adds support for mode="connectivity", mode="distance" requires rapidsai/cuvs#1815 from cuvs and will be handled in a followup. Likewise for NearestNeighbors.radius_neighbors.

Part of #262, Part of #7759.

copy-pr-bot · 2026-02-17T23:09:37Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-02-18T20:13:04Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- Radius-based neighbor graph added (CSR connectivity for points within a radius), including a GPU-backed pathway and two-pass compute for large results.
- NearestNeighbors now exposes and persists a radius parameter and supports radius queries via ball-cover indexing, with relaxed dimensionality limits for indexing.
Tests
- Added tests covering radius graph correctness and error handling (sparse inputs, invalid radius, unsupported metrics).

Walkthrough

Adds GPU radius-based neighbor-graph support: new C++/CUDA API rbc_radius_neighbors_graph, CUDA implementation and header update, Cython bindings and RBCIndex support, NearestNeighbors radius parameter and radius_neighbors_graph method, Python wrapper checks, two new tests, and small classifier/regressor base-class import changes.

Changes

Cohort / File(s)	Summary
C++ Core `cpp/include/cuml/neighbors/knn.hpp`, `cpp/src/knn/knn.cu`	Add declaration and implementation of `rbc_radius_neighbors_graph` that computes CSR radius-neighbor graphs via RBC index and `cuvs::neighbors::ball_cover::eps_nn`. Update copyright year.
Cython bindings / NearestNeighbors API `python/cuml/cuml/neighbors/nearest_neighbors.pyx`	Expose `rbc_radius_neighbors_graph` to Python; add `RBCIndex.radius_neighbors_graph` (two-pass CSR: `adj_indptr` then `adj_indices`); add `n_samples` field to `RBCIndex`; remove prior RBC dimensionality restriction; add `radius` parameter to `NearestNeighbors` (init, params, serialization) and implement `NearestNeighbors.radius_neighbors_graph`.
Python sklearn wrapper `python/cuml/cuml/accel/_wrappers/sklearn/neighbors.py`	Add internal `_gpu_radius_neighbors_graph` method with validations (mode == "connectivity", no sparse X, supported metric only) and delegation to GPU `radius_neighbors_graph`. Add imports `is_sparse` and `UnsupportedOnGPU`.
Tests `python/cuml/tests/test_nearest_neighbors.py`	Add parameterized tests `test_radius_neighbors_graph` and `test_radius_neighbors_graph_errors` validating CSR outputs against scikit-learn and checking error handling (sparse input, negative radius, unsupported metric). SPDX year updated.
KNN high-level wrappers `python/cuml/cuml/neighbors/kneighbors_classifier.pyx`, `python/cuml/cuml/neighbors/kneighbors_regressor.pyx`	Change KNeighborsClassifier and KNeighborsRegressor to inherit from `NeighborsBase` instead of `NearestNeighbors` and update corresponding imports.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main feature being added: NearestNeighbors.radius_neighbors_graph. It directly correlates with the primary change across all modified files.
Description check	✅ Passed	The description clearly explains what is being added and references the underlying implementation (cuvs::neighbors::ball_cover::eps_nn), scope limitations, and related issues.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/include/cuml/neighbors/knn.hpp`:
- Around line 74-82: Add a complete Doxygen comment block for the new public API
function rbc_radius_neighbors_graph describing purpose, parameters,
return/outputs and the two-phase semantics: explain that when adj_cols==nullptr
and nnz==0 it performs a count-only pass writing per-row counts to adj_rows, and
on the fill pass callers must allocate adj_cols with size nnz and pass the same
adj_rows to receive neighbor column indices; document meanings and units of
handle, rbc_index, query, n_query, dim, radius, adj_rows, adj_cols, and nnz, any
preconditions (e.g., non-null pointers, how nnz is interpreted), and reference
similar documented functions like brute_force_knn / rbc_free_index for style and
phrasing so the header is consistent with existing Doxygen conventions.

In `@python/cuml/cuml/accel/_wrappers/sklearn/neighbors.py`:
- Around line 23-36: In _gpu_radius_neighbors_graph, accessing
self.effective_metric_ can raise AttributeError on unfitted proxies and bypass
the ProxyBase CPU-fallback; wrap the metric check to first guard for a fitted
state (e.g., use hasattr(self, "effective_metric_") or getattr(self,
"effective_metric_", None)) and if it's missing raise UnsupportedOnGPU with a
clear message (e.g., "estimator is not fitted" or similar) so
ProxyBase._call_method will catch it and fall back to the CPU path; update the
check in the _gpu_radius_neighbors_graph method accordingly.

In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx`:
- Around line 350-353: The connectivity data array is created with cp.ones(nnz)
which yields float64 by default, causing a dtype mismatch with kneighbors_graph
(which uses np.float32); change the data creation in nearest_neighbors.pyx to
cp.ones(nnz, dtype=cp.float32) (or cp.float32) so the returned
cupyx.scipy.sparse.csr_matrix has float32 connectivity values and matches
kneighbors_graph's dtype.
- Around line 1165-1167: Update the radius_neighbors_graph signature to include
the sklearn-compatible parameters mode='connectivity' and sort_results=False
(i.e., def radius_neighbors_graph(self, X=None, radius=None,
mode='connectivity', sort_results=False) -> SparseCumlArray) so callers passing
those kwargs won't get TypeError; inside the method (radius_neighbors_graph)
validate mode and raise a clear error if mode != 'connectivity' (or map behavior
if you implement others), and accept sort_results (no-op if unsupported) so API
matches scikit-learn while preserving existing connectivity-only behavior.
- Around line 304-354: The radius_neighbors_graph method has a race: after the
first with nogil call to rbc_radius_neighbors_graph (which populates indptr) the
code reads indptr[-1].item() without synchronizing the RAFT handle; add a call
to handle.sync() (using the same handle obtained via get_handle()) immediately
after the first with nogil block and before computing nnz to ensure the GPU work
writing indptr is finished; follow the same pattern used in RBCIndex.build and
RBCIndex.kneighbors to locate where to insert handle.sync() in
radius_neighbors_graph.

In `@python/cuml/tests/test_nearest_neighbors.py`:
- Around line 560-582: The test declares metric but never passes it to the KNN
models, so haversine cases are not exercised; update the
test_radius_neighbors_graph to pass metric=metric into both skKNN and cuKNN when
constructing sk_model and cu_model, and add a guard to skip the haversine branch
for non-2D inputs (e.g., if metric == "haversine" and n_features != 2:
pytest.skip(...)) so the test only runs haversine with 2 features.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

python/cuml/cuml/neighbors/nearest_neighbors.pyx (1)

304-316: Unused variable n_query; consider removing.

n_query is assigned on Line 312 but never used — n_rows (Line 316, same value) is what's actually passed to the C call. This is a harmless leftover but adds noise.

-        cdef int64_t n_query = X.shape[0]
-
         indptr = cp.empty(n_query + 1, dtype=np.int64)
+        indptr = cp.empty(n_rows + 1, dtype=np.int64)

#!/bin/bash
# Verify n_query is not used elsewhere in the method
rg -n 'n_query' python/cuml/cuml/neighbors/nearest_neighbors.pyx | head -20

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx` around lines 304 - 316, The
variable n_query in the radius_neighbors_graph function is unused and duplicates
n_rows; remove the declaration/assignment "cdef int64_t n_query = X.shape[0]"
and any references to n_query so the code only uses n_rows (or consolidate to
n_query if you prefer that name) to avoid the dead variable while leaving the
rest of the function (handle setup, indptr creation, X_ptr, n_rows usage)
unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx`:
- Around line 356-361: Add the same dimension guard used in RBCIndex.kneighbors
to RBCIndex.radius_neighbors_graph: check X.shape[1] and if it's > 3 raise a
ValueError with the same message ("The rbc algorithm is not supported for >3
dimensions currently.") so calls that invoke the underlying ball_cover logic are
prevented from running with unsupported dimensionality; update the
RBCIndex.radius_neighbors_graph method to perform this check before any
ball_cover operations.

---

Duplicate comments:
In `@python/cuml/cuml/accel/_wrappers/sklearn/neighbors.py`:
- Around line 32-36: Don't access self.effective_metric_ directly; guard that
attribute before the metric check in the radius_neighbors_graph wrapper so an
unfitted proxy doesn't raise AttributeError unexpectedly. Replace the direct
access in the check (the code that currently does if self.effective_metric_ not
in ["l2","euclidean"]) with a presence-safe check such as using hasattr(self,
"effective_metric_") or metric = getattr(self, "effective_metric_", None) and
only perform the UnsupportedOnGPU check when metric is present; leave behavior
unchanged for the call to self._gpu.radius_neighbors_graph(X=..., radius=...)
and ensure ProxyBase._call_method can still catch unfitted-state errors.

In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx`:
- Line 350: The array created for connectivity currently uses cp.ones(nnz) which
defaults to float64 and conflicts with kneighbors_graph's use of
dtype=np.float32; change the creation of data to explicitly use float32 (e.g.,
cp.ones(nnz, dtype=cp.float32) or equivalent) so the dtype matches
kneighbors_graph and avoids inconsistencies between the two graph methods.
- Around line 1165-1167: The cuML method radius_neighbors_graph lacks the
sklearn-compatible parameters mode and sort_results, causing unexpected-argument
errors; update the radius_neighbors_graph signature in nearest_neighbors.pyx to
accept mode='connectivity' and sort_results=False (matching sklearn defaults) in
addition to X and radius, then thread these parameters into the existing
implementation or validation logic used by the accel wrapper (ensure the
function forwards them to the same internal checks/dispatch that lives in
neighbors.py or handles them the same way), and maintain the existing decorators
(`@insert_into_docstring`, `@reflect`) and return type SparseCumlArray.
- Around line 320-333: The GPU-written indptr is read on the host without
synchronizing the RAFT/handle stream; after the first with nogil block that
calls rbc_radius_neighbors_graph (and likewise after the second with nogil that
writes device outputs) insert a handle synchronization before any host reads of
indptr (e.g., call handle_[0].sync() or the project’s RAFT handle.sync()
equivalent) so that the device kernel completes before executing
indptr[-1].item() and any subsequent host-side transfers.

---

Nitpick comments:
In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx`:
- Around line 304-316: The variable n_query in the radius_neighbors_graph
function is unused and duplicates n_rows; remove the declaration/assignment
"cdef int64_t n_query = X.shape[0]" and any references to n_query so the code
only uses n_rows (or consolidate to n_query if you prefer that name) to avoid
the dead variable while leaving the rest of the function (handle setup, indptr
creation, X_ptr, n_rows usage) unchanged.

This adds support for `NearestNeighbors.radius_neighbors_graph`, built on `cuvs::neighbors::ball_cover::eps_nn`.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

python/cuml/cuml/neighbors/nearest_neighbors.pyx (1)
312-316: Optional: n_rows duplicates n_query — one can be removed.

Both n_query (Line 312) and n_rows (Line 316) hold X.shape[0]. The rest of the method uses n_rows while n_query is only used for indptr allocation. Collapsing to a single variable reduces cognitive load.
♻️ Suggested simplification
-        cdef int64_t n_query = X.shape[0]
-
         indptr = cp.empty(n_query + 1, dtype=np.int64)
         cdef float* X_ptr = <float*><uintptr_t>X.ptr
         cdef int64_t n_rows = X.shape[0]
+        cdef int64_t n_query = n_rows
         cdef int64_t n_cols = X.shape[1]
or simply use n_rows everywhere and drop n_query.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx` around lines 312 - 316,
n_query duplicates n_rows; remove the redundant variable and use n_rows
everywhere: delete the cdef int64_t n_query declaration and change the indptr
allocation to use n_rows + 1 (indptr = cp.empty(n_rows + 1, dtype=np.int64));
ensure any other uses of n_query are replaced with n_rows (the method currently
uses n_rows elsewhere) so there is a single authoritative row-count variable in
nearest_neighbors.pyx.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/include/cuml/neighbors/knn.hpp`:
- Around line 79-82: Update the Doxygen comment that incorrectly uses Python
negative-index notation: replace occurrences of adj_indptr[-1] with the
C++-appropriate adj_indptr[n_query] (since adj_indptr has size n_query+1) and
ensure the explanation for computing nnz uses adj_indptr[n_query]; check the
related mentions near the adj_indices and nnz description (the same pattern
appears again around the later comment block) and update them to the same
adj_indptr[n_query] form.

In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx`:
- Around line 1274-1275: The current guard if radius <= 0 in
nearest_neighbors.pyx doesn't reject NaN or Inf; update the check to validate
finiteness and positivity (e.g., import math and replace the guard with if not
math.isfinite(radius) or radius <= 0: raise ValueError(...)) so NaN/inf are
rejected before hitting the C++ layer; ensure the error message mentions
invalid/non-finite radius and keep the check around the same scope where the
variable radius is validated.

In `@python/cuml/tests/test_nearest_neighbors.py`:
- Around line 600-602: Add a second assertion in the test to exercise the
zero-radius branch: call model.radius_neighbors_graph(radius=0) and assert it
raises the same ValueError with message "Expected `radius > 0`, got 0" (or use
match "Expected `radius > 0`" to be less brittle); this ensures the
radius_neighbors_graph guard against radius <= 0 is covered. Reference the
existing test block around model.radius_neighbors_graph to insert the new
pytest.raises check for radius=0.

---

Nitpick comments:
In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx`:
- Around line 312-316: n_query duplicates n_rows; remove the redundant variable
and use n_rows everywhere: delete the cdef int64_t n_query declaration and
change the indptr allocation to use n_rows + 1 (indptr = cp.empty(n_rows + 1,
dtype=np.int64)); ensure any other uses of n_query are replaced with n_rows (the
method currently uses n_rows elsewhere) so there is a single authoritative
row-count variable in nearest_neighbors.pyx.

viclafargue

Thanks! LGTM, just minor questions. Thanks for removing radius search from KNN classifier and regressor. I think that NearestNeighborsMG also inherits from the NearestNeighbor class too, but this is not directly exposed to user, so less of a concern.

viclafargue · 2026-02-20T14:14:58Z

+        cdef int64_t n_query = X.shape[0]
+
+        indptr = cp.empty(n_query + 1, dtype=np.int64)
+        cdef float* X_ptr = <float*><uintptr_t>X.ptr
+        cdef int64_t n_rows = X.shape[0]


Could we fuse n_rows and n_query as n_queries?

Agree we have an extraneous local here, will fix in the followup.

viclafargue · 2026-02-20T14:36:17Z

+        This method is most efficient when the instance is fit with
+        `algorithm="rbc"`. Other algorithms will build a temporary RBC index
+        per-call, which adds a small overhead.


Should we replicate sklearn behavior exactly here or instead ensure that the user always have optimal use of VRAM by demanding the RBC index to be built during fit and thus avoid uselessly storing the dataset in VRAM in addition to the index?

I don't think we should error. If anything, we could add a log or warning in this case, but IMO if we can make something just work then we should. Happy to discuss and maybe add a warning/log in the followup.

viclafargue · 2026-02-20T14:37:04Z

+        >>> import cupy as cp
+        >>> from cuml.neighbors import NearestNeighbors
+        >>> X = cp.array([[0], [3], [1]])
+        >>> nn = NearestNeighbors().fit(X)


Shouldn't the minimal doc example demonstrate use with algorithm="rbc"?

Agreed - I'll add that in the followup.

jcrist · 2026-02-23T02:43:32Z

Thanks for the review! I agree with a few of your comments. Will fix them in the followup PR while expanding support; none seemed worth spending an extra CI cycle here when I'll be touching this function again in a followup.

jcrist · 2026-02-23T02:43:37Z

/merge

jcrist self-assigned this Feb 17, 2026

jcrist added feature request New feature or request non-breaking Non-breaking change labels Feb 17, 2026

github-actions Bot added Cython / Python Cython or Python issue CUDA/C++ labels Feb 17, 2026

jcrist force-pushed the radius-neighbors-graph branch from b7ca26d to b0cae30 Compare February 18, 2026 20:00

jcrist changed the title ~~[WIP] Add NearestNeighbors.radius_neighbors_graph~~ Add NearestNeighbors.radius_neighbors_graph Feb 18, 2026

jcrist marked this pull request as ready for review February 18, 2026 20:03

jcrist requested review from a team as code owners February 18, 2026 20:03

jcrist requested review from aamijar, csadorf and hcho3 February 18, 2026 20:03

coderabbitai Bot reviewed Feb 18, 2026

View reviewed changes

jcrist force-pushed the radius-neighbors-graph branch from b0cae30 to dfae43f Compare February 19, 2026 02:33

coderabbitai Bot reviewed Feb 19, 2026

View reviewed changes

Comment thread python/cuml/cuml/neighbors/nearest_neighbors.pyx

jcrist added 4 commits February 19, 2026 11:24

Add NearestNeighbors.radius_neighbors_graph

ee32a3f

This adds support for `NearestNeighbors.radius_neighbors_graph`, built on `cuvs::neighbors::ball_cover::eps_nn`.

Support radius_neighbors_graph in cuml.accel

89cceb1

Fixups

95f52a8

Split out NeighborsBase base class

cee586c

jcrist force-pushed the radius-neighbors-graph branch from dfae43f to cee586c Compare February 19, 2026 17:24

No **kwargs

7859329

coderabbitai Bot reviewed Feb 19, 2026

View reviewed changes

Comment thread cpp/include/cuml/neighbors/knn.hpp

Comment thread python/cuml/cuml/neighbors/nearest_neighbors.pyx

Comment thread python/cuml/tests/test_nearest_neighbors.py

viclafargue approved these changes Feb 20, 2026

View reviewed changes

rapids-bot Bot merged commit 237e589 into rapidsai:main Feb 23, 2026
99 checks passed

jcrist deleted the radius-neighbors-graph branch February 23, 2026 02:43

coderabbitai Bot mentioned this pull request Mar 9, 2026

Add cuml.internals.validation, check_is_fitted checks #7868

Merged

coderabbitai Bot mentioned this pull request Apr 29, 2026

Apply new validation to cuml.neighbors #8019

Merged

Conversation

jcrist commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

copy-pr-bot Bot commented Feb 17, 2026

Uh oh!

coderabbitai Bot commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

viclafargue left a comment

Choose a reason for hiding this comment

Uh oh!

viclafargue Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

jcrist Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

viclafargue Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

jcrist Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

viclafargue Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

jcrist Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jcrist commented Feb 23, 2026

Uh oh!

jcrist commented Feb 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jcrist commented Feb 17, 2026 •

edited

Loading

coderabbitai Bot commented Feb 18, 2026 •

edited

Loading