Skip to content

Add NearestNeighbors.radius_neighbors_graph#7811

Merged
rapids-bot[bot] merged 5 commits intorapidsai:mainfrom
jcrist:radius-neighbors-graph
Feb 23, 2026
Merged

Add NearestNeighbors.radius_neighbors_graph#7811
rapids-bot[bot] merged 5 commits intorapidsai:mainfrom
jcrist:radius-neighbors-graph

Conversation

@jcrist
Copy link
Copy Markdown
Member

@jcrist jcrist commented Feb 17, 2026

This adds support for NearestNeighbors.radius_neighbors_graph, built on cuvs::neighbors::ball_cover::eps_nn.

This only adds support for mode="connectivity", mode="distance" requires rapidsai/cuvs#1815 from cuvs and will be handled in a followup. Likewise for NearestNeighbors.radius_neighbors.

Part of #262, Part of #7759.

@jcrist jcrist self-assigned this Feb 17, 2026
@jcrist jcrist added feature request New feature or request non-breaking Non-breaking change labels Feb 17, 2026
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Feb 17, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions Bot added Cython / Python Cython or Python issue CUDA/C++ labels Feb 17, 2026
@jcrist jcrist force-pushed the radius-neighbors-graph branch from b7ca26d to b0cae30 Compare February 18, 2026 20:00
@jcrist jcrist changed the title [WIP] Add NearestNeighbors.radius_neighbors_graph Add NearestNeighbors.radius_neighbors_graph Feb 18, 2026
@jcrist jcrist marked this pull request as ready for review February 18, 2026 20:03
@jcrist jcrist requested review from a team as code owners February 18, 2026 20:03
@jcrist jcrist requested review from aamijar, csadorf and hcho3 February 18, 2026 20:03
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 18, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Radius-based neighbor graph added (CSR connectivity for points within a radius), including a GPU-backed pathway and two-pass compute for large results.
    • NearestNeighbors now exposes and persists a radius parameter and supports radius queries via ball-cover indexing, with relaxed dimensionality limits for indexing.
  • Tests

    • Added tests covering radius graph correctness and error handling (sparse inputs, invalid radius, unsupported metrics).

Walkthrough

Adds GPU radius-based neighbor-graph support: new C++/CUDA API rbc_radius_neighbors_graph, CUDA implementation and header update, Cython bindings and RBCIndex support, NearestNeighbors radius parameter and radius_neighbors_graph method, Python wrapper checks, two new tests, and small classifier/regressor base-class import changes.

Changes

Cohort / File(s) Summary
C++ Core
cpp/include/cuml/neighbors/knn.hpp, cpp/src/knn/knn.cu
Add declaration and implementation of rbc_radius_neighbors_graph that computes CSR radius-neighbor graphs via RBC index and cuvs::neighbors::ball_cover::eps_nn. Update copyright year.
Cython bindings / NearestNeighbors API
python/cuml/cuml/neighbors/nearest_neighbors.pyx
Expose rbc_radius_neighbors_graph to Python; add RBCIndex.radius_neighbors_graph (two-pass CSR: adj_indptr then adj_indices); add n_samples field to RBCIndex; remove prior RBC dimensionality restriction; add radius parameter to NearestNeighbors (init, params, serialization) and implement NearestNeighbors.radius_neighbors_graph.
Python sklearn wrapper
python/cuml/cuml/accel/_wrappers/sklearn/neighbors.py
Add internal _gpu_radius_neighbors_graph method with validations (mode == "connectivity", no sparse X, supported metric only) and delegation to GPU radius_neighbors_graph. Add imports is_sparse and UnsupportedOnGPU.
Tests
python/cuml/tests/test_nearest_neighbors.py
Add parameterized tests test_radius_neighbors_graph and test_radius_neighbors_graph_errors validating CSR outputs against scikit-learn and checking error handling (sparse input, negative radius, unsupported metric). SPDX year updated.
KNN high-level wrappers
python/cuml/cuml/neighbors/kneighbors_classifier.pyx, python/cuml/cuml/neighbors/kneighbors_regressor.pyx
Change KNeighborsClassifier and KNeighborsRegressor to inherit from NeighborsBase instead of NearestNeighbors and update corresponding imports.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main feature being added: NearestNeighbors.radius_neighbors_graph. It directly correlates with the primary change across all modified files.
Description check ✅ Passed The description clearly explains what is being added and references the underlying implementation (cuvs::neighbors::ball_cover::eps_nn), scope limitations, and related issues.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/include/cuml/neighbors/knn.hpp`:
- Around line 74-82: Add a complete Doxygen comment block for the new public API
function rbc_radius_neighbors_graph describing purpose, parameters,
return/outputs and the two-phase semantics: explain that when adj_cols==nullptr
and nnz==0 it performs a count-only pass writing per-row counts to adj_rows, and
on the fill pass callers must allocate adj_cols with size nnz and pass the same
adj_rows to receive neighbor column indices; document meanings and units of
handle, rbc_index, query, n_query, dim, radius, adj_rows, adj_cols, and nnz, any
preconditions (e.g., non-null pointers, how nnz is interpreted), and reference
similar documented functions like brute_force_knn / rbc_free_index for style and
phrasing so the header is consistent with existing Doxygen conventions.

In `@python/cuml/cuml/accel/_wrappers/sklearn/neighbors.py`:
- Around line 23-36: In _gpu_radius_neighbors_graph, accessing
self.effective_metric_ can raise AttributeError on unfitted proxies and bypass
the ProxyBase CPU-fallback; wrap the metric check to first guard for a fitted
state (e.g., use hasattr(self, "effective_metric_") or getattr(self,
"effective_metric_", None)) and if it's missing raise UnsupportedOnGPU with a
clear message (e.g., "estimator is not fitted" or similar) so
ProxyBase._call_method will catch it and fall back to the CPU path; update the
check in the _gpu_radius_neighbors_graph method accordingly.

In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx`:
- Around line 350-353: The connectivity data array is created with cp.ones(nnz)
which yields float64 by default, causing a dtype mismatch with kneighbors_graph
(which uses np.float32); change the data creation in nearest_neighbors.pyx to
cp.ones(nnz, dtype=cp.float32) (or cp.float32) so the returned
cupyx.scipy.sparse.csr_matrix has float32 connectivity values and matches
kneighbors_graph's dtype.
- Around line 1165-1167: Update the radius_neighbors_graph signature to include
the sklearn-compatible parameters mode='connectivity' and sort_results=False
(i.e., def radius_neighbors_graph(self, X=None, radius=None,
mode='connectivity', sort_results=False) -> SparseCumlArray) so callers passing
those kwargs won't get TypeError; inside the method (radius_neighbors_graph)
validate mode and raise a clear error if mode != 'connectivity' (or map behavior
if you implement others), and accept sort_results (no-op if unsupported) so API
matches scikit-learn while preserving existing connectivity-only behavior.
- Around line 304-354: The radius_neighbors_graph method has a race: after the
first with nogil call to rbc_radius_neighbors_graph (which populates indptr) the
code reads indptr[-1].item() without synchronizing the RAFT handle; add a call
to handle.sync() (using the same handle obtained via get_handle()) immediately
after the first with nogil block and before computing nnz to ensure the GPU work
writing indptr is finished; follow the same pattern used in RBCIndex.build and
RBCIndex.kneighbors to locate where to insert handle.sync() in
radius_neighbors_graph.

In `@python/cuml/tests/test_nearest_neighbors.py`:
- Around line 560-582: The test declares metric but never passes it to the KNN
models, so haversine cases are not exercised; update the
test_radius_neighbors_graph to pass metric=metric into both skKNN and cuKNN when
constructing sk_model and cu_model, and add a guard to skip the haversine branch
for non-2D inputs (e.g., if metric == "haversine" and n_features != 2:
pytest.skip(...)) so the test only runs haversine with 2 features.

Comment thread cpp/include/cuml/neighbors/knn.hpp Outdated
Comment thread python/cuml/cuml/accel/_wrappers/sklearn/neighbors.py
Comment thread python/cuml/cuml/neighbors/nearest_neighbors.pyx
Comment thread python/cuml/cuml/neighbors/nearest_neighbors.pyx
Comment thread python/cuml/cuml/neighbors/nearest_neighbors.pyx
Comment thread python/cuml/tests/test_nearest_neighbors.py Outdated
@jcrist jcrist force-pushed the radius-neighbors-graph branch from b0cae30 to dfae43f Compare February 19, 2026 02:33
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
python/cuml/cuml/neighbors/nearest_neighbors.pyx (1)

304-316: Unused variable n_query; consider removing.

n_query is assigned on Line 312 but never used — n_rows (Line 316, same value) is what's actually passed to the C call. This is a harmless leftover but adds noise.

-        cdef int64_t n_query = X.shape[0]
-
         indptr = cp.empty(n_query + 1, dtype=np.int64)
+        indptr = cp.empty(n_rows + 1, dtype=np.int64)
#!/bin/bash
# Verify n_query is not used elsewhere in the method
rg -n 'n_query' python/cuml/cuml/neighbors/nearest_neighbors.pyx | head -20
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx` around lines 304 - 316, The
variable n_query in the radius_neighbors_graph function is unused and duplicates
n_rows; remove the declaration/assignment "cdef int64_t n_query = X.shape[0]"
and any references to n_query so the code only uses n_rows (or consolidate to
n_query if you prefer that name) to avoid the dead variable while leaving the
rest of the function (handle setup, indptr creation, X_ptr, n_rows usage)
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx`:
- Around line 356-361: Add the same dimension guard used in RBCIndex.kneighbors
to RBCIndex.radius_neighbors_graph: check X.shape[1] and if it's > 3 raise a
ValueError with the same message ("The rbc algorithm is not supported for >3
dimensions currently.") so calls that invoke the underlying ball_cover logic are
prevented from running with unsupported dimensionality; update the
RBCIndex.radius_neighbors_graph method to perform this check before any
ball_cover operations.

---

Duplicate comments:
In `@python/cuml/cuml/accel/_wrappers/sklearn/neighbors.py`:
- Around line 32-36: Don't access self.effective_metric_ directly; guard that
attribute before the metric check in the radius_neighbors_graph wrapper so an
unfitted proxy doesn't raise AttributeError unexpectedly. Replace the direct
access in the check (the code that currently does if self.effective_metric_ not
in ["l2","euclidean"]) with a presence-safe check such as using hasattr(self,
"effective_metric_") or metric = getattr(self, "effective_metric_", None) and
only perform the UnsupportedOnGPU check when metric is present; leave behavior
unchanged for the call to self._gpu.radius_neighbors_graph(X=..., radius=...)
and ensure ProxyBase._call_method can still catch unfitted-state errors.

In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx`:
- Line 350: The array created for connectivity currently uses cp.ones(nnz) which
defaults to float64 and conflicts with kneighbors_graph's use of
dtype=np.float32; change the creation of data to explicitly use float32 (e.g.,
cp.ones(nnz, dtype=cp.float32) or equivalent) so the dtype matches
kneighbors_graph and avoids inconsistencies between the two graph methods.
- Around line 1165-1167: The cuML method radius_neighbors_graph lacks the
sklearn-compatible parameters mode and sort_results, causing unexpected-argument
errors; update the radius_neighbors_graph signature in nearest_neighbors.pyx to
accept mode='connectivity' and sort_results=False (matching sklearn defaults) in
addition to X and radius, then thread these parameters into the existing
implementation or validation logic used by the accel wrapper (ensure the
function forwards them to the same internal checks/dispatch that lives in
neighbors.py or handles them the same way), and maintain the existing decorators
(`@insert_into_docstring`, `@reflect`) and return type SparseCumlArray.
- Around line 320-333: The GPU-written indptr is read on the host without
synchronizing the RAFT/handle stream; after the first with nogil block that
calls rbc_radius_neighbors_graph (and likewise after the second with nogil that
writes device outputs) insert a handle synchronization before any host reads of
indptr (e.g., call handle_[0].sync() or the project’s RAFT handle.sync()
equivalent) so that the device kernel completes before executing
indptr[-1].item() and any subsequent host-side transfers.

---

Nitpick comments:
In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx`:
- Around line 304-316: The variable n_query in the radius_neighbors_graph
function is unused and duplicates n_rows; remove the declaration/assignment
"cdef int64_t n_query = X.shape[0]" and any references to n_query so the code
only uses n_rows (or consolidate to n_query if you prefer that name) to avoid
the dead variable while leaving the rest of the function (handle setup, indptr
creation, X_ptr, n_rows usage) unchanged.

Comment thread python/cuml/cuml/neighbors/nearest_neighbors.pyx
This adds support for `NearestNeighbors.radius_neighbors_graph`, built
on `cuvs::neighbors::ball_cover::eps_nn`.
@jcrist jcrist force-pushed the radius-neighbors-graph branch from dfae43f to cee586c Compare February 19, 2026 17:24
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
python/cuml/cuml/neighbors/nearest_neighbors.pyx (1)

312-316: Optional: n_rows duplicates n_query — one can be removed.

Both n_query (Line 312) and n_rows (Line 316) hold X.shape[0]. The rest of the method uses n_rows while n_query is only used for indptr allocation. Collapsing to a single variable reduces cognitive load.

♻️ Suggested simplification
-        cdef int64_t n_query = X.shape[0]
-
         indptr = cp.empty(n_query + 1, dtype=np.int64)
         cdef float* X_ptr = <float*><uintptr_t>X.ptr
         cdef int64_t n_rows = X.shape[0]
+        cdef int64_t n_query = n_rows
         cdef int64_t n_cols = X.shape[1]

or simply use n_rows everywhere and drop n_query.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx` around lines 312 - 316,
n_query duplicates n_rows; remove the redundant variable and use n_rows
everywhere: delete the cdef int64_t n_query declaration and change the indptr
allocation to use n_rows + 1 (indptr = cp.empty(n_rows + 1, dtype=np.int64));
ensure any other uses of n_query are replaced with n_rows (the method currently
uses n_rows elsewhere) so there is a single authoritative row-count variable in
nearest_neighbors.pyx.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/include/cuml/neighbors/knn.hpp`:
- Around line 79-82: Update the Doxygen comment that incorrectly uses Python
negative-index notation: replace occurrences of adj_indptr[-1] with the
C++-appropriate adj_indptr[n_query] (since adj_indptr has size n_query+1) and
ensure the explanation for computing nnz uses adj_indptr[n_query]; check the
related mentions near the adj_indices and nnz description (the same pattern
appears again around the later comment block) and update them to the same
adj_indptr[n_query] form.

In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx`:
- Around line 1274-1275: The current guard if radius <= 0 in
nearest_neighbors.pyx doesn't reject NaN or Inf; update the check to validate
finiteness and positivity (e.g., import math and replace the guard with if not
math.isfinite(radius) or radius <= 0: raise ValueError(...)) so NaN/inf are
rejected before hitting the C++ layer; ensure the error message mentions
invalid/non-finite radius and keep the check around the same scope where the
variable radius is validated.

In `@python/cuml/tests/test_nearest_neighbors.py`:
- Around line 600-602: Add a second assertion in the test to exercise the
zero-radius branch: call model.radius_neighbors_graph(radius=0) and assert it
raises the same ValueError with message "Expected `radius > 0`, got 0" (or use
match "Expected `radius > 0`" to be less brittle); this ensures the
radius_neighbors_graph guard against radius <= 0 is covered. Reference the
existing test block around model.radius_neighbors_graph to insert the new
pytest.raises check for radius=0.

---

Nitpick comments:
In `@python/cuml/cuml/neighbors/nearest_neighbors.pyx`:
- Around line 312-316: n_query duplicates n_rows; remove the redundant variable
and use n_rows everywhere: delete the cdef int64_t n_query declaration and
change the indptr allocation to use n_rows + 1 (indptr = cp.empty(n_rows + 1,
dtype=np.int64)); ensure any other uses of n_query are replaced with n_rows (the
method currently uses n_rows elsewhere) so there is a single authoritative
row-count variable in nearest_neighbors.pyx.

Comment thread cpp/include/cuml/neighbors/knn.hpp
Comment thread python/cuml/cuml/neighbors/nearest_neighbors.pyx
Comment thread python/cuml/tests/test_nearest_neighbors.py
Copy link
Copy Markdown
Contributor

@viclafargue viclafargue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM, just minor questions. Thanks for removing radius search from KNN classifier and regressor. I think that NearestNeighborsMG also inherits from the NearestNeighbor class too, but this is not directly exposed to user, so less of a concern.

Comment on lines +312 to +316
cdef int64_t n_query = X.shape[0]

indptr = cp.empty(n_query + 1, dtype=np.int64)
cdef float* X_ptr = <float*><uintptr_t>X.ptr
cdef int64_t n_rows = X.shape[0]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we fuse n_rows and n_query as n_queries?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree we have an extraneous local here, will fix in the followup.

Comment on lines +1239 to +1241
This method is most efficient when the instance is fit with
`algorithm="rbc"`. Other algorithms will build a temporary RBC index
per-call, which adds a small overhead.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we replicate sklearn behavior exactly here or instead ensure that the user always have optimal use of VRAM by demanding the RBC index to be built during fit and thus avoid uselessly storing the dataset in VRAM in addition to the index?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should error. If anything, we could add a log or warning in this case, but IMO if we can make something just work then we should. Happy to discuss and maybe add a warning/log in the followup.

>>> import cupy as cp
>>> from cuml.neighbors import NearestNeighbors
>>> X = cp.array([[0], [3], [1]])
>>> nn = NearestNeighbors().fit(X)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the minimal doc example demonstrate use with algorithm="rbc"?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - I'll add that in the followup.

Comment thread python/cuml/cuml/neighbors/nearest_neighbors.pyx
@jcrist
Copy link
Copy Markdown
Member Author

jcrist commented Feb 23, 2026

Thanks for the review! I agree with a few of your comments. Will fix them in the followup PR while expanding support; none seemed worth spending an extra CI cycle here when I'll be touching this function again in a followup.

@jcrist
Copy link
Copy Markdown
Member Author

jcrist commented Feb 23, 2026

/merge

@rapids-bot rapids-bot Bot merged commit 237e589 into rapidsai:main Feb 23, 2026
99 checks passed
@jcrist jcrist deleted the radius-neighbors-graph branch February 23, 2026 02:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CUDA/C++ Cython / Python Cython or Python issue feature request New feature or request non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants