Merged
28 commits
9c0d36c spectral-clustering-cuml-accel (aamijar, Feb 16, 2026)
ecfd2b3 Merge branch 'main' into spectral-clustering-cuml-accel (aamijar, Feb 18, 2026)
3356c66 Merge branch 'main' into spectral-clustering-cuml-accel (aamijar, Feb 19, 2026)
ebfdb91 update xfail (aamijar, Feb 20, 2026)
178e310 update docs (aamijar, Feb 20, 2026)
afca415 Merge branch 'main' into spectral-clustering-cuml-accel (aamijar, Feb 24, 2026)
8e65318 update xfail condition oldest-deps (aamijar, Feb 25, 2026)
a5e3e37 Merge branch 'main' into spectral-clustering-cuml-accel (aamijar, Feb 25, 2026)
23a75ed fix xfail (aamijar, Feb 25, 2026)
f456595 Merge branch 'main' into spectral-clustering-cuml-accel (aamijar, Feb 25, 2026)
7e9155d Merge branch 'main' into spectral-clustering-cuml-accel (aamijar, Mar 10, 2026)
e4c1143 Update python/cuml/cuml/cluster/spectral_clustering.pyx (aamijar, Mar 10, 2026)
a779f60 Update python/cuml/cuml_accel_tests/integration/test_spectral_cluster… (aamijar, Mar 10, 2026)
5ca720f Update python/cuml/cuml_accel_tests/integration/test_spectral_cluster… (aamijar, Mar 10, 2026)
20501e2 Update python/cuml/cuml_accel_tests/integration/test_spectral_cluster… (aamijar, Mar 10, 2026)
6fa7376 Update python/cuml/cuml_accel_tests/integration/test_spectral_cluster… (aamijar, Mar 10, 2026)
eb4547c Update python/cuml/cuml_accel_tests/integration/test_spectral_cluster… (aamijar, Mar 10, 2026)
f987713 Update python/cuml/cuml_accel_tests/integration/test_spectral_cluster… (aamijar, Mar 10, 2026)
11e9c4c Update python/cuml/cuml_accel_tests/integration/test_spectral_cluster… (aamijar, Mar 10, 2026)
e66c72f address review (aamijar, Mar 11, 2026)
c5923a2 pytest skip (aamijar, Mar 11, 2026)
044806a Merge branch 'main' into spectral-clustering-cuml-accel (aamijar, Mar 11, 2026)
def0dba different blob datasets (aamijar, Mar 11, 2026)
f514de9 tighter clusters (aamijar, Mar 12, 2026)
8333cf6 solve memory bugs, and remove pytest skip (aamijar, Mar 13, 2026)
175dc91 Merge branch 'release/26.04' into spectral-clustering-cuml-accel (aamijar, Mar 13, 2026)
28896cf remove unused (aamijar, Mar 13, 2026)
8c7e73b Merge branch 'release/26.04' into spectral-clustering-cuml-accel (aamijar, Mar 13, 2026)
1 change: 1 addition & 0 deletions docs/source/cuml-accel/faq.rst
@@ -51,6 +51,7 @@ the following estimators are mostly or entirely accelerated when run with
* Scikit-Learn
* ``sklearn.cluster.KMeans``
* ``sklearn.cluster.DBSCAN``
* ``sklearn.cluster.SpectralClustering``
* ``sklearn.covariance.LedoitWolf``
* ``sklearn.decomposition.PCA``
* ``sklearn.decomposition.TruncatedSVD``
12 changes: 12 additions & 0 deletions docs/source/cuml-accel/limitations.rst
@@ -106,6 +106,18 @@ KMeans
- If a callable ``init`` is provided.
- If ``X`` is sparse.

SpectralClustering
^^^^^^^^^^^^^^^^^^

``SpectralClustering`` will fall back to CPU in the following cases:

- If ``assign_labels`` is not ``"kmeans"``.
- If ``affinity`` is not ``"nearest_neighbors"`` or ``"precomputed"``.

The following fitted attributes are currently not computed:

- ``affinity_matrix_``
Comment on lines +109 to +122

Contributor:
I agree with the bot comments here: if the affinity parameter is left at its default, it will not be GPU accelerated, and it would be nice to document this. We should also document that the estimator will fall back to CPU if X is sparse (as documented for SpectralEmbedding).

Member Author:
Addressed in e66c72f
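The two documented fallback rules, combined with the reviewer's point about the default affinity, can be summarized in a small sketch. gpu_fallback_reason is a hypothetical helper and is not part of cuML. It assumes scikit-learn's defaults (affinity="rbf", assign_labels="kmeans") and does not model the sparse-input fallback raised in review:

```python
# Hypothetical helper mirroring the two documented fallback rules;
# not part of cuML's API.
SUPPORTED_AFFINITIES = frozenset({"nearest_neighbors", "precomputed"})


def gpu_fallback_reason(affinity="rbf", assign_labels="kmeans"):
    """Return why a SpectralClustering config would run on CPU, or None.

    Defaults mirror scikit-learn's SpectralClustering defaults.
    """
    if affinity not in SUPPORTED_AFFINITIES:
        return f"affinity={affinity!r} is not supported"
    if assign_labels != "kmeans":
        return f"assign_labels={assign_labels!r} is not supported"
    return None  # both documented conditions for GPU execution are met


# scikit-learn's default affinity is "rbf", so a default-constructed
# estimator is not accelerated.
print(gpu_fallback_reason())                             # -> affinity='rbf' is not supported
print(gpu_fallback_reason(affinity="nearest_neighbors"))  # -> None
print(gpu_fallback_reason("precomputed", "discretize"))   # -> assign_labels='discretize' is not supported
```

In practice, passing affinity="nearest_neighbors" (or a precomputed affinity matrix) is what opts a scikit-learn script into the GPU path. This is why the tests in this PR set affinity explicitly.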


DBSCAN
^^^^^^

9 changes: 7 additions & 2 deletions python/cuml/cuml/accel/_wrappers/sklearn/cluster.py
@@ -1,5 +1,5 @@
#
# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION.
# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION.
# SPDX-License-Identifier: Apache-2.0
#

@@ -10,7 +10,7 @@
from cuml.accel.estimator_proxy import ProxyBase
from cuml.internals.interop import UnsupportedOnGPU

__all__ = ("KMeans", "DBSCAN")
__all__ = ("KMeans", "DBSCAN", "SpectralClustering")


class KMeans(ProxyBase):
@@ -44,3 +44,8 @@ def _gpu_fit(self, X, y=None, sample_weight=None):
def _gpu_fit_predict(self, X, y=None, sample_weight=None):
# Fixes signature mismatch with cuml.DBSCAN. Can be removed after #6741.
return self._gpu.fit_predict(X, y=y, sample_weight=sample_weight)


class SpectralClustering(ProxyBase):
_gpu_class = cuml.cluster.SpectralClustering
_not_implemented_attributes = frozenset(("affinity_matrix_",))
60 changes: 59 additions & 1 deletion python/cuml/cuml/cluster/spectral_clustering.pyx
@@ -14,6 +14,13 @@ from cuml.common.array_descriptor import CumlArrayDescriptor
from cuml.internals.array import CumlArray
from cuml.internals.base import Base, get_handle
from cuml.internals.input_utils import input_to_cupy_array
from cuml.internals.interop import (
InteropMixin,
UnsupportedOnGPU,
to_cpu,
to_gpu,
)
from cuml.internals.mixins import ClusterMixin, CMajorInputTagMixin
from cuml.internals.utils import check_random_seed

from libc.stdint cimport uint64_t, uintptr_t
@@ -254,7 +261,10 @@ def spectral_clustering(
return labels


class SpectralClustering(Base):
class SpectralClustering(Base,
InteropMixin,
ClusterMixin,
CMajorInputTagMixin):
"""Apply spectral clustering from the normalized Laplacian.

In practice spectral clustering is very useful when the structure of
@@ -352,6 +362,54 @@ class SpectralClustering(Base):
"""
labels_ = CumlArrayDescriptor()

_cpu_class_path = "sklearn.cluster.SpectralClustering"

_SUPPORTED_AFFINITIES = frozenset(("nearest_neighbors", "precomputed"))

@classmethod
def _params_from_cpu(cls, model):
if model.affinity not in cls._SUPPORTED_AFFINITIES:
raise UnsupportedOnGPU(
f"`affinity={model.affinity!r}` is not supported"
)
if model.assign_labels != "kmeans":
raise UnsupportedOnGPU(
f"`assign_labels={model.assign_labels!r}` is not supported"
)
return {
"n_clusters": model.n_clusters,
"n_components": model.n_components,
"random_state": model.random_state,
"n_neighbors": model.n_neighbors,
"n_init": model.n_init,
"eigen_tol": model.eigen_tol,
"affinity": model.affinity,
}

def _params_to_cpu(self):
return {
"n_clusters": self.n_clusters,
"n_components": self.n_components,
"random_state": self.random_state,
"n_neighbors": self.n_neighbors,
"n_init": self.n_init,
"eigen_tol": self.eigen_tol,
"affinity": self.affinity,
"assign_labels": "kmeans",
}

def _attrs_from_cpu(self, model):
return {
"labels_": to_gpu(model.labels_, order="C"),
**super()._attrs_from_cpu(model),
}

def _attrs_to_cpu(self, model):
return {
"labels_": to_cpu(self.labels_, order="C"),
**super()._attrs_to_cpu(model),
}

def __init__(
self,
n_clusters=8,
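For the seven parameters that _params_from_cpu forwards, a supported configuration should survive a CPU-to-GPU-to-CPU round trip unchanged. This holds because _params_to_cpu restores assign_labels="kmeans", the only value the GPU path accepts. Other scikit-learn parameters (for example gamma) are not forwarded at all. The following is a minimal sketch of that round-trip property, using plain dicts in place of estimator objects. params_from_cpu, params_to_cpu and the local UnsupportedOnGPU class are hypothetical stand-ins, not cuML's InteropMixin machinery:

```python
# Hypothetical stand-ins: plain dicts replace estimator objects, and this
# UnsupportedOnGPU is a local class, not cuml.internals.interop's.
class UnsupportedOnGPU(Exception):
    """Raised when a CPU configuration has no GPU equivalent."""


SUPPORTED_AFFINITIES = frozenset({"nearest_neighbors", "precomputed"})
# Parameters copied in both directions by _params_from_cpu/_params_to_cpu.
SHARED = ("n_clusters", "n_components", "random_state",
          "n_neighbors", "n_init", "eigen_tol", "affinity")


def params_from_cpu(cpu):
    if cpu["affinity"] not in SUPPORTED_AFFINITIES:
        raise UnsupportedOnGPU(f"`affinity={cpu['affinity']!r}` is not supported")
    if cpu["assign_labels"] != "kmeans":
        raise UnsupportedOnGPU(
            f"`assign_labels={cpu['assign_labels']!r}` is not supported"
        )
    # assign_labels is not forwarded: the GPU path only assigns labels with k-means.
    return {name: cpu[name] for name in SHARED}


def params_to_cpu(gpu):
    # Re-add assign_labels so the CPU model records what the GPU actually did.
    return {**{name: gpu[name] for name in SHARED}, "assign_labels": "kmeans"}


cpu_params = {
    "n_clusters": 3, "n_components": None, "random_state": 42,
    "n_neighbors": 10, "n_init": 10, "eigen_tol": "auto",
    "affinity": "nearest_neighbors", "assign_labels": "kmeans",
}
assert params_to_cpu(params_from_cpu(cpu_params)) == cpu_params

try:
    params_from_cpu({**cpu_params, "affinity": "rbf"})
except UnsupportedOnGPU as exc:
    print(exc)  # -> `affinity='rbf'` is not supported
```

Raising from the CPU-to-GPU direction, rather than silently dropping an unsupported value, is what lets the accelerator fall back to the CPU estimator instead of running a different algorithm.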
149 changes: 149 additions & 0 deletions python/cuml/cuml_accel_tests/integration/test_spectral_clustering.py
@@ -0,0 +1,149 @@
#
# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION.
# SPDX-License-Identifier: Apache-2.0
#

import numpy as np
import pytest
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score


@pytest.fixture(scope="module")
def clustering_data():
X, y = make_blobs(
n_samples=300, centers=3, cluster_std=1.0, random_state=42
)
return X.astype(np.float32), y


def test_spectral_clustering_default(clustering_data):
X, y = clustering_data
sc = SpectralClustering(affinity="nearest_neighbors", random_state=42).fit(
X
)
assert sc.labels_.shape == y.shape


@pytest.mark.parametrize("n_clusters", [2, 3, 4, 5])
def test_spectral_clustering_n_clusters(clustering_data, n_clusters):
X, y_true = clustering_data
sc = SpectralClustering(
n_clusters=n_clusters,
affinity="nearest_neighbors",
random_state=42,
).fit(X)
y_pred = sc.labels_
adjusted_rand_score(y_true, y_pred)


@pytest.mark.parametrize("n_neighbors", [5, 10, 20])
def test_spectral_clustering_n_neighbors(clustering_data, n_neighbors):
X, y_true = clustering_data
sc = SpectralClustering(
n_clusters=3,
affinity="nearest_neighbors",
n_neighbors=n_neighbors,
random_state=42,
).fit(X)
y_pred = sc.labels_
adjusted_rand_score(y_true, y_pred)


@pytest.mark.parametrize("n_components", [2, 3, 5])
def test_spectral_clustering_n_components(clustering_data, n_components):
X, y_true = clustering_data
sc = SpectralClustering(
n_clusters=3,
n_components=n_components,
affinity="nearest_neighbors",
random_state=42,
).fit(X)
y_pred = sc.labels_
adjusted_rand_score(y_true, y_pred)


@pytest.mark.parametrize("n_init", [1, 5, 10])
def test_spectral_clustering_n_init(clustering_data, n_init):
X, y_true = clustering_data
sc = SpectralClustering(
n_clusters=3,
affinity="nearest_neighbors",
n_init=n_init,
random_state=42,
).fit(X)
y_pred = sc.labels_
adjusted_rand_score(y_true, y_pred)


@pytest.mark.parametrize("eigen_tol", ["auto", 0.0, 1e-4])
def test_spectral_clustering_eigen_tol(clustering_data, eigen_tol):
X, y_true = clustering_data
sc = SpectralClustering(
n_clusters=3,
affinity="nearest_neighbors",
eigen_tol=eigen_tol,
random_state=42,
).fit(X)
y_pred = sc.labels_
adjusted_rand_score(y_true, y_pred)


@pytest.mark.parametrize(
"assign_labels", ["kmeans", "discretize", "cluster_qr"]
)
def test_spectral_clustering_assign_labels(clustering_data, assign_labels):
X, y_true = clustering_data
sc = SpectralClustering(
n_clusters=3,
affinity="nearest_neighbors",
assign_labels=assign_labels,
random_state=42,
).fit(X)
y_pred = sc.labels_
adjusted_rand_score(y_true, y_pred)


def test_spectral_clustering_precomputed(clustering_data):
from sklearn.neighbors import kneighbors_graph

X, y_true = clustering_data
connectivity = kneighbors_graph(X, n_neighbors=10, include_self=True)
affinity_matrix = 0.5 * (connectivity + connectivity.T)
sc = SpectralClustering(
n_clusters=3,
affinity="precomputed",
random_state=42,
).fit(affinity_matrix)
y_pred = sc.labels_
adjusted_rand_score(y_true, y_pred)


def test_spectral_clustering_fit_predict(clustering_data):
X, y_true = clustering_data
sc = SpectralClustering(
n_clusters=3,
affinity="nearest_neighbors",
random_state=42,
)
labels = sc.fit_predict(X)
assert labels.shape == y_true.shape
assert np.array_equal(labels, sc.labels_)


def test_spectral_clustering_random_state(clustering_data):
X, _ = clustering_data
sc1 = SpectralClustering(
n_clusters=3,
affinity="nearest_neighbors",
random_state=42,
).fit(X)
sc2 = SpectralClustering(
n_clusters=3,
affinity="nearest_neighbors",
random_state=42,
).fit(X)
assert np.array_equal(sc1.labels_, sc2.labels_), (
"Results should be consistent with the same random_state"
)
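The integration tests score predictions with adjusted_rand_score rather than comparing label arrays directly, because cluster ids are arbitrary: the same partition can come back with permuted ids. Below is a minimal from-scratch sketch of the adjusted Rand index, for illustration only; the tests themselves call scikit-learn's sklearn.metrics.adjusted_rand_score. The sketch shows that a permuted labeling still scores 1.0, while a labeling that splits every true cluster scores below 0:

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index of two flat labelings (pair-counting form)."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # with fewer than two samples there are no pairs to compare
    # Sample pairs placed in the same cluster by both / by each labeling.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = same_true * same_pred / comb(n, 2)
    max_index = (same_true + same_pred) / 2
    if max_index == expected:
        return 1.0  # both labelings trivial (one cluster, or all singletons)
    return (same_both - expected) / (max_index - expected)


y_true = [0, 0, 1, 1, 2, 2]
y_permuted = [2, 2, 0, 0, 1, 1]  # same partition, different cluster ids
y_mixed = [0, 1, 0, 1, 0, 1]     # splits every true cluster

print(y_true == y_permuted)                      # -> False
print(adjusted_rand_index(y_true, y_permuted))   # -> 1.0
print(adjusted_rand_index(y_true, y_mixed) < 0)  # -> True
```

Because the score is invariant to relabeling, an assertion such as "score above some threshold" is robust to GPU and CPU paths numbering clusters differently. The threshold itself is a per-test choice.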
11 changes: 10 additions & 1 deletion python/cuml/cuml_accel_tests/test_basic_estimators.py
@@ -1,7 +1,7 @@
# SPDX-FileCopyrightText: Copyright (c) 2024-2026, NVIDIA CORPORATION.
# SPDX-License-Identifier: Apache-2.0

from sklearn.cluster import DBSCAN, KMeans
from sklearn.cluster import DBSCAN, KMeans, SpectralClustering
from sklearn.datasets import make_blobs, make_classification, make_regression
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.linear_model import (
@@ -32,6 +32,15 @@ def test_dbscan():
clf.labels_


def test_spectral_clustering():
X, y_true = make_blobs(n_samples=100, centers=3, random_state=42)
X = X.astype("float32")
sc = SpectralClustering(
n_clusters=3, affinity="nearest_neighbors", random_state=42
).fit(X)
sc.labels_
Comment on lines +35 to +41

⚠️ Potential issue | 🟡 Minor

Add assertions and remove unused y_true.

Right now the test evaluates sc.labels_ without asserting anything, so it won’t catch incorrect results and triggers Ruff warnings. Consider asserting shape/cluster count and dropping the unused variable.

✅ Suggested fix
 def test_spectral_clustering():
-    X, y_true = make_blobs(n_samples=100, centers=3, random_state=42)
+    X, _ = make_blobs(n_samples=100, centers=3, random_state=42)
     X = X.astype("float32")
     sc = SpectralClustering(
         n_clusters=3, affinity="nearest_neighbors", random_state=42
     ).fit(X)
-    sc.labels_
+    assert sc.labels_.shape == (X.shape[0],)
+    assert len(set(sc.labels_.tolist())) == 3
As per coding guidelines: Test files must validate numerical correctness by comparing with scikit-learn, include edge case coverage (empty datasets, single sample, high-dimensional data), test fit/predict/transform consistency, and test different input types (cuDF, pandas, NumPy).
🧰 Tools
🪛 Ruff (0.15.1)

[warning] 36-36: Unpacked variable y_true is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


[warning] 41-41: Found useless expression. Either assign it to a variable or remove it.

(B018)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml_accel_tests/test_basic_estimators.py` around lines 35 - 41,
The test currently creates data with make_blobs and calls
SpectralClustering().fit(X) but never asserts results and leaves y_true unused;
update test_spectral_clustering to remove the unused y_true or use it for
validation, and add concrete assertions: check sc.labels_.shape ==
(X.shape[0],), assert the number of unique labels equals n_clusters
(len(np.unique(sc.labels_)) == 3), and validate numerical correctness by
comparing to sklearn.cluster.SpectralClustering (e.g., via adjusted_rand_score
between sc.labels_ and sklearn_sc.labels_); additionally add small extra
subtests for edge cases (empty array, single sample, high-dimensional data),
test fit/predict/transform consistency on the same input, and run the same
checks with different input types (NumPy, pandas, cuDF) so the test covers
required behaviors.
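The first two checks the bot suggests (one label per sample, and exactly n_clusters distinct labels) could be factored into a helper so that each test applies them the same way. The following is a minimal sketch. check_cluster_labels is a hypothetical name, not an existing cuML or scikit-learn utility. It does not cover the comparison against scikit-learn that the bot also asks for:

```python
def check_cluster_labels(labels, n_samples, n_clusters):
    """Hypothetical helper: assert the shape and cluster-count checks from the review."""
    # Accepts any iterable of hashable labels, such as a list or a 1-D NumPy array.
    labels = list(labels)
    assert len(labels) == n_samples, f"expected {n_samples} labels, got {len(labels)}"
    n_found = len(set(labels))
    assert n_found == n_clusters, f"expected {n_clusters} clusters, found {n_found}"


check_cluster_labels([0, 2, 1, 1, 0, 2], n_samples=6, n_clusters=3)

try:
    check_cluster_labels([0, 0, 1], n_samples=3, n_clusters=3)
except AssertionError as exc:
    print(exc)  # -> expected 3 clusters, found 2
```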



def test_pca():
X, _ = make_blobs(n_samples=100, centers=3, random_state=42)
pca = PCA().fit(X)
@@ -406,6 +406,7 @@
- "sklearn.tests.test_multioutput::test_multi_target_sparse_regression[lil_matrix]"
- "sklearn.tests.test_public_functions::test_class_wrapper_param_validation[sklearn.cluster.dbscan-sklearn.cluster.DBSCAN]"
- "sklearn.tests.test_public_functions::test_class_wrapper_param_validation[sklearn.cluster.k_means-sklearn.cluster.KMeans]"
- "sklearn.tests.test_public_functions::test_class_wrapper_param_validation[sklearn.cluster.spectral_clustering-sklearn.cluster.SpectralClustering]"
- "sklearn.tests.test_public_functions::test_class_wrapper_param_validation[sklearn.covariance.ledoit_wolf-sklearn.covariance.LedoitWolf]"
- "sklearn.utils.tests.test_estimator_checks::test_check_dataframe_column_names_consistency"
- "sklearn.utils.tests.test_estimator_checks::test_check_estimator"
@@ -1610,6 +1611,9 @@
tests:
- "sklearn.manifold.tests.test_t_sne::test_fit_transform_csr_matrix[csr_array-random-exact]"
- "sklearn.manifold.tests.test_t_sne::test_fit_transform_csr_matrix[csr_matrix-random-exact]"
- reason: Pipeline precomputed path falls back to CPU while compact path uses GPU, which won't have exact label matches
tests:
- "sklearn.neighbors.tests.test_neighbors_pipeline::test_spectral_clustering"
- reason: Ridge doesn't implement n_iter yet
tests:
- "sklearn.linear_model.tests.test_ridge::test_n_iter"
@@ -1651,6 +1655,14 @@
- reason: The sklearn test has the error message accidentally flipped, our message is correct
tests:
- "sklearn.linear_model.tests.test_ridge::test_ridge_individual_penalties"
- reason: cuML SpectralClustering does not emit "not fully connected" UserWarning
condition: scikit-learn<1.7,>=1.5
tests:
- "sklearn.cluster.tests.test_spectral::test_affinities"
- reason: cuML SpectralClustering does not emit "not fully connected" UserWarning
condition: scikit-learn>=1.7
tests:
- "sklearn.cluster.tests.test_spectral::test_affinities[42]"
- reason: cuML TSNE barnes_hut produces poor quality embeddings with sparse input
condition: scikit-learn>=1.8
tests:
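The two xfail entries for test_affinities are gated on non-overlapping scikit-learn ranges, so at most one applies to any installed version. Below is a minimal sketch of evaluating such a condition string. condition_holds is hypothetical and is not taken from cuML's xfail tooling. It handles only plain numeric versions (no pre-release tags such as .dev0); production code would more likely use packaging.specifiers.SpecifierSet:

```python
import operator
import re

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq, "!=": operator.ne}
_CLAUSE = re.compile(r"(<=|>=|==|!=|<|>)([0-9]+(?:\.[0-9]+)*)")


def _parse(version):
    return tuple(int(part) for part in version.split("."))


def _padded(a, b):
    """Zero-pad two version tuples to equal length so that 1.7 == 1.7.0."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def condition_holds(condition, installed_version):
    """True if installed_version satisfies every comma-separated clause."""
    # Drop the package name: everything before the first comparison operator.
    start = re.search(r"[<>=!]", condition).start()
    installed = _parse(installed_version)
    for clause in condition[start:].split(","):
        op, version = _CLAUSE.fullmatch(clause.strip()).groups()
        if not _OPS[op](*_padded(installed, _parse(version))):
            return False
    return True


for version in ("1.5.2", "1.6.1", "1.7.0", "1.8"):
    print(version,
          condition_holds("scikit-learn<1.7,>=1.5", version),
          condition_holds("scikit-learn>=1.7", version))
```

Each of the four sample versions satisfies exactly one of the two conditions. The split exists because the same scikit-learn test is listed under the bare name test_affinities for the older range and as test_affinities[42] for the newer one.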
19 changes: 18 additions & 1 deletion python/cuml/tests/test_sklearn_import_export.py
@@ -32,7 +32,7 @@
from sklearn.utils.validation import check_is_fitted

import cuml
from cuml.cluster import DBSCAN, KMeans
from cuml.cluster import DBSCAN, KMeans, SpectralClustering
from cuml.decomposition import PCA, TruncatedSVD
from cuml.internals.interop import UnsupportedOnCPU, UnsupportedOnGPU
from cuml.linear_model import (
@@ -194,6 +194,23 @@ def test_dbscan(random_state):
assert array_equal(original.labels_, roundtrip_model.labels_)


def test_spectral_clustering(random_state):
X, _ = make_blobs(
n_samples=100, n_features=10, centers=3, random_state=random_state
)
X = X.astype(np.float32)
original = SpectralClustering(
n_clusters=3,
affinity="nearest_neighbors",
n_neighbors=10,
random_state=random_state,
)
original.fit(X)
sklearn_model = original.as_sklearn()
roundtrip_model = SpectralClustering.from_sklearn(sklearn_model)
assert array_equal(original.labels_, roundtrip_model.labels_)
Contributor:
Why not use assert_estimator_roundtrip? It should offer more robust testing.

Member Author:
Addressed in e66c72f



def test_pca(random_state):
X = np.random.RandomState(random_state).rand(50, 5)
original = PCA(n_components=2)