Spectral Embedding argument affinity={"precomputed", "nearest_neighbors"} #7117
Conversation
Title changed from affinity="precomputed" to affinity={"precomputed", "nearest_neighbors"}
viclafargue
left a comment
Thanks a lot @aamijar! It's looking good. Could you update the docstrings? Also, here are a number of small change requests.
```cpp
raft::device_vector_view<int, int> rows,
raft::device_vector_view<int, int> cols,
raft::device_vector_view<float, int> vals,
raft::device_matrix_view<float, int, raft::col_major> embedding);
```
It looks like the API won't work with datasets having more elements (nnz) than std::numeric_limits<int>::max(). It would be great to update the cuVS and cuML APIs to allow larger matrices (extent as uint64_t). Maybe as a follow-up PR?
```python
rows = A.row
cols = A.col
vals = A.data
n_samples = A.shape[0]
nnz = A.nnz

rows = input_to_cuml_array(rows, order="C",
                           check_dtype=np.int32, convert_to_dtype=cp.int32)[0]
cols = input_to_cuml_array(cols, order="C",
                           check_dtype=np.int32, convert_to_dtype=cp.int32)[0]
vals = input_to_cuml_array(vals, order="C",
                           check_dtype=np.float32, convert_to_dtype=cp.float32)[0]
```
Could you add a check to ensure that this is a COO matrix, and maybe convert otherwise? You could perhaps reuse the extract_knn_graph function. Additionally, asserts on the lengths of the arrays would be nice to have too.
Added the assert here ddf2a26. In what case would we need to convert?
If the user provides other sparse formats than COO and maybe even a dense pre-computed graph.
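A defensive conversion along these lines could look like the following. This is a minimal sketch using scipy only; `to_coo_graph` and the dense branch are hypothetical illustrations, not the actual cuML code:

```python
import numpy as np
import scipy.sparse as sp

def to_coo_graph(A):
    # Hypothetical normalization: accept any scipy sparse format or a
    # dense precomputed graph, and always hand back a COO matrix.
    if sp.issparse(A):
        return A.tocoo() if A.format != "coo" else A
    return sp.coo_matrix(np.asarray(A))

# CSR, CSC, and dense inputs all come out as COO:
A = sp.random(10, 10, density=0.2, format="csr", random_state=0)
coo = to_coo_graph(A)
```

The real code would additionally handle cupy sparse matrices and device arrays.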
```cython
input_to_cuml_array(A, order="C", check_dtype=np.float32,
                    convert_to_dtype=cp.float32)
A_ptr = <uintptr_t>A.ptr
n_samples = A.shape[0]
```
Isn't n_samples the same as _n_rows? It's safer to avoid accessing the shape attribute and to leave it to the input_to_cuml_array function to determine the number of samples.
```cython
transform(
    deref(h), config,
    make_device_matrix_view[float, int, row_major](
        <float *>A_ptr, <int> n_samples, <int> A.shape[1]),
```
Please use _n_cols rather than A.shape[1].
```diff
 def test_spectral_embedding_trustworthiness(
-    dataset_loader, n_samples, min_trustworthiness
+    dataset_loader, n_samples, affinity
 ):
```
It would be great to quickly check that it also behaves as expected with a smooth KNN graph, such as one produced by the fuzzy_simplicial_set function.
```python
[
    ("nearest_neighbors", None),  # Use built-in nearest_neighbors affinity
    ("precomputed", "binary_knn"),  # Precomputed binary k-NN graph
    ("precomputed", "fuzzy_knn"),  # Precomputed fuzzy k-NN graph from UMAP
],
```
Could we also add ("precomputed", "regular_knn") with mode="distance", to check that it is as good as ("nearest_neighbors", None)?
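A distance-weighted k-NN graph for such a case could be built roughly like this. This is a brute-force numpy/scipy sketch; `knn_graph_distance` is a hypothetical helper mirroring what sklearn.neighbors.kneighbors_graph(..., mode="distance") would produce, not code from the PR:

```python
import numpy as np
import scipy.sparse as sp

def knn_graph_distance(X, n_neighbors):
    # Brute-force pairwise Euclidean distances.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)  # exclude self-edges
    idx = np.argsort(D, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(X.shape[0]), n_neighbors)
    cols = idx.ravel()
    vals = D[rows, cols]
    G = sp.coo_matrix((vals, (rows, cols)), shape=(X.shape[0],) * 2)
    # Symmetrize, since a precomputed affinity graph should be symmetric.
    return G.maximum(G.T)

X = np.random.RandomState(0).rand(20, 3).astype(np.float32)
graph = knn_graph_distance(X, n_neighbors=5)
```

The resulting graph could then be passed with affinity="precomputed".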
```python
    affinity="precomputed",
    random_state=42,
)
X_sklearn = sk_spectral.fit_transform(graph_dense.get())
```
Doesn't the Scikit-Learn implementation handle sparse arrays here?
viclafargue
left a comment
Thanks @aamijar! LGTM, just two small comments.
```python
# Use deepcopy=True to ensure we don't modify the original arrays
rows = input_to_cuml_array(rows, order="C", deepcopy=True,
                           check_dtype=np.int32, convert_to_dtype=cp.int32)[0]
cols = input_to_cuml_array(cols, order="C", deepcopy=True,
                           check_dtype=np.int32, convert_to_dtype=cp.int32)[0]
vals = input_to_cuml_array(vals, order="C", deepcopy=True,
                           check_dtype=np.float32, convert_to_dtype=cp.float32)[0]
```
Does the C++ side update the input COO matrix?
Yes, I think it's because we are doing coo_sort in place on the input view.
It'd be nice to be able to avoid these copies here.
If we could move the sorting out to be handled by the caller (the creation of the initial COO here should already do that), that'd be cleaner IMO.
If we can't, then I think avoiding the copy is still fine. Sorting is a canonicalization step (the same one that cupyx.scipy.sparse will do); a mutation like that won't make an input COO matrix invalid and should be fine IMO. I still have a preference for moving the sorting out of the routine, though.
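Moving the canonicalization out to the caller could be as simple as a lexsort on the Python side. This is a scipy sketch of the idea, not the actual cuML code path:

```python
import numpy as np
import scipy.sparse as sp

A = sp.random(8, 8, density=0.3, format="coo", random_state=0)

# Sort entries by (row, col) up front so a downstream coo_sort becomes a
# no-op and never needs to mutate the caller's arrays. np.lexsort uses the
# last key as the primary sort key, hence (col, row).
order = np.lexsort((A.col, A.row))
rows, cols, vals = A.row[order], A.col[order], A.data[order]
```

The fancy-indexing here already produces fresh arrays, so the caller's COO matrix is left untouched.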
Hmm, something fishy happens if I remove the copying; I ran into this last week as well. It happens specifically when the input sparse matrix is CSC. The flow is: the user passes in a CSC sparse matrix, which gets converted to a COO matrix in place in the Python code; then the C++ code also performs a coo_sort operation. This corrupts the original input data, so in the pytest the second call to spectral_embedding fails since the input was modified.
I already tried sorting on the Python side and removing the coo_sort on the C++ side, but that didn't work for the CSC input.
Addressed in 884e282.
I fixed it by handling CSC input with copying, and I am also doing the sorting on the Python side. I confirmed that the CSC-to-COO conversion and the sorting in Python actually also corrupt the input data, so we need to copy.
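The fix described above might look roughly like this on the Python side. This is a scipy-only sketch; `canonical_coo` is a hypothetical name, and the real code also handles cupy sparse inputs:

```python
import numpy as np
import scipy.sparse as sp

def canonical_coo(A):
    # Copy CSC inputs first: converting/sorting them in place was observed
    # to corrupt the caller's matrix, so a defensive copy is required.
    if sp.issparse(A) and A.format == "csc":
        A = A.copy()
    A = A.tocoo()
    order = np.lexsort((A.col, A.row))  # canonical (row, col) order
    return A.row[order], A.col[order], A.data[order]

A = sp.random(10, 10, density=0.2, format="csc", random_state=1)
data_before = A.data.copy()
rows, cols, vals = canonical_coo(A)
```

Only the CSC branch pays for a copy; COO and CSR inputs pass through without one.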
```python
# Handle scipy sparse matrices
if scipy_issparse(A):
    return A.tocoo()

# Handle cupy sparse matrices
if cupy_issparse(A):
    return A.tocoo()
```
sort_indices should guarantee order for CSR/CSC. We should probably update the extract_knn_graph function too, maybe in another PR.
Which sort_indices part are you referring to?
Okay, does that get called as part of the .tocoo()? Is it a problem? I'm not sure what to update.
Actually, never mind; we should still get contiguous row blocks without sorting, and that should be fine.
Fixes a bug where non-float32 cupy sparse matrices were mishandled. Adds a test for float64 inputs across all input types.
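The float64 path can be exercised with a scipy-only sketch. The real test goes through input_to_cuml_array with convert_to_dtype=cp.float32; here astype stands in for that conversion:

```python
import numpy as np
import scipy.sparse as sp

base = sp.random(10, 10, density=0.2, dtype=np.float64, random_state=0)

# Convert each sparse format's float64 values to float32, mimicking the
# convert_to_dtype path that the float64 test now exercises per input type.
converted = {fmt: base.asformat(fmt).astype(np.float32)
             for fmt in ("coo", "csr", "csc")}
```

Conversion should change only the value dtype, never the sparsity structure or the format.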
/merge |
Resolves #7081