Spectral Embedding nnz_t #1628

Merged
rapids-bot[bot] merged 18 commits into rapidsai:main from aamijar:spectral-embedding-nnz
Jan 13, 2026

@aamijar
Member

@aamijar commented Dec 10, 2025

Resolves #1243. Depends on rapidsai/raft#2891.

This PR adds an NNZType template parameter to the spectral embedding public API that takes a precomputed connectivity graph. The transform API for the precomputed connectivity graph now uses the COO code path throughout the algorithm.

The spectral embedding API that takes a dataset has also been switched to the COO code path and uses int64_t for nnz by default.
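A minimal host-side sketch of why a separate nnz type matters, assuming the connectivity graph is a k-nearest-neighbor graph symmetrized as A + A^T, so it holds at most 2 * n_samples * k entries. `CooGraph`, `symmetrized_knn_nnz_bound`, and `fits_in` are hypothetical names for illustration only, not cuVS or RAFT APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical host-side COO container for illustration (not a cuVS/RAFT type).
// Row/column indices use IndexT, while the number of stored entries uses a
// separate NnzT, so a 32-bit IndexT can coexist with a 64-bit nnz.
template <typename ValueT, typename IndexT, typename NnzT>
struct CooGraph {
  std::vector<IndexT> rows;
  std::vector<IndexT> cols;
  std::vector<ValueT> vals;

  NnzT nnz() const { return static_cast<NnzT>(vals.size()); }
};

// Upper bound on the nnz of a kNN graph symmetrized as A + A^T (assumption):
// each of the n_samples * k directed edges may appear as (i, j) and (j, i).
// The product is formed in int64_t, so it cannot overflow for any pair of
// non-negative 32-bit inputs.
inline std::int64_t symmetrized_knn_nnz_bound(std::int64_t n_samples, std::int64_t k) {
  return 2 * n_samples * k;
}

// True if a non-negative count is representable in NnzT (signed or unsigned).
template <typename NnzT>
bool fits_in(std::int64_t count) {
  if (count < 0) return false;
  return static_cast<std::uint64_t>(count) <=
         static_cast<std::uint64_t>(std::numeric_limits<NnzT>::max());
}
```

With k = 15, this bound exceeds the int range once n_samples passes roughly 71.6 million, which is the kind of input a 64-bit NnzT is meant to cover.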

@copy-pr-bot

copy-pr-bot Bot commented Dec 10, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@aamijar aamijar self-assigned this Dec 10, 2025
@aamijar aamijar moved this from Todo to In Progress in Unstructured Data Processing Dec 10, 2025
@aamijar aamijar added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels Dec 10, 2025
```cpp
coo_to_csr_matrix(handle, n_samples, sym_coo_row_ind.view(), connectivity_graph);
auto laplacian =
  create_laplacian(handle, spectral_embedding_config, csr_matrix_view, diagonal.view());
// raft::print_device_vector("connectivity_graph_vals", connectivity_graph.get_elements().data(),
```
Member

Please don't leave commented-out code in your changes.

@aamijar aamijar marked this pull request as ready for review January 7, 2026 06:21
@aamijar aamijar requested a review from a team as a code owner January 7, 2026 06:21
@aamijar aamijar requested a review from a team as a code owner January 7, 2026 06:23
Comment thread cpp/cmake/thirdparty/get_raft.cmake Outdated
Contributor

@viclafargue left a comment

Thanks for working on this.

Comment thread cpp/src/preprocessing/spectral/detail/spectral_embedding.cuh Outdated
Comment on lines 264 to 271
```diff
 create_connectivity_graph(handle, spectral_embedding_config, dataset, sym_coo_matrix);
 auto csr_matrix_view =
   coo_to_csr_matrix<float>(handle, n_samples, sym_coo_row_ind.view(), sym_coo_matrix.view());
-auto laplacian =
-  create_laplacian<float>(handle, spectral_embedding_config, csr_matrix_view, diagonal.view());
+auto laplacian = create_laplacian<float, raft::device_csr_matrix<float, int, int, int>>(
+  handle, spectral_embedding_config, csr_matrix_view, diagonal.view());
 compute_eigenpairs<float>(
-  handle, spectral_embedding_config, n_samples, laplacian, diagonal.view(), embedding);
+  handle, spectral_embedding_config, n_samples, laplacian.view(), diagonal.view(), embedding);
 }
```
Contributor

It looks like the connectivity graph in the transform function that takes the dataset as an argument is assumed to have an nnz of type int. Is this intentional? Will it be updated in a follow-up PR?

Member Author

Good point, I'll try to change it so that it defaults to int64_t.

Member Author

Addressed in 30ebe8e

Contributor

@viclafargue left a comment

Looks good for the most part, but some checks might be necessary.

Could you review the following for risks of integer overflow?

  • nnz overflow: thrust::tabulate
  • n_samples * k_search overflow: the d_indices and d_distances allocations (extents are provided as integers, which may cause an overflow internally before allocation)
  • n_samples overflow (less likely): config.max_iterations
  • RAFT operators that may use container extents internally for indexing:
      • raft::linalg::matrix_vector_op
      • raft::matrix::gather
      • raft::linalg::unary_op
      • raft::matrix::fill
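A host-side C++ sketch of the two fixes these overflow points usually call for, assuming sizes arrive as 32-bit int: promote to int64_t before multiplying, and give any loop or tabulate-style counter over nonzeros the nnz type. `checked_mul_i32`, `knn_buffer_extent`, and `fill_knn_rows` are hypothetical helpers; `fill_knn_rows` is only a CPU stand-in for what a thrust::tabulate functor might compute per element, assuming each kNN row contributes k_search consecutive entries (row = e / k_search).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Writes a * b to out and returns true if the product fits in int32;
// otherwise returns false. The multiply itself happens in int64_t,
// so it cannot overflow for 32-bit inputs.
inline bool checked_mul_i32(std::int32_t a, std::int32_t b, std::int32_t& out) {
  const std::int64_t wide = static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b);
  if (wide > std::numeric_limits<std::int32_t>::max() ||
      wide < std::numeric_limits<std::int32_t>::min()) {
    return false;
  }
  out = static_cast<std::int32_t>(wide);
  return true;
}

// Element count for a dense n_samples x k_search buffer (e.g. neighbor
// indices or distances). Promoting before the multiply avoids an int
// overflow before the allocation is sized.
inline std::int64_t knn_buffer_extent(std::int32_t n_samples, std::int32_t k_search) {
  return static_cast<std::int64_t>(n_samples) * k_search;
}

// CPU analogue of a tabulate over nnz entries: entry e belongs to row
// e / k_search. The counter has type NnzT, so with a 64-bit NnzT it can
// address more than 2^31 - 1 entries.
template <typename IndexT, typename NnzT>
void fill_knn_rows(std::vector<IndexT>& rows, NnzT nnz, IndexT k_search) {
  rows.resize(static_cast<std::size_t>(nnz));
  for (NnzT e = 0; e < nnz; ++e) {
    rows[static_cast<std::size_t>(e)] = static_cast<IndexT>(e / static_cast<NnzT>(k_search));
  }
}
```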

It looks like the coo_to_csr_matrix function is no longer used. Should we delete it or make it compatible with larger nnz? In that case, sym_coo_row_ind would probably have to be of the nnz type.

Also, why do we keep two versions of the function (one for each nnz type)? Is this for legacy support, or are there performance implications? If there is no performance implication, we should probably only use the int64_t nnz in cuML.
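If coo_to_csr_matrix were kept, the row-offset array is the piece that has to widen, since its last entry equals nnz. A hypothetical host-side sketch (not the cuVS/RAFT implementation) that builds CSR offsets of type NnzT from COO row indices with a per-row count followed by an exclusive prefix sum; `coo_rows_to_csr_indptr` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build CSR row offsets from COO row indices. Offsets count nonzeros, so
// they are stored as NnzT; only the per-entry row index type IndexT stays narrow.
template <typename IndexT, typename NnzT>
std::vector<NnzT> coo_rows_to_csr_indptr(const std::vector<IndexT>& coo_rows, IndexT n_rows) {
  std::vector<NnzT> indptr(static_cast<std::size_t>(n_rows) + 1, 0);
  // Histogram: count entries per row, shifted by one slot for the prefix sum.
  for (IndexT r : coo_rows) {
    ++indptr[static_cast<std::size_t>(r) + 1];
  }
  // Running sum turns counts into start offsets; indptr[n_rows] == nnz.
  for (std::size_t i = 1; i < indptr.size(); ++i) {
    indptr[i] += indptr[i - 1];
  }
  return indptr;
}
```

For COO rows {0, 0, 2, 1, 2, 2} with 3 rows, this yields offsets {0, 2, 3, 6}.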

```diff
 }
 
-template <typename DataT>
+template <typename DataT, typename A>
```
Contributor

Please use a more informative name here instead of A.

Member Author

Addressed in 3d962a3


```diff
-CUVS_INST_SPECTRAL_EMBEDDING(float);
-CUVS_INST_SPECTRAL_EMBEDDING(double);
+CUVS_INST_SPECTRAL_EMBEDDING(float, int);
```
Contributor

Do we need the int instantiations here? Or can we skip them and stick to int64_t only?

Member Author

We can keep the int ones to avoid breaking cuml and remove them later.

Contributor

Yes, sounds good.

Member Author

Tracking in #1695.
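A sketch of how a two-parameter instantiation macro can keep the int nnz instantiations for existing callers such as cuML while adding int64_t, so dropping int later only means deleting lines. `EXAMPLE_INST_SPECTRAL_EMBEDDING` and `nnz_width_bytes` are hypothetical stand-ins; the real CUVS_INST_SPECTRAL_EMBEDDING body is not shown in this thread, and the function here only reports sizeof(NnzT) so the instantiations are observable.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical template standing in for a spectral-embedding entry point.
template <typename DataT, typename NnzT>
int nnz_width_bytes() {
  static_assert(std::is_integral<NnzT>::value, "NnzT must be an integer type");
  return static_cast<int>(sizeof(NnzT));
}

// Hypothetical instantiation macro: one explicit instantiation per
// (DataT, NnzT) pair compiled into the library.
#define EXAMPLE_INST_SPECTRAL_EMBEDDING(DataT, NnzT) \
  template int nnz_width_bytes<DataT, NnzT>();

// int kept for existing callers; int64_t added for large graphs.
EXAMPLE_INST_SPECTRAL_EMBEDDING(float, int)
EXAMPLE_INST_SPECTRAL_EMBEDDING(float, std::int64_t)
EXAMPLE_INST_SPECTRAL_EMBEDDING(double, int)
EXAMPLE_INST_SPECTRAL_EMBEDDING(double, std::int64_t)
```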

@aamijar
Member Author

aamijar commented Jan 10, 2026

Hi @viclafargue, thanks for the review! I have addressed your int overflow concerns in c81cb15.

Yes, we can remove coo_to_csr_matrix since it is no longer used. I was keeping it around for debugging purposes. Removed in 54e3133.
Yes, we could instantiate only the int64_t nnz type functions and drop int, but we should keep int for now to avoid breaking cuml.

Contributor

@viclafargue left a comment

Thanks for working on this! LGTM

It could be interesting to see if we could handle the edge case of very large n_samples * config.n_components (matrix_vector_op & gather use) maybe as a follow-up PR though.

@aamijar aamijar removed the request for review from a team January 12, 2026 22:09
@aamijar
Member Author

aamijar commented Jan 12, 2026

> It could be interesting to see if we could handle the edge case of very large n_samples * config.n_components (matrix_vector_op & gather use) maybe as a follow-up PR though.

Hi @viclafargue, that would mean changing the output embedding to be indexed with int64_t, right? I think we can address that in a follow-up PR. The Lanczos solver currently uses int for the embedding output.
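Assuming the output embedding is a dense n_samples x n_components matrix whose total element count is computed in a 32-bit int, a quick sketch of where that breaks. `max_samples_for_int_size` and `col_major_offset` are hypothetical helpers, and the column-major layout in the second one is an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest n_samples for which n_samples * n_components still fits in a
// 32-bit int (so every flat offset into the embedding does too).
inline std::int64_t max_samples_for_int_size(std::int64_t n_components) {
  const std::int64_t int_max = std::numeric_limits<std::int32_t>::max();
  return int_max / n_components;
}

// Flat offset of element (i, j) in a column-major n_samples x n_components
// matrix, computed in int64_t as an int64_t-indexed embedding would need.
inline std::int64_t col_major_offset(std::int64_t i, std::int64_t j, std::int64_t n_samples) {
  return j * n_samples + i;
}
```

With n_components = 2 the limit is 1,073,741,823 samples; with n_components = 10 it drops to 214,748,364.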

@aamijar
Member Author

aamijar commented Jan 13, 2026

/merge

@rapids-bot merged commit 67fe5a0 into rapidsai:main Jan 13, 2026
190 of 193 checks passed
@github-project-automation moved this from In Progress to Done in Unstructured Data Processing Jan 13, 2026
rapids-bot Bot pushed a commit to rapidsai/cuml that referenced this pull request Jan 15, 2026
Resolves #7225, Resolves #6910.
Depends on rapidsai/cuvs#1628

This PR pulls in the int64_t support from cuVS to the cuML C++ spectral embedding API. This API is used during UMAP spectral initialization.

Authors:
  - Anupam (https://github.com/aamijar)
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Jinsol Park (https://github.com/jinsolp)
  - Divye Gala (https://github.com/divyegala)
  - Victor Lafargue (https://github.com/viclafargue)

URL: #7586
Labels

improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Support nnz_t for spectral_embedding

4 participants