Spectral Embedding nnz_t#1628
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
| coo_to_csr_matrix(handle, n_samples, sym_coo_row_ind.view(), connectivity_graph); | ||
| auto laplacian = | ||
| create_laplacian(handle, spectral_embedding_config, csr_matrix_view, diagonal.view()); | ||
| // raft::print_device_vector("connectivity_graph_vals", connectivity_graph.get_elements().data(), |
There was a problem hiding this comment.
Please don't leave commented out code in your changes
viclafargue
left a comment
There was a problem hiding this comment.
Thanks for working on this.
| create_connectivity_graph(handle, spectral_embedding_config, dataset, sym_coo_matrix); | ||
| auto csr_matrix_view = | ||
| coo_to_csr_matrix<float>(handle, n_samples, sym_coo_row_ind.view(), sym_coo_matrix.view()); | ||
| auto laplacian = | ||
| create_laplacian<float>(handle, spectral_embedding_config, csr_matrix_view, diagonal.view()); | ||
| auto laplacian = create_laplacian<float, raft::device_csr_matrix<float, int, int, int>>( | ||
| handle, spectral_embedding_config, csr_matrix_view, diagonal.view()); | ||
| compute_eigenpairs<float>( | ||
| handle, spectral_embedding_config, n_samples, laplacian, diagonal.view(), embedding); | ||
| handle, spectral_embedding_config, n_samples, laplacian.view(), diagonal.view(), embedding); | ||
| } |
There was a problem hiding this comment.
It looks like the connectivity graph in the transform function that takes the dataset as argument is assumed to have a nnz of type int. Is this intentional? Will it be updated in a follow-up PR?
There was a problem hiding this comment.
Good point, I'll try to change it so that it defaults to int64_t.
Co-authored-by: Victor Lafargue <viclafargue@nvidia.com>
There was a problem hiding this comment.
Looks good for the most part. But, some checks might be necessary.
Could you review the following for risks of integer overflow?
nnz overflow :
thrust::tabulate
n_samples*k_search overflow :
d_indices and d_distances allocation (extents are provided as integers which may cause an overflow internally before allocation)
less likely n_samples overflow :
config.max_iterations
RAFT operators that may use container extents internally for indexing :
- raft::linalg::matrix_vector_op
- raft::matrix::gather
- raft::linalg::unary_op
- raft::matrix::fill
It looks like the coo_to_csr_matrix function is not used anymore. Should we delete it or make it compatible with larger nnz? In this case sym_coo_row_ind would probably have to be of the nnz type.
Also, why do we keep two versions of the function (for the two nnz types)? Is this for legacy support or are there some performance implications? If there is no performance implication we should probably only use the uin64_t nnz in cuML.
| } | ||
|
|
||
| template <typename DataT> | ||
| template <typename DataT, typename A> |
There was a problem hiding this comment.
Please use a more informative name here instead of A.
|
|
||
| CUVS_INST_SPECTRAL_EMBEDDING(float); | ||
| CUVS_INST_SPECTRAL_EMBEDDING(double); | ||
| CUVS_INST_SPECTRAL_EMBEDDING(float, int); |
There was a problem hiding this comment.
Do we need the int instantiations here? Or can we skip them and stick to int64_t only?
There was a problem hiding this comment.
We can keep the int ones to avoid breaking cuml and remove them later.
|
Hi @viclafargue, thanks for the review! I have addressed your Yes, we can remove the |
viclafargue
left a comment
There was a problem hiding this comment.
Thanks for working on this! LGTM
It could be interesting to see if we could handle the edge case of very large n_samples * config.n_components (matrix_vector_op & gather use) maybe as a follow-up PR though.
Hi @viclafargue, so that would mean we need to change the output |
|
/merge |
Resolves #7225, Resolves #6910. Depends on rapidsai/cuvs#1628 This PR pulls in the int64_t support from cuvs to the spectral embedding cuml cpp api. This api is used during UMAP spectral initialization. Authors: - Anupam (https://github.com/aamijar) - Simon Adorf (https://github.com/csadorf) Approvers: - Jinsol Park (https://github.com/jinsolp) - Divye Gala (https://github.com/divyegala) - Victor Lafargue (https://github.com/viclafargue) URL: #7586
Resolves #1243. Depends on rapidsai/raft#2891.
This PR adds a
NNZTypeto the spectral embedding public api with precomputed connectivity graph.The transform api for the precomputed connectivity graph has been switched to use the COO codepath all the way through the algorithm.
The spectral embedding api which passes in a dataset also has been switched to use the COO codepath and use int64_t by default.