PCA preprocessor#1808
Conversation
| raft::device_vector_view<float, int64_t> mu, | ||
| raft::device_scalar_view<float, int64_t> noise_vars, | ||
| bool flip_signs_based_on_U = false); | ||
|
|
There was a problem hiding this comment.
Making a note here that I don't think the existing cuml implementation has the ability to tune the percentage of explained variance.
For example, in sklearn we can set 0 < n_components < 1 where the user can select a percentage of the explained variance to recover and the n_components is automatically determined by the algorithm in order to satisfy that.
We will have to build that piece out since it doesn't exist in the current implementation.
There was a problem hiding this comment.
Tuning is not what's being asked for. Exposing the explained variance is what's being requested (it is used for tuning / selecting the number of components but that's something the user does, not something we need to do).
| prms.copy = config.copy; | ||
| prms.whiten = config.whiten; | ||
| return prms; |
There was a problem hiding this comment.
| prms.copy = config.copy; | |
| prms.whiten = config.whiten; | |
| return prms; | |
| prms.copy = config.copy; | |
| prms.whiten = config.whiten; | |
| prms.verbose = config.verbose; | |
| return prms; |
We are missing verbose here, no?
There was a problem hiding this comment.
verbose was a unused parameter, removed it from pca.hpp in 99f32fc
| // prms.n_cols = params.n_col; | ||
| // prms.n_rows = params.n_row; |
| * @param[in] flip_signs_based_on_U whether to determine signs by U (true) or V.T (false) | ||
| */ | ||
| void fit(raft::resources const& handle, | ||
| params config, |
There was a problem hiding this comment.
Small thing but important for API consistency: params config is passed by value here and in all other functions in the PR. The cuVS convention afaik is const params& for example in kmeans::fit:
void fit(raft::resources const& handle,
const cuvs::cluster::kmeans::params& params,
raft::device_matrix_view<const float, int> X,
std::optional<raft::device_vector_view<const float, int>> sample_weight,
raft::device_matrix_view<float, int> centroids,
raft::host_scalar_view<float> inertia,
raft::host_scalar_view<int> n_iter);This won't affect performance or correctness at all, but for consistency I would suggest changing to const params& config throughout. Applies to all 8 overloads in this header plus the detail layer.
| } | ||
|
|
||
| protected: | ||
| void basicTest() |
There was a problem hiding this comment.
Both basicTest and advancedTest set n_components = n_col, so the inverse transform should perfectly reconstruct the input.
Consider adding a test with n_components < n_col that verifies the reconstruction error is bounded but non-zero, this would confirm the dimensionality reduction is actually working, not just passing data through unchanged and be a useful test case.
divyegala
left a comment
There was a problem hiding this comment.
Why were double instantiations needed? Where is the code intended to be used?
Co-authored-by: Dante Gama Dessavre <[email protected]>
Co-authored-by: Dante Gama Dessavre <[email protected]>
The cuml pca python interface supports double inputs. However, cuml will use the raft api, so therefore cuvs does not need double instantiations if we think its not valuable. |
In that case, please remove |
52894e3 to
465e546
Compare
Comments addressed. Dismissing while Dante on PTO.
|
/merge |
cede915
into
rapidsai:release/26.04
Resolves rapidsai#1207. Depends on rapidsai/raft#2952 This PR introduces the `cuvs::preprocessing::pca` with `float` support. The following APIs are supported: `fit`, `transform`, `fit_transform`, `inverse_transform`. Authors: - Anupam (https://github.com/aamijar) - Divye Gala (https://github.com/divyegala) Approvers: - Divye Gala (https://github.com/divyegala) URL: rapidsai#1808
Resolves rapidsai#1207. Depends on rapidsai/raft#2952 This PR introduces the `cuvs::preprocessing::pca` with `float` support. The following APIs are supported: `fit`, `transform`, `fit_transform`, `inverse_transform`. Authors: - Anupam (https://github.com/aamijar) - Divye Gala (https://github.com/divyegala) Approvers: - Divye Gala (https://github.com/divyegala) URL: rapidsai#1808
Resolves rapidsai#1207. Depends on rapidsai/raft#2952 This PR introduces the `cuvs::preprocessing::pca` with `float` support. The following APIs are supported: `fit`, `transform`, `fit_transform`, `inverse_transform`. Authors: - Anupam (https://github.com/aamijar) - Divye Gala (https://github.com/divyegala) Approvers: - Divye Gala (https://github.com/divyegala) URL: rapidsai#1808
Resolves #1207. Depends on rapidsai/raft#2952
This PR introduces the
cuvs::preprocessing::pcawithfloatsupport. The following APIs are supported:fit,transform,fit_transform,inverse_transform.