Summary
The cuml.accel StandardScaler currently produces incorrect results when scaling near-constant features (features with very low variance). This affects numerical correctness once StandardScaler is auto-accelerated.
Problem
Test failures in test_standard_scaler_near_constant_features show that the GPU implementation lacks proper handling for features with variance close to zero (tested with values like 1e-10, 1, and 1e10 at various sample sizes).
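To make the hazard concrete, here is a minimal NumPy sketch. It is not the cuML code path and not the actual test; `naive_standardize` is a hypothetical helper that only illustrates what can happen when the scale is used without any guard. The column's entries differ by a single float64 ulp, so the computed standard deviation is tiny but nonzero, and dividing by it blows rounding noise up to order 1.

```python
import numpy as np


def naive_standardize(X):
    """Center and scale each column with no guard on the scale (illustration only)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std


# One column whose entries differ by a single float64 ulp around 1.0.
one_ulp_up = np.nextafter(1.0, 2.0)
X = np.array([[1.0], [one_ulp_up]] * 4)

Z = naive_standardize(X)

# The inputs are equal up to rounding, yet the outputs are spread over order 1.
input_spread = float(X.max() - X.min())
output_spread = float(Z.max() - Z.min())
print(f"input spread:  {input_spread:.3e}")
print(f"output spread: {output_spread:.3e}")
```

A scaler that treats such a column as constant would instead leave the centered values at roughly the size of the input spread.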
Requested Solution
Either:
- (a) Match sklearn's stability handling: Implement epsilon/variance thresholding when computing the scale, so that near-zero variance columns are treated the same way sklearn treats them
- (b) Detect and force CPU fallback: Identify near-constant features during fit and dispatch to sklearn's CPU implementation (similar to other unsupported cases like partial_fit or sample_weight)
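Both options can be sketched in plain NumPy. This is a rough sketch under stated assumptions, not cuML or sklearn code: the helper names (`near_constant_mask`, `guarded_scale`, `should_fall_back_to_cpu`) are hypothetical, and the variance bound is an assumption modelled on the kind of rounding-error threshold sklearn is understood to apply to constant features. It should be checked against the sklearn source before being relied on.

```python
import numpy as np


def near_constant_mask(var, mean, n_samples):
    """Flag columns whose variance is indistinguishable from rounding error.

    Assumed bound (to be verified against sklearn):
        var <= n * eps * var + (n * mean * eps) ** 2
    """
    eps = np.finfo(np.float64).eps
    upper_bound = n_samples * eps * var + (n_samples * mean * eps) ** 2
    return var <= upper_bound


def guarded_scale(X):
    """Option (a): compute mean and scale, using scale 1.0 for near-constant columns."""
    mean = X.mean(axis=0)
    var = X.var(axis=0)
    mask = near_constant_mask(var, mean, X.shape[0])
    scale = np.sqrt(var)
    scale[mask] = 1.0  # avoid dividing by a value that is only rounding noise
    return mean, scale, mask


def should_fall_back_to_cpu(X):
    """Option (b): report whether any column is near-constant (hypothetical check)."""
    mask = near_constant_mask(X.var(axis=0), X.mean(axis=0), X.shape[0])
    return bool(mask.any())


# Column 0 varies by one float64 ulp; column 1 has ordinary spread.
near_constant = np.array([1.0, np.nextafter(1.0, 2.0)] * 4)
X = np.column_stack([near_constant, np.arange(8, dtype=np.float64)])

mean, scale, mask = guarded_scale(X)
Z = (X - mean) / scale
print(mask)                        # which columns were treated as constant
print(should_fall_back_to_cpu(X))  # whether option (b) would dispatch to CPU
```

Option (a) keeps everything on the GPU at the cost of reproducing the threshold exactly. Option (b) only needs the detection step, but it requires computing the mean and variance before the dispatch decision can be made.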
Context
This issue was identified during PR #7766 which adds cuml.accel support for StandardScaler.
Acceptance Criteria
- test_standard_scaler_near_constant_features passes without xfails
Requested by: @csadorf