The test_random_seed_consistency test is failing with a numerical precision mismatch in cuML's random projection algorithms when they mix sparse and dense operands. The test verifies that random projection models built with identical random seeds produce identical results, but it raises an AssertionError, apparently because of an upstream CuPy limitation in sparse-dense matrix operations.
Failing jobs:
Environment
- CUDA: 12.9.1
- Python: 3.13
- OS: arm64, rockylinux8
- GPU: A100
- Driver: latest-driver
- Dependencies: latest-deps
Test Details
The test fails at the line:
np.testing.assert_allclose(asdense(t1), asdense(t2))
With the error:
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0
Mismatched elements: 1 / 50 (2%)
Max absolute difference among violations: 1.1920929e-07
Max relative difference among violations: 5.94389e-05
The error occurs in the test_random_seed_consistency function when comparing outputs from two identically configured random projection models.
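To make the reported numbers concrete, here is a minimal sketch that plugs them into the per-element check documented for np.testing.assert_allclose, |actual - desired| <= atol + rtol * |desired|. It assumes the maximum absolute and relative differences refer to the same element, which is reasonable here since only 1 of 50 elements mismatched. The helper name within_tolerance is illustrative and not part of NumPy or cuML.

```python
# Values copied from the AssertionError above.
max_abs_diff = 1.1920929e-07
max_rel_diff = 5.94389e-05

# The relative difference is |actual - desired| / |desired|, so the magnitude
# of the single mismatched element can be recovered from the two numbers.
desired_magnitude = max_abs_diff / max_rel_diff


def within_tolerance(abs_diff, desired, rtol, atol=0.0):
    """Mirror the documented assert_allclose check for one element."""
    return abs_diff <= atol + rtol * abs(desired)


default_ok = within_tolerance(max_abs_diff, desired_magnitude, rtol=1e-07)
relaxed_ok = within_tolerance(max_abs_diff, desired_magnitude, rtol=1e-04)

print(f"|desired| is roughly {desired_magnitude:.4e}")
print(f"passes with rtol=1e-07: {default_ok}")
print(f"passes with rtol=1e-04: {relaxed_ok}")
```

Under that assumption the mismatched element has a magnitude of about 2e-3, so the default tolerance allows a deviation of only about 2e-10, while rtol=1e-4 allows about 2e-7.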
Probable Root Cause
This appears to be a known limitation in CuPy's sparse-dense matrix operations, documented in cupy/cupy#9323. In summary:
- Random projection algorithms perform matrix multiplication between projection components and input data
- Mixed sparse/dense operations (sparse @ dense or dense @ sparse) are not bit-exact reproducible
- Only sparse @ sparse or dense @ dense operations produce identical results
- The absolute differences are small (~1.2e-7), but the relative difference (~5.9e-5) exceeds the default test tolerance of rtol=1e-07
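As background on why a result can be close but not bit-exact, the sketch below shows that float32 addition is not associative, so accumulating the same terms in a different order can change the result. Whether a varying accumulation order is the actual mechanism behind cupy/cupy#9323 is an assumption here, not something this report confirms. The example uses plain NumPy on the CPU and deliberately extreme values to make the effect visible.

```python
import numpy as np

a = np.float32(1e8)
b = np.float32(-1e8)
c = np.float32(1.0)

# Same three terms, two accumulation orders.
cancel_first = (a + b) + c  # a and b cancel exactly, then 1 is added
absorb_first = a + (b + c)  # 1 is below float32 resolution near 1e8 and is lost

print(cancel_first, absorb_first)
print("identical:", cancel_first == absorb_first)
```

In a real matrix product the discrepancy is far smaller than in this contrived case, which matches the close-but-not-identical outputs described above.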
Related Code
The test is located in python/cuml/tests/test_random_projection.py and tests the reproducibility of cuml.random_projection algorithms.
The error propagates from:
- test_random_seed_consistency (line 169)
- np.testing.assert_allclose with default tolerance rtol=1e-07
- Mixed sparse-dense matrix operations in CuPy
Proposed Mitigation
The issue has been resolved in commit feea7fee9c with the following approach:
@pytest.mark.parametrize("cls", classes)
@pytest.mark.parametrize("sparse", [False, True])
def test_random_seed_consistency(cls, sparse):
    X = random_array(10, 1000, sparse=sparse)
    model1 = cls(n_components=5, random_state=42).fit(X)
    t1 = model1.transform(X)
    model2 = cls(n_components=5, random_state=42).fit(X)
    t2 = model2.transform(X)
    # Due to https://github.com/cupy/cupy/issues/9323 only sparse @ sparse or
    # dense @ dense outputs are exactly reproducible. All other combinations
    # result in close but not identical outputs. For now we document this and
    # relax the test constraint.
    if (cls is SparseRandomProjection) != sparse:
        # Mix of sparse and dense, check outputs are close
        np.testing.assert_allclose(asdense(t1), asdense(t2), rtol=1e-4)
    else:
        # Both dense or sparse, can check exactly
        np.testing.assert_array_equal(asdense(t1), asdense(t2))
Problematic Combinations
- SparseRandomProjection + dense input (sparse=False) → Uses relaxed tolerance rtol=1e-4
- GaussianRandomProjection + sparse input (sparse=True) → Uses relaxed tolerance rtol=1e-4
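The condition (cls is SparseRandomProjection) != sparse is compact, so the following sketch enumerates all four parametrized cases and shows which assertion each one takes. The two classes are empty stand-ins defined locally so the snippet runs without cuML or a GPU, and is_mixed is an illustrative helper name that does not exist in the test file.

```python
from itertools import product


# Stand-ins for the cuml.random_projection classes (assumption: only their
# identity matters for the branch condition).
class SparseRandomProjection:
    pass


class GaussianRandomProjection:
    pass


def is_mixed(cls, sparse):
    """True when the projection type and the input differ in sparsity."""
    return (cls is SparseRandomProjection) != sparse


classes = (SparseRandomProjection, GaussianRandomProjection)
checks = {
    (cls.__name__, sparse): (
        "assert_allclose(rtol=1e-4)" if is_mixed(cls, sparse) else "assert_array_equal"
    )
    for cls, sparse in product(classes, (False, True))
}

for (name, sparse), check in checks.items():
    print(f"{name:26s} sparse={sparse!s:5s} -> {check}")
```

Only the two mixed cases listed above take the relaxed branch; the sparse-with-sparse and dense-with-dense cases are still compared exactly.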