[CI] Flaky random seed consistency error in test_random_seed_consistency with SparseRandomProjection #7167

@csadorf

Description

The test_random_seed_consistency test fails intermittently with a small numerical mismatch in cuML's random projection transforms when the operation mixes sparse and dense operands. The test checks that two random projection models built with the same random seed produce identical results. Instead, the comparison raises an AssertionError, which appears to stem from an upstream CuPy limitation in sparse-dense matrix operations.

Failing jobs:

Environment

  • CUDA: 12.9.1
  • Python: 3.13
  • OS: arm64, rockylinux8
  • GPU: A100
  • Driver: latest-driver
  • Dependencies: latest-deps

Test Details

The test fails at the line:

np.testing.assert_allclose(asdense(t1), asdense(t2))

With the error:

AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0

Mismatched elements: 1 / 50 (2%)
Max absolute difference among violations: 1.1920929e-07
Max relative difference among violations: 5.94389e-05

The error occurs in the test_random_seed_consistency function when comparing outputs from two identically configured random projection models.
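
The numbers in the error message can be checked by hand. The following is a minimal NumPy-only sketch, not the cuML test itself. It assumes a single float32 element whose magnitude is back-computed from the reported absolute and relative differences (roughly 0.002), and `passes` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

# Differences reported in the assertion message above.
abs_diff = np.finfo(np.float32).eps  # 2**-23, displayed as 1.1920929e-07
rel_diff = 5.94389e-05

# Magnitude of the mismatched element implied by those two numbers
# (an inference for illustration, not a value taken from the CI log).
desired = np.array([float(abs_diff) / rel_diff], dtype=np.float32)
actual = desired + abs_diff


def passes(rtol):
    """Return True if assert_allclose accepts the pair at this rtol."""
    try:
        np.testing.assert_allclose(actual, desired, rtol=rtol)
    except AssertionError:
        return False
    return True


default_ok = passes(1e-7)  # default tolerance used by the failing assertion
relaxed_ok = passes(1e-4)  # tolerance used in the mitigation
```

Under these assumptions `default_ok` is False and `relaxed_ok` is True: a difference of one float32 machine epsilon on a value near 0.002 is far above rtol=1e-07, yet well within rtol=1e-4.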

Probable Root Cause

This appears to be a known limitation in CuPy's sparse-dense matrix operations, documented in cupy/cupy#9323. The failure arises as follows:

  1. Random projection algorithms multiply the projection components by the input data
  2. Mixed sparse/dense products (sparse @ dense or dense @ sparse) are not bit-exact reproducible
  3. Only sparse @ sparse and dense @ dense products produce identical results across runs
  4. The resulting differences are small (~1e-7 absolute) but exceed the test's default tolerance (rtol=1e-07, atol=0)
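
One common reason a floating-point product is not bit-exact reproducible is that floating-point addition is not associative, so any change in accumulation order can change the result. Whether this is the exact mechanism behind cupy/cupy#9323 is an assumption here, and the sketch below runs on the CPU with NumPy scalars rather than reproducing the CuPy behavior.

```python
import numpy as np

# Three float32 terms whose exact sum is 1.0.
a = np.float32(1e8)
b = np.float32(1.0)
c = np.float32(-1e8)

# Same terms, two accumulation orders.
# float32 spacing near 1e8 is 8, so adding 1.0 to 1e8 is rounded away.
left_to_right = (a + b) + c  # b is absorbed by a, result is 0.0
reordered = (a + c) + b      # a and c cancel first, result is 1.0
```

The two sums differ even though the inputs are identical. A kernel that does not fix the order in which partial products are accumulated can therefore return slightly different outputs from run to run.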

Related Code

The test is located in python/cuml/tests/test_random_projection.py and tests the reproducibility of cuml.random_projection algorithms.

The error propagates from:

  1. test_random_seed_consistency (line 169)
  2. np.testing.assert_allclose with default tolerance rtol=1e-07
  3. Mixed sparse-dense matrix operations in CuPy

Proposed Mitigation

The issue has been resolved in commit feea7fee9c with the following approach:

@pytest.mark.parametrize("cls", classes)
@pytest.mark.parametrize("sparse", [False, True])
def test_random_seed_consistency(cls, sparse):
    X = random_array(10, 1000, sparse=sparse)
    
    model1 = cls(n_components=5, random_state=42).fit(X)
    t1 = model1.transform(X)
    model2 = cls(n_components=5, random_state=42).fit(X)
    t2 = model2.transform(X)
    
    # Due to https://github.com/cupy/cupy/issues/9323 only sparse @ sparse or
    # dense @ dense outputs are exactly reproducible. All other combinations
    # result in close but not identical outputs. For now we document this and
    # relax the test constraint.
    if (cls is SparseRandomProjection) != sparse:
        # Mix of sparse and dense, check outputs are close
        np.testing.assert_allclose(asdense(t1), asdense(t2), rtol=1e-4)
    else:
        # Both dense or sparse, can check exactly
        np.testing.assert_array_equal(asdense(t1), asdense(t2))

Problematic Combinations

  • SparseRandomProjection + dense input (sparse=False) → Uses relaxed tolerance rtol=1e-4
  • GaussianRandomProjection + sparse input (sparse=True) → Uses relaxed tolerance rtol=1e-4
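
The condition `(cls is SparseRandomProjection) != sparse` used in the test can be read as a truth table over the four class/input combinations. This is a pure-Python sketch: the two classes are empty stand-ins for the cuML estimators, and `is_mixed` and `checks` are hypothetical names used only for illustration.

```python
from itertools import product


# Empty stand-ins for the cuML estimators (no GPU or cuML required).
class GaussianRandomProjection:
    pass


class SparseRandomProjection:
    pass


def is_mixed(cls, sparse):
    """Same condition as the test: True when exactly one operand is sparse."""
    return (cls is SparseRandomProjection) != sparse


# Map each (class name, sparse input) pair to the check the test applies.
checks = {
    (cls.__name__, sparse): (
        "assert_allclose(rtol=1e-4)" if is_mixed(cls, sparse) else "assert_array_equal"
    )
    for cls, sparse in product(
        [GaussianRandomProjection, SparseRandomProjection], [False, True]
    )
}

for combo, check in checks.items():
    print(combo, "->", check)
```

Only the two mixed combinations listed above map to the relaxed check. The remaining two (GaussianRandomProjection with dense input, SparseRandomProjection with sparse input) keep the exact comparison.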

Metadata

  • Assignees: No one assigned
  • Labels: bug, ci, flaky
  • Projects: No projects
  • Milestone: No milestone