
Parallel execution of scikit-learn integration tests broken due to non-deterministic test collection #7055

@csadorf

Description

With the merge of #6866, we introduced a patch to sklearn's all_estimators to prioritize proxy estimators.

Unfortunately, this change also introduced a source of non-determinism in the test collection order, which means that executing scikit-learn integration tests in parallel is broken.

While we don't run tests in parallel in CI, this means developers cannot reliably run them in parallel locally for faster feedback.

Example:

./python/cuml/cuml_accel_tests/upstream/scikit-learn/run-tests.sh -n 4 -v

This will likely fail with an error like "Different tests were collected between gw0 and gw2", caused by non-deterministic test ordering.

Expected Behavior

Scikit-learn integration tests should execute deterministically in parallel without ordering issues.

Current Behavior

The order in which tests are collected is currently non-deterministic, which leads to failures when attempting to run the test suite in parallel. In particular, the sequence of parameterized test instances for a given estimator can change from one run to another. For example, tests involving LinearSVC(max_iter=20) and LinearSVC() may be collected in different orders depending on the run. The underlying cause is not fully understood, but it appears to be connected to the way the all_estimators function is patched, possibly because the patch is not applied consistently or completely.

Expected Error Output

When running the command above, you'll see an error like:

ERROR collecting gw2
Different tests were collected between gw0 and gw2. The difference is:
--- gw0
+++ gw2
@@ -25704,6 +25704,70 @@
 tests/test_common.py::test_estimators[LinearRegression()-check_fit1d]
 tests/test_common.py::test_estimators[LinearRegression()-check_fit2d_predict1d]
 tests/test_common.py::test_estimators[LinearRegression()-check_requires_y_none]
+tests/test_common.py::test_estimators[LinearSVC()-check_estimator_cloneable0]
+tests/test_common.py::test_estimators[LinearSVC()-check_estimator_cloneable1]
+...
+tests/test_common.py::test_estimators[LinearSVC()-check_requires_y_none]
 tests/test_common.py::test_estimators[LinearSVC(max_iter=20)-check_estimator_cloneable0]
 tests/test_common.py::test_estimators[LinearSVC(max_iter=20)-check_estimator_cloneable1]
 ...
 tests/test_common.py::test_estimators[LinearSVC(max_iter=20)-check_requires_y_none]
-tests/test_common.py::test_estimators[LinearSVC()-check_estimator_cloneable0]
-tests/test_common.py::test_estimators[LinearSVC()-check_estimator_cloneable1]
-...
-tests/test_common.py::test_estimators[LinearSVC()-check_requires_y_none]
 tests/test_common.py::test_estimators[LinearSVR()-check_estimator_cloneable0]

The key issue is that LinearSVC() and LinearSVC(max_iter=20) appear in different orders between test workers, causing pytest-xdist to fail with "Different tests were collected" errors.
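To help confirm that two workers collected the same tests in a different order (rather than different tests), the comparison can be reproduced offline. The following is a minimal sketch, not part of cuml or pytest-xdist: the helper names `diff_collected` and `same_tests_different_order` are hypothetical, and the test IDs are shortened, illustrative stand-ins for the ones in the output above. How the two ID lists are obtained (for example from two separate collection runs) is left as an assumption.

```python
import difflib


def diff_collected(ids_a, ids_b, name_a="gw0", name_b="gw2"):
    """Return unified-diff lines between two lists of collected test IDs."""
    return list(
        difflib.unified_diff(
            ids_a, ids_b, fromfile=name_a, tofile=name_b, lineterm=""
        )
    )


def same_tests_different_order(ids_a, ids_b):
    """True if both lists hold the same test IDs but in a different order."""
    return ids_a != ids_b and sorted(ids_a) == sorted(ids_b)


# Illustrative IDs only (hypothetical, shortened from the real output).
gw0 = [
    "test_estimators[LinearRegression()-check_fit1d]",
    "test_estimators[LinearSVC(max_iter=20)-check_fit1d]",
    "test_estimators[LinearSVC()-check_fit1d]",
    "test_estimators[LinearSVR()-check_fit1d]",
]
# Same tests, but the two LinearSVC instances are swapped.
gw2 = [gw0[0], gw0[2], gw0[1], gw0[3]]

if same_tests_different_order(gw0, gw2):
    print("\n".join(diff_collected(gw0, gw2)))
```

If `same_tests_different_order` returns True, the failure is purely an ordering problem, which matches the behavior described in this issue.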

Proposed Solutions

  1. Investigate the patch: Review the changes in PR #6866 ("Support LinearSVC and LinearSVR in cuml.accel") to understand what causes the non-determinism
  2. Fix ordering logic: Ensure the proxy estimator prioritization maintains deterministic ordering
  3. Improve monkeypatching reliability: The current patch may not be applied reliably, leading to inconsistent behavior
  4. Consider alternative approaches: Instead of monkeypatching, consider other ways to handle duplicate estimator discovery
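As an illustration of solution 2, the sketch below shows one way proxy prioritization could be made independent of discovery order. It is not the code from #6866 and does not call scikit-learn: `deterministic_estimators`, the `is_proxy` predicate, and the stand-in classes are all hypothetical. The idea is to pick one class per estimator name with an explicit total order (proxy first, then module and qualified name) and to return the result sorted by name, so every worker gets the same list regardless of the order in which candidates were found. Since the root cause is not yet understood, this only shows what a deterministic ordering could look like and is not a confirmed fix.

```python
import itertools
from collections import defaultdict


def _rank(cls, is_proxy):
    # Proxies sort first (False < True); module and qualified name break ties.
    return (not is_proxy(cls), cls.__module__, cls.__qualname__)


def deterministic_estimators(candidates, is_proxy):
    """Pick one class per name, preferring proxies, sorted by name.

    candidates: iterable of (name, cls) pairs in arbitrary order.
    is_proxy: callable returning True for proxy classes (hypothetical).
    """
    by_name = defaultdict(list)
    for name, cls in candidates:
        by_name[name].append(cls)
    return [
        (name, min(by_name[name], key=lambda c: _rank(c, is_proxy)))
        for name in sorted(by_name)
    ]


# Hypothetical stand-ins for an upstream estimator and its proxy.
class UpstreamLinearSVC:
    pass


class ProxyLinearSVC:
    is_proxy = True


class UpstreamLinearSVR:
    pass


def is_proxy(cls):
    return getattr(cls, "is_proxy", False)


candidates = [
    ("LinearSVR", UpstreamLinearSVR),
    ("LinearSVC", UpstreamLinearSVC),
    ("LinearSVC", ProxyLinearSVC),
]

result = deterministic_estimators(candidates, is_proxy)

# Every permutation of the discovery order yields the same result.
all_orders_agree = all(
    deterministic_estimators(order, is_proxy) == result
    for order in itertools.permutations(candidates)
)
```

Here `result` lists the proxy for LinearSVC followed by the upstream LinearSVR, and `all_orders_agree` is True for every discovery order of the three candidates.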
