Improve detection of scikit-learn parity regressions #6553

Merged: rapids-bot merged 8 commits into rapidsai:branch-25.06 from csadorf:ci/accel-skl-regression-testing on Apr 25, 2025.

Commits (all by csadorf):
- 2cf4656 Add scikit-learn parity regression detection infrastructure
- 81b02c2 Run scikit-learn test suite in parallel.
- 58f24eb Update xfail list.
- d014a21 Remove obsolete scripts.
- c3a0c07 Add pyyaml to test dependencies.
- feae07e Use more specific xfail reason.
- b9bdf32 Merge branch 'branch-25.06' into ci/accel-skl-regression-testing
- 54397a5 Update xfail list.

Two files were deleted in this change (contents not shown).

New file added in this change (diff hunk `@@ -0,0 +1,109 @@`):

# scikit-learn Acceleration Tests

This suite provides infrastructure for running and analyzing the scikit-learn test suite with cuML acceleration enabled.

## Components

- `run-tests.sh`
  Executes the scikit-learn tests using GPU-accelerated code paths. Any arguments passed to the script are forwarded directly to pytest.

  Example usage:
  ```bash
  ./run-tests.sh                     # Run all tests
  ./run-tests.sh -v -k test_kmeans   # Verbosely run only tests matching "test_kmeans"
  ./run-tests.sh -x --pdb            # Stop at the first failure and open the debugger
  ```

- `summarize-results.py`
  Analyzes test results from a JUnit XML report and prints either a summary or an xfail list.
  Options:
  - `-v, --verbose`: Display detailed failure information
  - `-f, --fail-below VALUE`: Minimum pass-rate threshold, in percent (0-100)
  - `--format FORMAT`: Output format, either `summary` or `xfail_list`
  - `--update-xfail-list PATH`: Path to an existing xfail list to update
  - `-i, --in-place`: Update the xfail list file in place
  - `--xpassed ACTION`: How to handle XPASS tests: `keep`, `remove`, or `mark-flaky`

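To make the report analysis concrete, here is a minimal sketch of how outcomes could be tallied from a JUnit XML report using only the standard library. It is not the code of `summarize-results.py`: the helper names `classify` and `count_outcomes` are hypothetical, and mapping `<skipped type="pytest.xfail">` to an xfail outcome is an assumption about how pytest's `--junitxml` output encodes expected failures.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def classify(testcase: ET.Element) -> str:
    """Map a single <testcase> element to an outcome label."""
    # A <failure> or <error> child marks a test that did not pass.
    if testcase.find("failure") is not None or testcase.find("error") is not None:
        return "failed"
    skipped = testcase.find("skipped")
    if skipped is not None:
        # Assumption: pytest's --junitxml output encodes expected failures
        # as <skipped type="pytest.xfail">.
        if skipped.get("type") == "pytest.xfail":
            return "xfailed"
        return "skipped"
    # No failure, error, or skipped child: count the test as passed.
    return "passed"


def count_outcomes(xml_text: str) -> Counter:
    """Tally outcomes over every <testcase> in a JUnit XML report."""
    root = ET.fromstring(xml_text)
    # iter() searches the whole tree, so both a <testsuites> root and a
    # bare <testsuite> root are handled.
    return Counter(classify(tc) for tc in root.iter("testcase"))
```

For a report on disk, `count_outcomes(Path("report.xml").read_text())` (with `pathlib.Path`) would work the same way.
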
## Usage

### 1. Run tests and generate a report
Run the tests and save a JUnit XML report:
```bash
./run-tests.sh --junitxml=report.xml
```

**Tip**: Run the tests in parallel with `-n auto` (provided by the pytest-xdist plugin) to use all available CPU cores:
```bash
./run-tests.sh --junitxml=report.xml -n auto
```

### 2. Analyze results
Print a summary of the report, with detailed failure information (`-v`) and an 80% minimum pass rate (`-f 80`):
```bash
./summarize-results.py -v -f 80 report.xml
```

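The `-f 80` threshold check can be sketched as follows. The exact pass-rate definition used by `summarize-results.py` is not stated here, so this sketch assumes one plausible choice: skipped and xfailed tests are left out, and the rate is passed tests divided by passed plus failed tests. The names `pass_rate` and `check_threshold` are hypothetical.

```python
def pass_rate(passed: int, failed: int) -> float:
    """Percentage of executed tests that passed.

    Assumption: skipped and xfailed tests are excluded, so the
    denominator is passed + failed.
    """
    executed = passed + failed
    if executed == 0:
        return 100.0  # Nothing ran, so nothing failed.
    return 100.0 * passed / executed


def check_threshold(passed: int, failed: int, fail_below: float) -> int:
    """Return exit code 0 if the pass rate meets fail_below (0-100), else 1."""
    rate = pass_rate(passed, failed)
    print(f"Pass rate: {rate:.1f}% (threshold: {fail_below:.1f}%)")
    return 0 if rate >= fail_below else 1
```

A caller would typically pass the return value of `check_threshold` to `sys.exit()`, so that a CI job fails when the pass rate drops below the threshold.
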
## Xfail List

The xfail list (`xfail-list.yaml`) marks tests that are expected to fail. This is useful for:
- Tracking known issues
- Managing test failures during development
- Handling version-specific test failures
- Managing flaky tests that occasionally fail

### Automatic Usage
`run-tests.sh` automatically uses `xfail-list.yaml` if that file is present in the same directory as the script.

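One way an xfail list could be matched against collected tests is to convert each pytest node ID into the `module::test_name` form used in the list. This sketch assumes node IDs of the form `path/to/module.py::test_name[params]`; `nodeid_to_xfail_id` and `find_entry` are hypothetical helpers, not part of `run-tests.sh`.

```python
def nodeid_to_xfail_id(nodeid: str) -> str:
    """Convert a pytest node ID into the xfail-list `module::test_name` form.

    'sklearn/cluster/tests/test_k_means.py::test_kmeans_convergence[42-elkan]'
    becomes
    'sklearn.cluster.tests.test_k_means::test_kmeans_convergence[42-elkan]'.
    """
    # Only the part before the first '::' is a file path; the test name and
    # any parametrization after it are kept verbatim.
    path, sep, rest = nodeid.partition("::")
    if path.endswith(".py"):
        path = path[: -len(".py")]
    module = path.replace("\\", "/").replace("/", ".")
    return module + sep + rest


def find_entry(nodeid: str, index: dict):
    """Return the xfail entry for a collected test, or None if it is not listed."""
    return index.get(nodeid_to_xfail_id(nodeid))
```

In a `conftest.py`, a `pytest_collection_modifyitems(config, items)` hook could call `find_entry(item.nodeid, index)` for each item and, on a match, apply `item.add_marker(pytest.mark.xfail(reason=..., strict=...))`.
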
### Generating an Xfail List
The `summarize-results.py` script provides several ways to manage the xfail list:

1. Generate a new xfail list from test results:
   ```bash
   ./summarize-results.py --format=xfail_list report.xml > xfail-list.yaml
   ```

2. Update an existing xfail list in place:
   ```bash
   ./summarize-results.py --update-xfail-list=xfail-list.yaml --in-place report.xml
   ```

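Both workflows can be sketched with plain data structures. Assuming the failing test IDs have already been extracted from the report, the hypothetical `render_xfail_list` writes new YAML entries and the hypothetical `merge_xfail_list` adds new failures to an existing list while preserving hand-written fields. The real script may differ, for example in how it orders or annotates entries.

```python
import json

DEFAULT_REASON = "Test listed in xfail list"


def render_xfail_list(failed_ids, reason=DEFAULT_REASON):
    """Render failing test IDs as YAML xfail-list entries, sorted by ID.

    A JSON string literal is also a valid YAML double-quoted scalar, so
    json.dumps takes care of quoting without needing a YAML library.
    """
    lines = []
    for test_id in sorted(set(failed_ids)):
        lines.append(f"- id: {json.dumps(test_id)}")
        lines.append(f"  reason: {json.dumps(reason)}")
        lines.append("  strict: true")
    return ("\n".join(lines) + "\n") if lines else ""


def merge_xfail_list(existing, failed_ids, reason=DEFAULT_REASON):
    """Append entries for newly failing tests to an existing list.

    Entries already in the list keep their reason/strict/condition fields;
    the input list is not modified.
    """
    known = {entry["id"] for entry in existing}
    merged = [dict(entry) for entry in existing]
    for test_id in sorted(set(failed_ids) - known):
        merged.append({"id": test_id, "reason": reason, "strict": True})
    return merged
```

Because existing entries are copied unchanged, hand-edited reasons and `strict` flags survive repeated updates.
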
The script handles tests that unexpectedly pass (XPASS) in one of three ways, selected with `--xpassed`:
- `keep`: Keep xpassed tests in the list (default)
- `remove`: Remove xpassed tests from the list
- `mark-flaky`: Convert strict xpassed entries to non-strict (flaky)

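The three `--xpassed` actions can be expressed as a small filter over list entries. This is a sketch, not the script's implementation: `apply_xpassed_action` is a hypothetical name, entries are assumed to be dicts loaded from the YAML file, and `xpassed_ids` is assumed to hold the IDs of listed tests that passed in the report.

```python
def apply_xpassed_action(entries, xpassed_ids, action="keep"):
    """Handle xfail-list entries whose tests unexpectedly passed (XPASS)."""
    if action not in ("keep", "remove", "mark-flaky"):
        raise ValueError(f"unknown --xpassed action: {action!r}")
    xpassed = set(xpassed_ids)
    result = []
    for entry in entries:
        if entry["id"] in xpassed:
            if action == "remove":
                continue  # The test now passes; drop it from the list.
            if action == "mark-flaky" and entry.get("strict", True):
                # Strict entries (strict defaults to true) become non-strict.
                entry = {**entry, "strict": False}
        result.append(entry)
    return result
```

Under `keep`, the list is returned with the same entries, which matches the documented default of leaving xpassed tests untouched.
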
Example with all options:
```bash
./summarize-results.py --update-xfail-list=xfail-list.yaml --in-place --xpassed=mark-flaky report.xml
```

### Format
The xfail list is a YAML file containing the IDs of tests to mark as xfail. Each entry can include:
- `id`: Test ID in the format `module::test_name`, where `module` is the dotted module path and `test_name` may include pytest parametrization (e.g. `[42-elkan]`)
- `reason`: Optional reason for the xfail (default: "Test listed in xfail list")
- `strict`: Whether to enforce the xfail, i.e. treat an unexpected pass as a failure (default: `true`)
- `condition`: Optional version requirement (e.g., "scikit-learn>=1.5.2"); the entry applies only when the requirement is satisfied

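A `condition` such as `scikit-learn<1.5.2` can be evaluated by comparing version tuples. This minimal sketch assumes a single `name<op>version` clause with a purely numeric version; a production implementation would more likely rely on the `packaging` library's specifier handling. `condition_holds` is a hypothetical helper, and installed versions are passed in as a dict for testability (in practice they could come from `importlib.metadata.version(...)`).

```python
import operator
import re

# Map each comparison operator to the function that implements it.
_OPS = {
    "<=": operator.le, ">=": operator.ge, "==": operator.eq,
    "!=": operator.ne, "<": operator.lt, ">": operator.gt,
}
# One clause: distribution name, operator, numeric version.
_CLAUSE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(<=|>=|==|!=|<|>)\s*(\d+(?:\.\d+)*)\s*$")


def _version_tuple(version: str) -> tuple:
    """'1.5.2' -> (1, 5, 2). Assumes a purely numeric release version."""
    return tuple(int(part) for part in version.split("."))


def condition_holds(condition: str, installed: dict) -> bool:
    """Evaluate a clause such as 'scikit-learn<1.5.2' against installed versions."""
    match = _CLAUSE.match(condition)
    if match is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    name, op, wanted = match.groups()
    have, want = _version_tuple(installed[name]), _version_tuple(wanted)
    # Pad with zeros so that '1.5' and '1.5.0' compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return _OPS[op](have, want)
```
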
Example:
```yaml
- id: "sklearn.linear_model.tests.test_logistic::test_logistic_regression"
  reason: "Known issue with sparse inputs"
  strict: true
- id: "sklearn.cluster.tests.test_k_means::test_kmeans_convergence[42-elkan]"
  condition: "scikit-learn<1.5.2"
  reason: "Unsupported hyperparameter for older scikit-learn version."
- id: "sklearn.ensemble.tests.test_forest::test_random_forest_classifier"
  reason: "Flaky test due to random seed sensitivity"
  strict: false
```

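Loading the YAML example above (for instance with PyYAML's `yaml.safe_load`) yields a list of dicts. The sketch below fills in the documented defaults for the optional fields; `normalize_entry` is a hypothetical helper, and treating a missing `condition` as "always applies" is an assumption.

```python
DEFAULT_REASON = "Test listed in xfail list"


def normalize_entry(entry: dict) -> dict:
    """Return a copy of an xfail-list entry with the documented defaults filled in."""
    if "id" not in entry:
        raise ValueError(f"xfail-list entry has no 'id': {entry!r}")
    return {
        "id": entry["id"],
        "reason": entry.get("reason", DEFAULT_REASON),
        "strict": entry.get("strict", True),
        # Assumption: a missing condition means the entry always applies.
        "condition": entry.get("condition"),
    }


# The YAML example above, as yaml.safe_load would return it.
EXAMPLE = [
    {"id": "sklearn.linear_model.tests.test_logistic::test_logistic_regression",
     "reason": "Known issue with sparse inputs", "strict": True},
    {"id": "sklearn.cluster.tests.test_k_means::test_kmeans_convergence[42-elkan]",
     "condition": "scikit-learn<1.5.2",
     "reason": "Unsupported hyperparameter for older scikit-learn version."},
    {"id": "sklearn.ensemble.tests.test_forest::test_random_forest_classifier",
     "reason": "Flaky test due to random seed sensitivity", "strict": False},
]
```
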
**Note on `strict: false`**:
Keep `strict: true` (the default) unless there is a specific reason not to. Use `strict: false` only for:
- Tests that are genuinely non-deterministic (e.g., floating-point results that depend on the order of operations)
- Tests that fail intermittently due to external factors (e.g., network timeouts)
- Tests that are known to be flaky but cannot be fixed immediately

Ideally, each use of `strict: false` should include:
- A clear explanation of why the test is non-deterministic
- A plan to fix the underlying issue
- Regular review to ensure the flag is still necessary