Speedup dask LogisticRegression tests #6607
Merged
rapids-bot[bot] merged 7 commits into rapidsai:branch-25.06 on Apr 30, 2025
No need to parametrize this test based on scale; using a fixed scale should be sufficient.
Using a faster solver for the CPU model results in a 3x speedup.
Using a slightly smaller number of classes results in some measurable speedups.
No need to parametrize these tests based on `n_classes`; the behavior being tested isn't dependent on the number of classes.
Better tests the behavior being tested here.
The behaviors being tested in these tests don't vary based on `fit_intercept`.
This parametrization was unused; it was just doubling the number of tests run without adding any variation.
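To make the effect of dropping a parameter concrete, here is a minimal sketch of how the number of generated test cases scales with a parameter grid. Stacked `pytest.mark.parametrize` decorators generate the Cartesian product of their values, so every extra two-valued parameter doubles the case count whether or not the test body uses it. The grid below is hypothetical (the names and values are illustrative, not copied from the cuML test file), and `count_cases` and `drop` are helper names invented for this example.

```python
from itertools import product

# Hypothetical parameter grid; illustrative only, not the real cuML test grid.
grid = {
    "fit_intercept": [True, False],
    "n_classes": [2, 8],
    "dtype": ["float32", "float64"],
}


def count_cases(grid):
    """Number of cases produced by a full cross product of the grid values."""
    return len(list(product(*grid.values())))


def drop(grid, name):
    """Return a copy of the grid without the parameter `name`."""
    return {key: values for key, values in grid.items() if key != name}


full = count_cases(grid)
trimmed = count_cases(drop(grid, "dtype"))
print(f"{full} cases with every parameter, {trimmed} after dropping one")
```

Removing one two-valued parameter halves the number of cases, which is why an unused parametrization is pure overhead.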
betatim
reviewed
Apr 30, 2025
betatim
reviewed
Apr 30, 2025
betatim
approved these changes
Apr 30, 2025
Member
betatim
left a comment
Looks reasonable. Something I haven't thought about is if there are weird numerical effects that we would only see with one of the two dtypes. Which would suggest that we should have (at least) one test for each dtype. But then none of the tests seem to check for "weird numerical" things.
Member
Author
Note that we do still have plenty of tests for each dtype; we're just not running them on every test. Tests for specific behavior check that specific behavior (with minimal unnecessary parametrization). Then there are general tests for each penalty that have a broader parametrization.
Member
Author
/merge
Ofek-Haim
pushed a commit
to Ofek-Haim/cuml
that referenced
this pull request
May 13, 2025
This PR applies a few changes (see the commit messages for details) to speed up the dask `LogisticRegression` tests. Most of the changes fall into one of a few categories:

- Removing useless parametrization (either unnecessary for testing the specific feature targeted by the test, or actually ignored and just doubling the number of tests run)
- Reducing the scale tested by a bit
- Coupling certain parameter combinations to reduce the number of tests without reducing coverage
- Using a faster solver for the CPU versions

Altogether this reduces the time taken from 28 minutes to 7 minutes on my machine, a 4x speedup.

For (I assume) historical reasons, most of the dask test suite doesn't run in PRs since it's gated behind `quality_param`/`stress_param` annotations. This file is one of the exceptions, and thus takes ~1/2 the time used for a single PR test run. Rather than add those annotations here (I'm mostly against them and hope we can remove them, as discussed in rapidsai#6580), I've opted to make the tests here more targeted and faster without skipping certain tests in PRs.

Authors:
- Jim Crist-Harif (https://github.com/jcrist)

Approvers:
- Tim Head (https://github.com/betatim)

URL: rapidsai#6607
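The "coupling" idea in the description can be sketched as follows. This is only an illustration under assumptions: the parameter names and values are hypothetical, and "coverage" here means the weak notion that every individual value is still exercised at least once, which may not be the exact criterion the PR author used. In pytest terms, the crossed form corresponds to two stacked `parametrize` decorators and the coupled form to a single `parametrize` over explicit pairs.

```python
from itertools import product

# Hypothetical parameter values; illustrative only.
penalties = ["l1", "l2"]
dtypes = ["float32", "float64"]

# Full cross product: every penalty is run with every dtype.
crossed = list(product(penalties, dtypes))

# Coupled: each penalty is paired with exactly one dtype.
coupled = list(zip(penalties, dtypes))


def values_covered(cases, position):
    """Set of distinct values seen at `position` across all cases."""
    return {case[position] for case in cases}


# Each individual value still appears at least once in the coupled set,
# even though the number of cases is halved.
same_penalties = values_covered(coupled, 0) == values_covered(crossed, 0)
same_dtypes = values_covered(coupled, 1) == values_covered(crossed, 1)
print(len(crossed), len(coupled), same_penalties, same_dtypes)
```

The trade-off is that specific pairings (for example `"l1"` with `"float64"`) are no longer run, so this only makes sense when the behavior under test does not depend on the interaction between the two parameters.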