Speedup dask LogisticRegression tests #6607

Merged
rapids-bot[bot] merged 7 commits into rapidsai:branch-25.06 from jcrist:speedup-dask-lr-tests
Apr 30, 2025
Conversation

@jcrist
Member

@jcrist commented Apr 29, 2025

This PR applies a few changes (see the commit messages for details) to speed up the dask LogisticRegression tests. Most of the changes fall into one of a few categories:

  • Removing useless parametrization (either unnecessary for testing the specific feature targeted by the test, or ignored entirely, merely doubling the number of tests run)
  • Reducing the scale tested by a bit
  • Coupling certain parameter combinations to reduce the number of tests without reducing coverage
  • Using a faster solver for the CPU versions

Altogether this reduces the time taken from 28 minutes to 7 minutes on my machine, a 4x speedup.

For what I assume are historical reasons, most of the dask test suite doesn't run in PRs since it's gated behind `quality_param`/`stress_param` annotations. This file is one of the exceptions, and thus takes ~1/2 the time used for a single PR test run. Rather than add those annotations here (I'm mostly against them and hope we can remove them, as discussed in #6580), I've opted to make the tests here more targeted and faster without skipping certain tests in PRs.
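
The "coupling certain parameter combinations" idea can be sketched with pytest. This is a hypothetical illustration, not the PR's actual code: the dtype and class-count values, the ids, and the test name `test_lr_fit` are all made up.

```python
import pytest

# A full cross-product of 2 dtypes x 3 class counts would give 6 test
# cases. Coupling each dtype to one representative class count cuts the
# matrix to 2 cases while still covering every value on both axes.
full_grid = [(dt, nc) for dt in ("float32", "float64") for nc in (2, 6, 10)]

coupled = [
    pytest.param("float32", 2, id="float32-binary"),
    pytest.param("float64", 6, id="float64-multiclass"),
]

@pytest.mark.parametrize("datatype, n_classes", coupled)
def test_lr_fit(datatype, n_classes):
    # Placeholder body; the real tests fit a dask LogisticRegression.
    assert n_classes >= 2
```

The explicit `id=` strings also make the (now fewer) cases easier to read in the test output.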

jcrist added 7 commits April 29, 2025 12:43
No need to parametrize this test based on scale; using a fixed scale
should be sufficient.
Using a faster solver for the CPU model results in a 3x speedup.
Using a slightly smaller number of classes results in some measurable
speedups.
No need to parametrize these tests based on `n_classes`; the behavior
being tested isn't dependent on the number of classes.
More directly exercises the behavior under test here.
The behavior being tested in these tests doesn't vary based on
`fit_intercept`.
This parametrization was unused; it was just doubling the number of
tests run without any variance.
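
The "fixed scale" commit above can be illustrated with a minimal numpy sketch. This is a hypothetical example; `FIXED_SCALE`, the data shape, and the labeling rule are assumptions, not the PR's actual values.

```python
import numpy as np

# Instead of parametrizing a test over several scales, generate the data
# at one fixed, representative scale: the behavior being asserted on
# doesn't depend on it, so one value suffices.
FIXED_SCALE = 1.0  # assumed value

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 5)) * FIXED_SCALE
y = (X[:, 0] > 0).astype(int)  # simple separable labels for a smoke test
```
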
@jcrist requested a review from a team as a code owner April 29, 2025 20:31
@jcrist requested review from betatim and csadorf April 29, 2025 20:31
@github-actions added the Cython / Python label Apr 29, 2025
@jcrist self-assigned this Apr 29, 2025
@jcrist added the tests, Dask / cuml.dask, improvement, and non-breaking labels Apr 29, 2025
Comment thread python/cuml/cuml/tests/dask/test_dask_logistic_regression.py
Member

@betatim left a comment

Looks reasonable. Something I haven't thought about is whether there are weird numerical effects that we would only see with one of the two dtypes, which would suggest that we should have (at least) one test for each dtype. But then none of the tests seem to check for "weird numerical" things.

@jcrist
Member Author

jcrist commented Apr 30, 2025

Note that we do still have plenty of tests for each dtype, we're just not running them on every test. Tests for specific behavior check that specific behavior (with minimal unnecessary parametrization). Then there are general tests for each penalty that have a broader parametrization.

@jcrist
Member Author

jcrist commented Apr 30, 2025

/merge

@rapids-bot merged commit 83f9bcd into rapidsai:branch-25.06 Apr 30, 2025
78 checks passed
@jcrist deleted the speedup-dask-lr-tests branch April 30, 2025 15:14
Ofek-Haim pushed a commit to Ofek-Haim/cuml that referenced this pull request May 13, 2025

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Tim Head (https://github.com/betatim)

URL: rapidsai#6607
