
Optimize import times a bit #7171

Merged
rapids-bot[bot] merged 6 commits into rapidsai:branch-25.10 from jcrist:import-time-opt
Sep 4, 2025

Conversation

@jcrist (Member) commented Sep 3, 2025

An offline conversation sent me down an import-time inspection rabbit hole; this PR is the result. I don't think any of the changes here are bad, but I'm also not sure they're meaningful. Just pushing it up for discussion.

cuml has some dependencies with non-negligible import times. These include:

  • cupy
  • cudf
  • sklearn
  • numpy
  • pandas
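To see which of these dominate, CPython's `-X importtime` flag prints a per-module breakdown (self and cumulative time, in microseconds) to stderr. The following minimal sketch runs it in a fresh interpreter and ranks modules by cumulative time. The helper names are hypothetical, and the parser assumes the standard `import time: self | cumulative | name` line layout.

```python
import subprocess
import sys


def parse_importtime(stderr: str) -> dict[str, tuple[int, int]]:
    """Map module name -> (self_us, cumulative_us) from `-X importtime` output."""
    timings = {}
    for line in stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        self_us, cumulative_us, name = (f.strip() for f in fields)
        if not self_us.isdigit():  # skip the "self [us] | cumulative | ..." header
            continue
        timings[name] = (int(self_us), int(cumulative_us))
    return timings


def profile_import(module: str) -> dict[str, tuple[int, int]]:
    """Import `module` in a fresh interpreter with -X importtime and parse the report."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_importtime(result.stderr)


def top_offenders(timings: dict[str, tuple[int, int]], n: int = 5):
    """Return the n modules with the largest cumulative import time."""
    return sorted(timings.items(), key=lambda kv: kv[1][1], reverse=True)[:n]
```

With cuml installed, `top_offenders(profile_import("cuml"), 10)` would list the heaviest imports. Because cumulative time includes children, top-level packages such as `cudf` or `pandas` rank above their own submodules.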

In terms of total import-time contribution, cudf + pandas are the biggest offenders here. With some work we could delay our cudf & pandas imports until use and shave off ~40% of our import time. If most users aren't using cudf, this might be a meaningful body of work to pursue.

The other imports are pretty core to how cuml works and can't really be avoided in any workflow. Delaying them just moves the cost from import time to first use without reducing it.
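Delaying an import until use can be done generically with the standard library's `importlib.util.LazyLoader`. The following sketch is based on the lazy-import recipe in the `importlib` documentation. The module object is registered immediately, but its code only runs on the first attribute access. It illustrates the point above: the cost is moved, not removed. `lazy_import` is a hypothetical helper name, not a cuml API.

```python
import importlib.util
import sys
from types import ModuleType


def lazy_import(name: str) -> ModuleType:
    """Register `name` in sys.modules but defer executing it until first attribute access."""
    if name in sys.modules:  # already imported: nothing left to defer
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # only makes the module lazy; its body runs later
    return module
```

For example, `pd = lazy_import("pandas")` would return almost instantly, and the full pandas import cost would be paid on the first `pd.DataFrame` lookup. Note that `from pandas import DataFrame` still triggers the load immediately.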

This PR instead focuses on limiting how much of sklearn gets imported by `import cuml`. This cuts ~10% off our import time, bringing it to just under 1 s on my machine.

Changes can be summarized as:

  • Reduce `_sklearn_compat.py` to just what we actually use. This drops a bunch of sklearn imports and is the largest contributor to the import-time improvement.
  • Delay the metaestimator `cuml.pipeline`/`cuml.model_selection` imports until use.
  • Delay a bunch of other sklearn imports until use.
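One common way to delay submodules such as `cuml.pipeline` and `cuml.model_selection` is a module-level `__getattr__` (PEP 562) in the package's `__init__.py`. This is a generic sketch, not necessarily how this PR implements it. To make the behavior checkable without cuml, it writes a throwaway package named `lazypkg` (hypothetical) with placeholder `pipeline` and `model_selection` submodules.

```python
import textwrap
from pathlib import Path

# Source for a hypothetical package __init__.py using a PEP 562 module __getattr__.
INIT_SOURCE = textwrap.dedent('''
    import importlib

    _LAZY_SUBMODULES = {"pipeline", "model_selection"}

    def __getattr__(name):
        # Only called when normal lookup fails, i.e. on first access to a lazy submodule.
        if name in _LAZY_SUBMODULES:
            module = importlib.import_module(f".{name}", __name__)
            # Cache it explicitly; the import system also binds it on the package.
            globals()[name] = module
            return module
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

    def __dir__():
        # Keep tab completion aware of submodules that are not loaded yet.
        return sorted(set(globals()) | _LAZY_SUBMODULES)
''')


def make_demo_package(root: Path, pkg: str = "lazypkg") -> None:
    """Write the demo package (lazy __init__.py plus two tiny submodules) under `root`."""
    pkg_dir = root / pkg
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(INIT_SOURCE)
    for sub in ("pipeline", "model_selection"):
        (pkg_dir / f"{sub}.py").write_text(f"NAME = {sub!r}\n")
```

After `import lazypkg`, neither submodule is in `sys.modules` until `lazypkg.pipeline` or `lazypkg.model_selection` is first touched. For individual helpers, the simpler pattern is a function-local import, which defers the cost to the first call.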

Could go either way on merging this, just pushing it up for discussion.

The import time for `_sklearn_compat.py` was non-negligible among the
imports we control. Most of that file was unused in cuml; culling the
compat bits we don't need saved 0.08 s on my machine.
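Savings at the level of 0.08 s are easy to lose in run-to-run noise, so a before/after comparison should time the import in a fresh interpreter several times. The following sketch keeps the best run, because noise only ever adds time. `import_seconds` is a hypothetical helper. It measures only the `import` statement, not interpreter startup.

```python
import subprocess
import sys


def import_seconds(module: str, repeat: int = 5) -> float:
    """Best-of-`repeat` wall time, in seconds, to import `module` in a fresh interpreter."""
    code = (
        "import time\n"
        "t0 = time.perf_counter()\n"
        f"import {module}\n"
        "print(time.perf_counter() - t0)\n"
    )
    times = []
    for _ in range(repeat):
        out = subprocess.run(
            [sys.executable, "-c", code], capture_output=True, text=True, check=True
        ).stdout
        # Use the last line in case the imported module prints something itself.
        times.append(float(out.strip().splitlines()[-1]))
    return min(times)
```

Running `import_seconds("cuml")` on two branches gives a like-for-like comparison, assuming the same machine and environment.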
@jcrist jcrist self-assigned this Sep 3, 2025
@jcrist jcrist added the Cython / Python Cython or Python issue label Sep 3, 2025
@jcrist jcrist requested a review from a team as a code owner September 3, 2025 21:29
@jcrist jcrist added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 3, 2025
@jcrist jcrist requested a review from dantegd September 3, 2025 21:29
After this commit, `import cuml` should only import core parts of sklearn,
such as `sklearn.base`. Anything algorithm-specific is delayed until
first use, speeding up import.
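A claim like "only core bits of sklearn are imported" can be checked by importing the package in a fresh interpreter and inspecting `sys.modules`. The following sketch does that. `submodules_loaded_by` is a hypothetical helper, and the marker line only separates its output from anything the imported package prints.

```python
import subprocess
import sys

MARKER = "--loaded-modules--"


def submodules_loaded_by(module: str, prefix: str) -> list[str]:
    """Import `module` in a fresh interpreter; return loaded modules under `prefix`."""
    code = "\n".join([
        "import sys",
        f"import {module}",
        f"names = sorted(m for m in sys.modules if m == {prefix!r} or m.startswith({prefix!r} + '.'))",
        f"print({MARKER!r})",
        "print('\\n'.join(names))",
    ])
    out = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    ).stdout
    # Keep only the lines printed after the last marker.
    return [line for line in out.rsplit(MARKER, 1)[1].splitlines() if line]
```

With cuml and sklearn installed, `submodules_loaded_by("cuml", "sklearn")` lists which sklearn modules `import cuml` pulls in, which makes this kind of change easy to verify or guard in a test.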
Comment thread python/cuml/cuml/model_selection/__init__.py
@jcrist jcrist requested a review from csadorf September 4, 2025 14:51
Comment thread python/cuml/cuml/_thirdparty/_sklearn_compat.py
Comment thread python/cuml/cuml/model_selection/__init__.py
Comment thread python/cuml/cuml/pipeline/__init__.py Outdated
@jcrist (Member, Author) commented Sep 4, 2025

/merge

@rapids-bot merged commit c9c4e23 into rapidsai:branch-25.10 Sep 4, 2025
73 checks passed
@jcrist jcrist deleted the import-time-opt branch September 4, 2025 17:56

3 participants