Deprecate handle from public APIs #7628
Conversation
jcrist
left a comment
Annotated the diff a bit. Most of the changes here are mechanical application of get_handle (or docstring updates) and aren't that interesting to review; I believe I've flagged anything worth looking at.
    return ms
    ...
    @pytest.mark.xfail(reason="Need rapidsai/rmm#415 to detect memleak robustly")

These tests made use of `handle`, along with many long-since-broken APIs. I opted to rip them out rather than just tweak the `handle` bits, since it didn't seem worth keeping around tests we know are broken and aren't running. We can always bring them back later; they're still in the git history.
    from cuml.internals.mixins import TagsMixin
    from cuml.internals.outputs import check_output_type
    ...
    _THREAD_STATE = threading.local()

This file implements the deprecation warnings and the handle cache/creation behavior. It's worth reviewing.
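The per-thread handle cache referenced here can be sketched in plain Python. This is only an illustrative sketch: `Handle` below is a stub standing in for `pylibraft.common.handle.Handle`, and `get_handle` mirrors the behavior described in this PR (reuse one cached handle per thread, unless a stream pool is explicitly requested).

```python
import threading


class Handle:
    """Illustrative stub standing in for pylibraft.common.handle.Handle."""

    def __init__(self, n_streams=0):
        self.n_streams = n_streams


# One slot per thread; each thread lazily creates and reuses its own handle.
_THREAD_STATE = threading.local()


def get_handle(n_streams=0):
    """Return a cached per-thread handle, or a fresh one if a stream pool
    is requested (a non-default pool can't be served from the shared cache)."""
    if n_streams:
        return Handle(n_streams=n_streams)
    handle = getattr(_THREAD_STATE, "handle", None)
    if handle is None:
        handle = _THREAD_STATE.handle = Handle()
    return handle
```

Within a single thread, repeated calls return the same object (so lazily created cusolver/cublas-style resources are paid for once); a second thread gets its own handle, sidestepping any thread-safety limitations of the underlying resources.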
    # SPDX-License-Identifier: Apache-2.0
    #
    import inspect
    import warnings
This file contains most of the new behavior and deprecation tests. It's also worth reviewing.
    memory usage. This is independent from knn_overlap_factor as long as
    'knn_overlap_factor' < 'knn_n_clusters'.
    ...
    device_ids : list[int], "all", or None, default=None
HDBSCAN now has a new device_ids parameter for configuring SNMG execution.
    cdef double intercept_f64
    cdef handle_t* handle_ = <handle_t*><size_t>self.handle.getHandle()
    # Always use 2 streams to expose concurrency in the eig computation
    handle = get_handle(model=self, n_streams=2)

Hardcoded `n_streams=2` here. `LinearRegression` is a bit of a weird case: it only makes use of 2 concurrent streams, and it always makes sense to use them (which we were doing by default, since they don't run in separate threads). Hardcoding made more sense to me than adding an `n_streams` parameter.
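The pattern being discussed can be sketched in plain Python. Everything here is an illustrative stub, not cuml's actual internals: the estimator asks a shared `get_handle` helper for a handle backed by a fixed 2-stream pool instead of exposing an `n_streams` parameter, while still honoring a deprecated user-supplied handle if one is attached.

```python
class Handle:
    """Illustrative stub for pylibraft's Handle."""

    def __init__(self, n_streams=0):
        self.n_streams = n_streams


def get_handle(model=None, n_streams=0):
    """Sketch of the helper: honor a (deprecated) user-supplied handle if
    the model still carries one, otherwise build a handle with the
    requested stream pool. `_deprecated_handle` is a hypothetical name."""
    user_handle = getattr(model, "_deprecated_handle", None)
    if user_handle is not None:
        return user_handle
    return Handle(n_streams=n_streams)


class LinearRegression:
    """Toy estimator that always requests 2 streams, as discussed above."""

    _deprecated_handle = None

    def fit(self):
        # Always use 2 streams to expose concurrency in the eig computation
        handle = get_handle(model=self, n_streams=2)
        return handle
```

The hardcoded value stays an implementation detail of `fit`, so nothing handle-related leaks into `__init__` or the public signature.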
    - Start with `nnd_n_clusters = 4` and increase (4 → 8 → 16...) for less GPU
      memory usage. This is independent from nnd_overlap_factor as long as
      'nnd_overlap_factor' < 'nnd_n_clusters'.
    device_ids : list[int], "all", or None, default=None
UMAP has a new device_ids parameter for configuring SNMG execution.
    Number of vectors approximating the hessian for the underlying QN
    solver (l-bfgs).
    n_streams : int (default = 1)
        Number of parallel streams used for fitting.

`LinearSVC` has a new `n_streams` parameter for making use of multiple streams during fitting. Here I had it default to 1, matching the previous default. For (historical?) reasons the `n_streams` parameter in `RandomForestClassifier`/`RandomForestRegressor` defaults to 4. I don't think we can have a sane hardware-agnostic default, so keeping it as 1 here makes sense to me.
do you have a sense of how much of a performance difference it makes for LinearSVMs? I have benchmarked RF for this, and 4 is not a bad default for perf, but don't have a sense for the linear SVMs
I didn't benchmark. Since `n_streams` in this case (and in the `RandomForest*` case) also consumes a thread, I'm a bit hesitant to set a higher default (increasing it would also be a behavior change). I suggest we leave the default as-is in this PR and consider changing it in a followup.
    # requirement of minimality for core points
    ...
    def get_handle(use_handle, n_streams=0):
Some of our tests were parametrized to run both with and without a handle specified. These tests passed with the changes here (if I configured the FutureWarning for deprecation not to fail). I opted to rip them out instead though since they didn't provide a meaningful benefit and would go away once the deprecation is completed.
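A replacement test for the deprecation behavior could look something like this sketch (a toy stand-in, not the PR's actual test code): calling with a `handle` should emit a `FutureWarning` but still succeed during the deprecation cycle, while calling without one stays silent.

```python
import warnings

# Hypothetical sentinel meaning "no handle passed"; None would be ambiguous
# since users might pass handle=None explicitly.
_UNSET = object()


def fit(X, handle=_UNSET):
    """Toy stand-in for an estimator method with a deprecated handle kwarg."""
    if handle is not _UNSET:
        warnings.warn(
            "`handle` is deprecated and will be removed in a future release.",
            FutureWarning,
        )
    return "fitted"


# With a handle: warns, but the call still works.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert fit([[1.0]], handle=object()) == "fitted"
assert any(issubclass(w.category, FutureWarning) for w in caught)

# Without a handle: no warning at all.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert fit([[1.0]]) == "fitted"
assert not caught
```

In a pytest suite the same checks would typically be written with `pytest.warns(FutureWarning)`, which also fails if the warning is not raised.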
Force-pushed 46ef108 to 73b9e30
(Rebased on top of #7629 to get CI passing; this PR itself doesn't include packaging changes.)

I believe all tests should pass now, but I've realized that with the current architecture we probably need to keep
Force-pushed 3248458 to 60b2e01

Ok, tests are passing (sans a build hiccup). This is ready for review.
dantegd
left a comment
Went through the whole PR, and overall the changes are really comprehensive and well implemented; I didn't find many specific comments or requests to make. This is a clean deprecation that makes the API simpler without sacrificing power. The test coverage is solid.
Also drops some xfailed (and long-since-broken) memleak tests. Better to delete them than to edit them in a way that still doesn't fix them.
Additional changes:
- Adds `n_streams` parameter to `LinearSVC`, since this estimator supports stream-parallel operations. Defaulting to 1 to match prior behavior.
- Deletes some long-broken and bitrotted memleak tests.
These were using the wrong directive, leading to a parsing error.
Force-pushed 60b2e01 to f3d10c1

/merge
This deprecates the `handle` argument to all classes, methods, and functions. `cuml.Handle` is a bit of a relic, and doesn't necessarily provide a solid user-facing benefit. A few things it was purported to do:

**Allow for asynchronous execution**

All cuml python APIs are synchronous; we do not support asynchronous execution. Any docs or claims that this is currently possible are incorrect. Further, the python APIs we're mimicking and the ecosystem we're extending don't support asynchronous execution. `cupy` (which we make heavy use of in cuml) does have some mechanisms for asynchronous execution; if/when we want to support asynchronous operation, we're likely to take an approach that piggybacks off their configuration (for better ecosystem support) rather than rely on the existing `cuml.Handle` class, which doesn't really match the task. Handles are for specifying and caching resources, not for configuring sync/async operation.

**Specify the stream of execution**

This _was_ possible with `cuml.Handle`, but didn't provide a meaningful benefit. `cuml` APIs are all synchronous, so the stream they execute on doesn't matter. Further, not every function or estimator that accepted a `cuml.Handle` made use of the handle (or respected the stream), making this specification kind of moot. `cuml.Handle` is really only used for functions that are part of `libcuml`; anything written using cupy/cudf/etc. ignores it completely. Given we provide synchronous APIs only (and rely on threads for concurrency, matching python conventions), it doesn't make sense to expose this at the user level. Better to make it an implementation detail of APIs that rely on `libcuml`.

**Specify the number of streams in a backing stream pool**

A few of our algorithms support using multiple streams from a pool on the handle. In some cases (`cuml.ensemble`) we also exposed a top-level `n_streams` argument, which seems preferable.
I've added this to `LinearSVC` (the only other algorithm that uses this, AFAICT). Elevating `n_streams` to a top-level parameter makes it more discoverable by users, and also lets us avoid modifying or constructing a handle within `__init__`, better following sklearn conventions.

**Specify a `DeviceResourcesSNMG`**

This is a relatively new use case, currently only supported by `HDBSCAN` and `UMAP`. Some of our algorithms support running on multiple GPUs on a single node (when configured). Previously this was supported by passing in a `pylibraft.common.handle.DeviceResourcesSNMG` instead of a `pylibraft.common.handle.Handle`. There were several problems with that though (see rapidsai#7465, rapidsai#7059). Instead, we now elevate `device_ids` to a top-level parameter for these models. Like with `n_streams`, this better surfaces the feature, and keeps our configuration and `__init__` simpler. `DeviceResourcesSNMG` is now an implementation detail, not something user-facing.

**Hold other resources to share across calls**

A `Handle` contains many resources (cusolver handles, cublas handles, ...), which are created lazily and may have some cost to initialize. The previous model would create a unique `Handle` per estimator or function call by default, preventing sharing of these initialization costs. Expert users could create a handle once and pass it around, but that requires extra plumbing on their part with no other real benefit. Instead, we now cache a handle per thread (some resources have concurrency or thread limitations; given python's thread-based concurrency model, keeping things thread-local prevents any issues on that front). For single-threaded programs only a single `Handle` will now be created, letting us avoid any repeated init costs.

---

Given the above arguments, `Handle` and `DeviceResourcesSNMG` are now implementation details, and are no longer user-facing.
To accomplish this, this PR:

- Modifies all relevant estimators and functions to warn if a user specifies a `handle`. During the deprecation cycle the specified `handle` will continue to be used. The warning informs the user the parameter is deprecated, and if the estimator supports `n_streams`/`device_ids` it also includes a note to use those instead.
- Keeps a `handle` parameter on the internal `*MG` estimators (e.g. `LogisticRegressionMG`), since for now the distributed comms are also attached to that object and need to be provided somehow by the caller. Since these are internal(ish) classes, I'm ok with keeping the `handle` parameter around on them for now. In the long run we'll probably want to rethink our multi-gpu APIs, since the `*MG` classes are a bit unwieldy, but there's no sense changing them now.
- Adds an `n_streams` parameter to all estimators that make use of a handle's stream pool, for configuring the pool size. The docs and deprecation warnings recommend using it.
- Adds a `device_ids` parameter to all estimators that support multi-gpu execution, for configuring the devices used. The docs and deprecation warnings recommend using it.
- Makes estimators and functions without a manually specified handle use a cached thread-local handle, unless `n_streams`/`device_ids` are specified.
- Raises a deprecation warning on access to `cuml.Handle` (a re-export of `pylibraft.common.handle.Handle`). This re-export will be removed in the following release.
- Updates docstrings to note the deprecation and point users to alternate parameters like `n_streams`/`device_ids`.
- Updates tests to no longer specify a `handle` (we had a few modules that parametrized across handle/no-handle).
- Adds new tests for the deprecation warnings and to verify that execution with a handle specified still works in all cases.
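The first bullet's warning behavior (mentioning `n_streams`/`device_ids` only when the estimator supports them) could be factored into a small helper like this sketch. The function name and message wording are illustrative, not the PR's actual internals.

```python
import warnings


def warn_handle_deprecated(alternatives=()):
    """Emit the deprecation warning described above.

    `alternatives` lists replacement parameters (e.g. "n_streams",
    "device_ids") supported by the estimator, if any; when non-empty,
    the message steers users toward them.
    """
    msg = (
        "The `handle` argument is deprecated and will be removed in a "
        "future release. For now, the provided handle will still be used."
    )
    if alternatives:
        names = ", ".join(f"`{name}`" for name in alternatives)
        msg += f" Consider using {names} instead."
    warnings.warn(msg, FutureWarning, stacklevel=3)
```

`stacklevel=3` assumes the helper is called from inside an estimator method, so the warning points at the user's call site rather than at cuml internals.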
Fixes rapidsai#6869
Fixes rapidsai#7059
Fixes rapidsai#7465

Authors:
- Jim Crist-Harif (https://github.com/jcrist)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#7628
This removes the deprecated `handle` argument/attributes on all relevant models and functions. Note that for now the (_mostly_ private) `*MG` classes retain their `handle` argument and attribute, since the multi-gpu comms are currently attached to the handles. Some care has been taken to ensure the proper handle is utilized for multi-gpu APIs. All other APIs now make use of the `get_handle` function exclusively.

This is a follow-up to #7628. Fixes #7722.

Authors:
- Jim Crist-Harif (https://github.com/jcrist)
- Simon Adorf (https://github.com/csadorf)

Approvers:
- Victor Lafargue (https://github.com/viclafargue)
- Dante Gama Dessavre (https://github.com/dantegd)

URL: #7751