Deprecate handle from public APIs #7628

Merged
rapids-bot[bot] merged 22 commits into rapidsai:main from jcrist:deprecate-handle
Dec 22, 2025
Member

@jcrist jcrist commented Dec 18, 2025

This deprecates the handle argument to all classes, methods, and functions.

cuml.Handle is a bit of a relic, and doesn't necessarily provide a solid user-facing benefit. A few things it was purported to do:

Allow for asynchronous execution

All cuml Python APIs are synchronous; we do not support asynchronous execution. Any docs or claims that this is currently possible are incorrect. Further, the Python APIs we're mimicking and the ecosystem we're extending don't support asynchronous execution. cupy (which we make heavy use of in cuml) does have some mechanisms for asynchronous execution. If/when we want to support asynchronous operation, we're likely to take an approach that piggybacks off its configuration (for better ecosystem support) rather than rely on the existing cuml.Handle class, which doesn't really match the task. Handles are for specifying and caching resources, not for configuring sync/async operation.

Specify the stream of execution

This was possible with cuml.Handle, but didn't provide a meaningful benefit. cuml APIs are all synchronous, so the stream they execute on doesn't matter. Further, not every function or estimator that accepted a cuml.Handle made use of the handle (or respected the stream), making this specification kind of moot. cuml.Handle is really only used by functions that are part of libcuml; anything written using cupy/cudf/etc. ignores it completely.

Given we provide synchronous APIs only (and rely on threads for concurrency, matching Python conventions), it doesn't necessarily make sense to expose this at the user level. Better to make it an implementation detail of APIs that rely on libcuml.

Specify the number of streams in a backing stream pool

A few of our algorithms support using multiple streams from a pool on the handle. In some cases (cuml.ensemble) we also exposed a top-level n_streams argument, which seems preferable. I've added this to LinearSVC (the only other algorithm that uses this AFAICT). Elevating n_streams to a top-level parameter makes it more discoverable by users, and also lets us avoid modifying or constructing a handle within __init__, which better follows sklearn conventions.
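As a rough illustration of the sklearn convention mentioned above (`__init__` only stores parameters and builds no resources), here is a minimal pure-Python sketch. `SketchEstimator` and `_FakeStreamPool` are hypothetical stand-ins, not cuml classes; only the `n_streams` parameter name comes from this PR, and no GPU is involved.

```python
class _FakeStreamPool:
    """Hypothetical stand-in for a pool of streams (no GPU involved)."""

    def __init__(self, n_streams):
        if n_streams < 1:
            raise ValueError("n_streams must be >= 1")
        self.n_streams = n_streams


class SketchEstimator:
    """Hypothetical estimator used for illustration; not a cuml class."""

    def __init__(self, *, n_streams=1):
        # sklearn convention: store hyperparameters verbatim. No validation
        # and no resource construction happens here.
        self.n_streams = n_streams

    def fit(self, X):
        # Resources are only built (and parameters validated) at fit time.
        pool = _FakeStreamPool(self.n_streams)
        self.n_streams_used_ = pool.n_streams
        return self


est = SketchEstimator(n_streams=4)
print(est.n_streams)
est.fit([[0.0], [1.0]])
print(est.n_streams_used_)
```

Because nothing is constructed in `__init__`, an invalid value such as `n_streams=0` is only rejected once `fit` is called.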

Specify a DeviceResourcesSNMG

This is a relatively new use case currently only supported by HDBSCAN and UMAP. Some of our algorithms support running on multiple GPUs on a single node (when configured). Previously this was supported by passing in a pylibraft.common.handle.DeviceResourcesSNMG instead of a pylibraft.common.handle.Handle. There were several problems with that though (see #7465, #7059). Instead, we now elevate device_ids to a top-level parameter for these models. Like with n_streams, this makes the feature more visible, and keeps our configuration and __init__ simpler. DeviceResourcesSNMG is now an implementation detail, not something user-facing.
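To make the shape of a `device_ids` value concrete (the docstrings in this PR give it as `list[int]`, `"all"`, or `None`), here is a hedged sketch of how such a value could be normalized. `normalize_device_ids` and its `n_available` argument are hypothetical and do not come from cuml; the validation rules, including passing `None` through unchanged, are assumptions, and cuml's real handling may differ.

```python
def normalize_device_ids(device_ids, n_available):
    """Hypothetical helper: map a `device_ids` value to a list of GPU indices.

    Assumption: None is passed through unchanged, meaning "no multi-GPU
    request was made".
    """
    if device_ids is None:
        return None
    if isinstance(device_ids, str):
        if device_ids != "all":
            raise ValueError(f"Expected 'all', got {device_ids!r}")
        return list(range(n_available))

    ids = list(device_ids)
    if not ids:
        raise ValueError("device_ids must not be empty")
    for i in ids:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(i, bool) or not isinstance(i, int):
            raise TypeError(f"device ids must be ints, got {i!r}")
        if not 0 <= i < n_available:
            raise ValueError(f"device id {i} is out of range")
    if len(set(ids)) != len(ids):
        raise ValueError("device_ids must not contain duplicates")
    return ids


print(normalize_device_ids("all", 3))
print(normalize_device_ids([1, 0], 2))
print(normalize_device_ids(None, 2))
```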

Hold other resources to share across calls

A Handle contains many resources (cusolver handles, cublas handles, ...), which are created lazily and may have some costs to initialize. The previous model would result in a unique Handle per estimator or function call by default, preventing these initialization costs from being shared. Expert users could create a handle once and pass it around, but that required extra plumbing on their part with no other real benefit. Instead, we now cache a handle per thread. Some resources have concurrency or thread limitations; given Python's thread-based concurrency model, keeping things thread-local prevents any issues on that front. For single-threaded programs only a single Handle will now be created, letting us avoid any repeated init costs.
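The per-thread caching described above can be sketched with `threading.local()` from the standard library. This is a minimal illustration, not the code from this PR: `_FakeHandle` and `get_cached_handle` are hypothetical names, and only the `_THREAD_STATE = threading.local()` line mirrors something visible in the diff.

```python
import threading


class _FakeHandle:
    """Hypothetical stand-in for an expensive-to-initialize resource handle."""


_THREAD_STATE = threading.local()


def get_cached_handle():
    """Return this thread's handle, creating it on first use."""
    handle = getattr(_THREAD_STATE, "handle", None)
    if handle is None:
        handle = _THREAD_STATE.handle = _FakeHandle()
    return handle


# Repeated calls in one thread share a single handle.
main_handle = get_cached_handle()
same_thread_shared = get_cached_handle() is main_handle

# A different thread sees its own thread-local state, so it gets its own handle.
worker_handles = []
worker = threading.Thread(
    target=lambda: worker_handles.append(get_cached_handle())
)
worker.start()
worker.join()
other_thread_distinct = worker_handles[0] is not main_handle

print(same_thread_shared, other_thread_distinct)
```

Attributes set on a `threading.local()` object are only visible to the thread that set them, which is why no locking is needed around the cache lookup.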


Given the above arguments, Handle and DeviceResourcesSNMG are now implementation details, and are no longer user facing. To accomplish this, this PR:

  • Modifies all relevant estimators and functions to warn if a user specifies a handle. During the deprecation cycle the specified handle will continue to be used. The warning informs the user the parameter is deprecated, and if the estimator supports n_streams/device_ids will also include a note to use that instead.
  • The internal *MG estimators (e.g. LogisticRegressionMG) still include a handle parameter, since for now the distributed comms are also attached to that object and need to be provided somehow by the caller. Since these are internal(ish) classes, I'm ok with keeping the handle parameter around on them for now. In the long run we probably will want to rethink our multi-gpu APIs since the *MG classes are a bit unwieldy, but no sense changing them for now.
  • All estimators that make use of a handle's stream pool now include an n_streams parameter for configuring the pool size. The docs and deprecation warnings recommend using it instead.
  • All estimators that support multi-gpu execution now include a device_ids parameter for configuring the devices used. The docs and deprecation warnings recommend using it instead.
  • Estimators or functions that don't have a handle manually specified will use a cached thread-local handle, unless n_streams/device_ids are specified.
  • Accessing cuml.Handle (a re-export of pylibraft.common.handle.Handle) now also raises a deprecation warning. This re-export will be removed in the following release.
  • Docstrings are updated to note the deprecation and inform the user of alternate parameters like n_streams/device_ids.
  • Tests are updated to no longer specify a handle (we had a few modules that parametrized across handle/no-handle).
  • New tests are added to test the deprecation warnings and that execution with a handle specified still works in all cases.
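The warn-but-still-work behavior in the list above can be sketched with the standard `warnings` module. This is an illustration under stated assumptions, not the PR's implementation: `warn_if_handle_passed` and `SketchModel` are hypothetical, the message wording is invented, and emitting the warning at `fit` time is an assumption. The use of `FutureWarning` matches the category mentioned in the review comments.

```python
import warnings


def warn_if_handle_passed(handle, *, alternatives=()):
    """Hypothetical helper: emit a FutureWarning when `handle` is given."""
    if handle is None:
        return
    msg = (
        "The `handle` argument is deprecated and will be removed "
        "in a future release."
    )
    if alternatives:
        names = "/".join(f"`{name}`" for name in alternatives)
        msg += f" Use {names} instead."
    # stacklevel=3 attributes the warning to the caller of `fit`.
    warnings.warn(msg, FutureWarning, stacklevel=3)


class SketchModel:
    """Hypothetical model; not a cuml class."""

    def __init__(self, *, handle=None, n_streams=1):
        self.handle = handle
        self.n_streams = n_streams

    def fit(self):
        # During the deprecation cycle the handle is still accepted,
        # but the user is warned and pointed at the replacement.
        warn_if_handle_passed(self.handle, alternatives=("n_streams",))
        return self


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    SketchModel(handle=object()).fit()  # warns
    SketchModel(n_streams=2).fit()      # silent

messages = [
    str(w.message) for w in caught if issubclass(w.category, FutureWarning)
]
print(len(messages))
print(messages[0])
```

A test for this kind of deprecation can assert both that the warning is recorded and that the call still completes, as the usage at the bottom does.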

Fixes #6869
Fixes #7059
Fixes #7465

@jcrist jcrist self-assigned this Dec 18, 2025
@jcrist jcrist added the Cython / Python Cython or Python issue label Dec 18, 2025
@jcrist jcrist requested a review from a team as a code owner December 18, 2025 21:14
@jcrist jcrist added the improvement Improvement / enhancement to an existing function label Dec 18, 2025
@jcrist jcrist requested a review from betatim December 18, 2025 21:14
@jcrist jcrist added the breaking Breaking change label Dec 18, 2025
Member Author

@jcrist jcrist left a comment

Annotated the diff a bit. Most of the changes here are mechanical application of get_handle (or docstring updates) and aren't that interesting to review; I believe I've flagged anything worth looking at.

return ms


@pytest.mark.xfail(reason="Need rapidsai/rmm#415 to detect memleak robustly")
Member Author

These tests made use of handle, along with many long-since broken APIs. I opted to rip them out rather than just tweak the handle bits since it didn't seem worth it to keep around tests we know are broken and aren't running. Can always bring them back later, they're still in the git history.

assert r2 >= (sk_r2 - 0.08)


@pytest.mark.xfail(reason="Need rapidsai/rmm#415 to detect memleak robustly")
Member Author

These tests made use of handle, along with many long-since broken APIs. I opted to rip them out rather than just tweak the handle bits since it didn't seem worth it to keep around tests we know are broken and aren't running. Can always bring them back later, they're still in the git history.

from cuml.internals.mixins import TagsMixin
from cuml.internals.outputs import check_output_type

_THREAD_STATE = threading.local()
Member Author

This file implements the deprecation warnings, and the handle cache/creation behavior. It's worth reviewing.

# SPDX-License-Identifier: Apache-2.0
#
import inspect
import warnings
Member Author

This file contains most of the new behavior and deprecation tests. It's also worth reviewing.

memory usage. This is independent from knn_overlap_factor as long as
'knn_overlap_factor' < 'knn_n_clusters'.

device_ids : list[int], "all", or None, default=None
Member Author

HDBSCAN now has a new device_ids parameter for configuring SNMG execution.

cdef double intercept_f64
cdef handle_t* handle_ = <handle_t*><size_t>self.handle.getHandle()
# Always use 2 streams to expose concurrency in the eig computation
handle = get_handle(model=self, n_streams=2)
Member Author

Hardcoded n_streams=2 here. LinearRegression is a bit of a weird case - it only makes use of 2 concurrent streams (and it always makes sense to use them, which we were by default, since they don't run in separate threads). Hardcoding made more sense to me than adding an n_streams parameter.

- Start with `nnd_n_clusters = 4` and increase (4 → 8 → 16...) for less GPU
memory usage. This is independent from nnd_overlap_factor as long as
'nnd_overlap_factor' < 'nnd_n_clusters'.
device_ids : list[int], "all", or None, default=None
Member Author

UMAP has a new device_ids parameter for configuring SNMG execution.

Number of vectors approximating the hessian for the underlying QN
solver (l-bfgs).
n_streams : int (default = 1)
Number of parallel streams used for fitting.
Member Author

LinearSVC has a new n_streams parameter for making use of multiple streams during fitting. Here I had it default to 1 (matching the previous default). For (historical?) reasons the n_streams parameter in RandomForestClassifier/RandomForestRegressor defaults to 4. I don't think we can have a sane hardware-agnostic default, so keeping it as 1 here makes sense to me.

Member

Do you have a sense of how much of a performance difference it makes for linear SVMs? I have benchmarked RF for this, and 4 is not a bad default for perf, but I don't have a sense for the linear SVMs.

Member Author

I didn't benchmark. Since n_streams in this case (and in the RandomForest* case) also consumes a thread, I'm a bit hesitant to set a higher default (increasing this would also be a behavior change). I suggest we leave the default as-is in this PR and consider changing it in a followup.

# requirement of minimality for core points


def get_handle(use_handle, n_streams=0):
Member Author

Some of our tests were parametrized to run both with and without a handle specified. These tests passed with the changes here (if I configured the FutureWarning for deprecation not to fail). I opted to rip them out instead though since they didn't provide a meaningful benefit and would go away once the deprecation is completed.

@jcrist jcrist requested a review from a team as a code owner December 18, 2025 21:36
Member Author

jcrist commented Dec 18, 2025

(Rebased on top of #7629 to get CI passing; this PR itself doesn't include packaging changes.)

Member Author

jcrist commented Dec 19, 2025

I believe all tests should pass now, but I've realized that with the current architecture we probably need to keep handle around for the *MG classes since they also manage the comms (that, or alternatively support some way of setting the handle returned by get_handle contextually). It'd be nice if Handle didn't own all these disparate things (streams, stream pools, library handles, and comms?), but this is one where we'll need to keep around an option for configuration.

Will look into this tomorrow, just wanted to get CI sorted overnight. (edit - nvm, after showing it was clear that this was a 1-line fix and that at least for now keeping handle around in the internal *MG classes is fine and the simplest path forward. Pushed the fix up tonight.)

@jcrist jcrist changed the title Deprecate handle Deprecate handle from public APIs Dec 19, 2025
Member Author

jcrist commented Dec 19, 2025

Ok, tests are passing (sans a build hiccup). This is ready for review.

Member

@dantegd dantegd left a comment

Went through the whole PR and overall the changes are really comprehensive and well implemented, didn't find many specific comments or requests to make. This is a clean deprecation that makes the API simpler without sacrificing power. The test coverage is solid.

Comment thread python/cuml/cuml/internals/base.py
Comment thread python/cuml/cuml/internals/base.py
Comment thread python/cuml/cuml/ensemble/randomforest_common.pyx
Comment thread python/cuml/cuml/internals/base.py
@jcrist jcrist removed request for a team and KyleFromNVIDIA December 19, 2025 22:52
Member Author

jcrist commented Dec 22, 2025

/merge

@rapids-bot rapids-bot Bot merged commit 5878fd0 into rapidsai:main Dec 22, 2025
102 checks passed
@jcrist jcrist deleted the deprecate-handle branch December 22, 2025 18:50
mani-builds pushed a commit to mani-builds/cuml that referenced this pull request Jan 11, 2026
Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#7628
rapids-bot Bot pushed a commit that referenced this pull request Feb 5, 2026
This removes the deprecated `handle` argument/attributes on all relevant models and functions.

Note that for now the (_mostly_ private) `*MG` classes retain their `handle` argument and attribute since the multi-gpu comms are currently attached to the handles. Some care has been taken to ensure the proper handle is utilized for multi-gpu APIs. All other APIs now make use of the `get_handle` function exclusively.

This is a follow-up to #7628. Fixes #7722.

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Victor Lafargue (https://github.com/viclafargue)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #7751
dantegd added a commit to dantegd/cuml that referenced this pull request Feb 17, 2026
Labels

breaking (Breaking change)
Cython / Python (Cython or Python issue)
improvement (Improvement / enhancement to an existing function)

Development

Successfully merging this pull request may close these issues.

[FEA] Provide cuML wrapper of RAFT DeviceResourcesSNMG
Investigate UMAP Handle Type Abstraction Issue
cuML Handle API Documentation Review

3 participants