Deprecate handle from public APIs #7628
Conversation
jcrist
left a comment
Annotated the diff a bit. Most of the changes here are mechanical application of get_handle (or docstring updates) and aren't that interesting to review; I believe I've flagged anything worth looking at.
    return ms
    ...
    @pytest.mark.xfail(reason="Need rapidsai/rmm#415 to detect memleak robustly")

These tests made use of `handle`, along with many long-since-broken APIs. I opted to rip them out rather than just tweak the `handle` bits, since it didn't seem worth keeping around tests we know are broken and aren't running. We can always bring them back later; they're still in the git history.
    from cuml.internals.mixins import TagsMixin
    from cuml.internals.outputs import check_output_type
    ...
    _THREAD_STATE = threading.local()

This file implements the deprecation warnings and the handle cache/creation behavior. It's worth reviewing.
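The per-thread handle cache referenced here can be sketched in plain Python. This is only an illustrative sketch: `Handle` below is a stub standing in for `pylibraft.common.handle.Handle`, and `get_handle` mirrors the behavior described in this PR (reuse one cached handle per thread, unless a stream pool is explicitly requested).

```python
import threading


class Handle:
    """Illustrative stub standing in for pylibraft.common.handle.Handle."""

    def __init__(self, n_streams=0):
        self.n_streams = n_streams


# One slot per thread; each thread lazily creates and reuses its own handle.
_THREAD_STATE = threading.local()


def get_handle(n_streams=0):
    """Return a cached per-thread handle, or a fresh one if a stream pool
    is requested (a non-default pool can't be served from the shared cache)."""
    if n_streams:
        return Handle(n_streams=n_streams)
    handle = getattr(_THREAD_STATE, "handle", None)
    if handle is None:
        handle = _THREAD_STATE.handle = Handle()
    return handle
```

Within a single thread, repeated calls return the same object (so lazily created cusolver/cublas-style resources are paid for once); a second thread gets its own handle, sidestepping any thread-safety limitations of the underlying resources.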
    # SPDX-License-Identifier: Apache-2.0
    #
    import inspect
    import warnings
This file contains most of the new behavior and deprecation tests. It's also worth reviewing.
    memory usage. This is independent from knn_overlap_factor as long as
    'knn_overlap_factor' < 'knn_n_clusters'.
    ...
    device_ids : list[int], "all", or None, default=None
HDBSCAN now has a new device_ids parameter for configuring SNMG execution.
    cdef double intercept_f64
    cdef handle_t* handle_ = <handle_t*><size_t>self.handle.getHandle()
    # Always use 2 streams to expose concurrency in the eig computation
    handle = get_handle(model=self, n_streams=2)

Hardcoded `n_streams=2` here. `LinearRegression` is a bit of a weird case: it only makes use of 2 concurrent streams, and it always makes sense to use them (which we were doing by default, since they don't run in separate threads). Hardcoding made more sense to me than adding an `n_streams` parameter.
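The pattern being discussed can be sketched in plain Python. Everything here is an illustrative stub, not cuml's actual internals: the estimator asks a shared `get_handle` helper for a handle backed by a fixed 2-stream pool instead of exposing an `n_streams` parameter, while still honoring a deprecated user-supplied handle if one is attached.

```python
class Handle:
    """Illustrative stub for pylibraft's Handle."""

    def __init__(self, n_streams=0):
        self.n_streams = n_streams


def get_handle(model=None, n_streams=0):
    """Sketch of the helper: honor a (deprecated) user-supplied handle if
    the model still carries one, otherwise build a handle with the
    requested stream pool. `_deprecated_handle` is a hypothetical name."""
    user_handle = getattr(model, "_deprecated_handle", None)
    if user_handle is not None:
        return user_handle
    return Handle(n_streams=n_streams)


class LinearRegression:
    """Toy estimator that always requests 2 streams, as discussed above."""

    _deprecated_handle = None

    def fit(self):
        # Always use 2 streams to expose concurrency in the eig computation
        handle = get_handle(model=self, n_streams=2)
        return handle
```

The hardcoded value stays an implementation detail of `fit`, so nothing handle-related leaks into `__init__` or the public signature.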
    - Start with `nnd_n_clusters = 4` and increase (4 → 8 → 16...) for less GPU
      memory usage. This is independent from nnd_overlap_factor as long as
      'nnd_overlap_factor' < 'nnd_n_clusters'.
    device_ids : list[int], "all", or None, default=None
UMAP has a new device_ids parameter for configuring SNMG execution.
    Number of vectors approximating the hessian for the underlying QN
    solver (l-bfgs).
    n_streams : int (default = 1)
        Number of parallel streams used for fitting.

`LinearSVC` has a new `n_streams` parameter for making use of multiple streams during fitting. Here I had it default to 1, matching the previous default. For (historical?) reasons the `n_streams` parameter in `RandomForestClassifier`/`RandomForestRegressor` defaults to 4. I don't think we can have a sane hardware-agnostic default, so keeping it as 1 here makes sense to me.
do you have a sense of how much of a performance difference it makes for LinearSVMs? I have benchmarked RF for this, and 4 is not a bad default for perf, but don't have a sense for the linear SVMs
I didn't benchmark. Since `n_streams` in this case (and in the `RandomForest*` case) also consumes a thread, I'm a bit hesitant to set a higher default (increasing it would also be a behavior change). I suggest we leave the default as-is in this PR and consider changing it in a followup.
    # requirement of minimality for core points
    ...
    def get_handle(use_handle, n_streams=0):
Some of our tests were parametrized to run both with and without a handle specified. These tests passed with the changes here (if I configured the FutureWarning for deprecation not to fail). I opted to rip them out instead though since they didn't provide a meaningful benefit and would go away once the deprecation is completed.
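A replacement test for the deprecation behavior could look something like this sketch (a toy stand-in, not the PR's actual test code): calling with a `handle` should emit a `FutureWarning` but still succeed during the deprecation cycle, while calling without one stays silent.

```python
import warnings

# Hypothetical sentinel meaning "no handle passed"; None would be ambiguous
# since users might pass handle=None explicitly.
_UNSET = object()


def fit(X, handle=_UNSET):
    """Toy stand-in for an estimator method with a deprecated handle kwarg."""
    if handle is not _UNSET:
        warnings.warn(
            "`handle` is deprecated and will be removed in a future release.",
            FutureWarning,
        )
    return "fitted"


# With a handle: warns, but the call still works.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert fit([[1.0]], handle=object()) == "fitted"
assert any(issubclass(w.category, FutureWarning) for w in caught)

# Without a handle: no warning at all.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert fit([[1.0]]) == "fitted"
assert not caught
```

In a pytest suite the same checks would typically be written with `pytest.warns(FutureWarning)`, which also fails if the warning is not raised.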
Force-pushed 46ef108 to 73b9e30
(Rebased on top of #7629 to get CI passing; this PR itself doesn't include packaging changes.)

I believe all tests should pass now, but I've realized that with the current architecture we probably need to keep
Force-pushed 3248458 to 60b2e01

Ok, tests are passing (sans a build hiccup). This is ready for review.
dantegd
left a comment
Went through the whole PR, and overall the changes are really comprehensive and well implemented; I didn't find many specific comments or requests to make. This is a clean deprecation that makes the API simpler without sacrificing power. The test coverage is solid.
Also drops some xfailed (and long-since-broken) memleak tests. Better to delete them than to edit them in a way that still doesn't fix them.
Additional changes:
- Adds `n_streams` parameter to `LinearSVC`, since this estimator supports stream-parallel operations. Defaulting to 1 to match prior behavior.
- Deletes some long-broken and bitrotted memleak tests.
These were using the wrong directive, leading to a parsing error.
Force-pushed 60b2e01 to f3d10c1

/merge
This deprecates the `handle` argument to all classes, methods, and functions. `cuml.Handle` is a bit of a relic, and doesn't necessarily provide a solid user-facing benefit. A few things it was purported to do:

**Allow for asynchronous execution**

All cuml python APIs are synchronous; we do not support asynchronous execution. Any docs or claims that this is currently possible are incorrect. Further, the python APIs we're mimicking and the ecosystem we're extending don't support asynchronous execution. `cupy` (which we make heavy use of in cuml) does have some mechanisms for asynchronous execution; if/when we want to support asynchronous operation, we're likely to take an approach that piggybacks off their configuration (for better ecosystem support) rather than rely on the existing `cuml.Handle` class, which doesn't really match the task. Handles are for specifying and caching resources, not for configuring sync/async operation.

**Specify the stream of execution**

This _was_ possible with `cuml.Handle`, but didn't provide a meaningful benefit. `cuml` APIs are all synchronous, so the stream they execute on doesn't matter. Further, not every function or estimator that accepted a `cuml.Handle` made use of the handle (or respected the stream), making this specification kind of moot. `cuml.Handle` is really only used for functions that are part of `libcuml`; anything written using cupy/cudf/etc. ignores it completely. Given we provide synchronous APIs only (and rely on threads for concurrency, matching python conventions), it doesn't make sense to expose this at the user level. Better to make it an implementation detail of APIs that rely on `libcuml`.

**Specify the number of streams in a backing stream pool**

A few of our algorithms support using multiple streams from a pool on the handle. In some cases (`cuml.ensemble`) we also exposed a top-level `n_streams` argument, which seems preferable.
I've added this to `LinearSVC` (the only other algorithm that uses this, AFAICT). Elevating `n_streams` to a top-level parameter makes it more discoverable by users, and also lets us avoid modifying or constructing a handle within `__init__`, better following sklearn conventions.

**Specify a `DeviceResourcesSNMG`**

This is a relatively new use case, currently only supported by `HDBSCAN` and `UMAP`. Some of our algorithms support running on multiple GPUs on a single node (when configured). Previously this was supported by passing in a `pylibraft.common.handle.DeviceResourcesSNMG` instead of a `pylibraft.common.handle.Handle`. There were several problems with that though (see rapidsai#7465, rapidsai#7059). Instead, we now elevate `device_ids` to a top-level parameter for these models. Like with `n_streams`, this better surfaces the feature, and keeps our configuration and `__init__` simpler. `DeviceResourcesSNMG` is now an implementation detail, not something user-facing.

**Hold other resources to share across calls**

A `Handle` contains many resources (cusolver handles, cublas handles, ...), which are created lazily and may have some cost to initialize. The previous model would create a unique `Handle` per estimator or function call by default, preventing sharing of these initialization costs. Expert users could create a handle once and pass it around, but that requires extra plumbing on their part with no other real benefit. Instead, we now cache a handle per thread (some resources have concurrency or thread limitations; given python's thread-based concurrency model, keeping things thread-local prevents any issues on that front). For single-threaded programs only a single `Handle` will now be created, letting us avoid any repeated init costs.

---

Given the above arguments, `Handle` and `DeviceResourcesSNMG` are now implementation details, and are no longer user-facing.
To accomplish this, this PR:

- Modifies all relevant estimators and functions to warn if a user specifies a `handle`. During the deprecation cycle the specified `handle` will continue to be used. The warning informs the user the parameter is deprecated, and if the estimator supports `n_streams`/`device_ids` it also includes a note to use those instead.
- Keeps a `handle` parameter on the internal `*MG` estimators (e.g. `LogisticRegressionMG`), since for now the distributed comms are also attached to that object and need to be provided somehow by the caller. Since these are internal(ish) classes, I'm ok with keeping the `handle` parameter around on them for now. In the long run we'll probably want to rethink our multi-gpu APIs, since the `*MG` classes are a bit unwieldy, but there's no sense changing them now.
- Adds an `n_streams` parameter to all estimators that make use of a handle's stream pool, for configuring the pool size. The docs and deprecation warnings recommend using it.
- Adds a `device_ids` parameter to all estimators that support multi-gpu execution, for configuring the devices used. The docs and deprecation warnings recommend using it.
- Makes estimators and functions without a manually specified handle use a cached thread-local handle, unless `n_streams`/`device_ids` are specified.
- Raises a deprecation warning on access to `cuml.Handle` (a re-export of `pylibraft.common.handle.Handle`). This re-export will be removed in the following release.
- Updates docstrings to note the deprecation and point users to alternate parameters like `n_streams`/`device_ids`.
- Updates tests to no longer specify a `handle` (we had a few modules that parametrized across handle/no-handle).
- Adds new tests for the deprecation warnings and to verify that execution with a handle specified still works in all cases.
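The first bullet's warning behavior (mentioning `n_streams`/`device_ids` only when the estimator supports them) could be factored into a small helper like this sketch. The function name and message wording are illustrative, not the PR's actual internals.

```python
import warnings


def warn_handle_deprecated(alternatives=()):
    """Emit the deprecation warning described above.

    `alternatives` lists replacement parameters (e.g. "n_streams",
    "device_ids") supported by the estimator, if any; when non-empty,
    the message steers users toward them.
    """
    msg = (
        "The `handle` argument is deprecated and will be removed in a "
        "future release. For now, the provided handle will still be used."
    )
    if alternatives:
        names = ", ".join(f"`{name}`" for name in alternatives)
        msg += f" Consider using {names} instead."
    warnings.warn(msg, FutureWarning, stacklevel=3)
```

`stacklevel=3` assumes the helper is called from inside an estimator method, so the warning points at the user's call site rather than at cuml internals.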
Fixes rapidsai#6869
Fixes rapidsai#7059
Fixes rapidsai#7465

Authors:
- Jim Crist-Harif (https://github.com/jcrist)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#7628
This removes the deprecated `handle` argument/attributes on all relevant models and functions. Note that for now the (_mostly_ private) `*MG` classes retain their `handle` argument and attribute, since the multi-gpu comms are currently attached to the handles. Some care has been taken to ensure the proper handle is utilized for multi-gpu APIs. All other APIs now make use of the `get_handle` function exclusively.

This is a follow-up to #7628. Fixes #7722.

Authors:
- Jim Crist-Harif (https://github.com/jcrist)
- Simon Adorf (https://github.com/csadorf)

Approvers:
- Victor Lafargue (https://github.com/viclafargue)
- Dante Gama Dessavre (https://github.com/dantegd)

URL: #7751