Cherry-pick "Remove PoolMemoryResource wrapper" into release (#7881)

rapids-bot[bot] merged 2 commits into rapidsai:release/26.04.

Conversation
I'm seeing some OOMs in CI after the `PoolMemoryResource` was added. Removing it to see if that fixes things.

Authors:
- Jim Crist-Harif (https://github.com/jcrist)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: rapidsai#7878
📝 Walkthrough

The pull request simplifies memory resource handling.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks: ✅ 3 passed
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
python/cuml/cuml/accel/core.py (1)
Lines 184-205: ⚠️ Potential issue | 🟡 Minor

Add regression coverage for the new managed-memory chain.

This changes both the detection predicate and the resource graph that `install()` installs, but the PR does not update tests. Please add coverage for at least the no-op path, where the current resource is already `PrefetchResourceAdaptor(ManagedMemoryResource())`, and the reconfiguration path from the default CUDA resource, so this backport does not silently drift back to the old wrapper shape. As per coding guidelines, "Update unit tests when making code changes".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/accel/core.py` around lines 184-205, add unit tests exercising the managed-memory install paths:

1. A no-op path where `rmm.mr.get_current_device_resource()` returns a `PrefetchResourceAdaptor` whose `upstream_mr` is a `ManagedMemoryResource`, and calling the install routine (the code in `core.py` that checks `_is_concurrent_managed_access_supported()` and manipulates rmm) leaves the resource unchanged.
2. A reconfiguration path where `get_current_device_resource()` returns a `CudaMemoryResource` and the install routine sets the current device resource to a `PrefetchResourceAdaptor(ManagedMemoryResource())`.

Use pytest and `monkeypatch` to stub `rmm.mr.get_current_device_resource` and `rmm.mr.set_current_device_resource` (or inspect the value passed to `set_current_device_resource`), call the module function containing this diff (the install logic that invokes `_is_concurrent_managed_access_supported()`, `rmm.mr.get_current_device_resource`, and `rmm.mr.set_current_device_resource`), and assert the expected behaviors: no call to `set_current_device_resource` for the no-op case, and a call with a `PrefetchResourceAdaptor` wrapping a `ManagedMemoryResource` for the reconfigure case.
ℹ️ Review info

⚙️ Run configuration
- Configuration used: Path: .coderabbit.yaml
- Review profile: CHILL
- Plan: Pro
- Run ID: 3ce03ed0-4e2e-498f-b395-b41640db2d6a

📒 Files selected for processing (1)
- python/cuml/cuml/accel/core.py
/merge

Merged commit 868c4cf into rapidsai:release/26.04.

Cherry-pick of #7878 into the release branch for 26.04.