Skip to content

Cherry-pick "Remove PoolMemoryResource wrapper" into release#7881

Merged
rapids-bot[bot] merged 2 commits intorapidsai:release/26.04from
jcrist:backport-rmm-fix
Mar 12, 2026
Merged

Cherry-pick "Remove PoolMemoryResource wrapper" into release#7881
rapids-bot[bot] merged 2 commits intorapidsai:release/26.04from
jcrist:backport-rmm-fix

Conversation

@jcrist
Copy link
Copy Markdown
Member

@jcrist jcrist commented Mar 12, 2026

Cherry-pick of #7878 into the release branch for 26.04.

I'm seeing some OOMs in CI after the `PoolMemoryResource` was added. Trying removing it to see if that fixes it.

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: rapidsai#7878
@jcrist jcrist self-assigned this Mar 12, 2026
@jcrist jcrist requested a review from a team as a code owner March 12, 2026 20:25
@jcrist jcrist requested a review from csadorf March 12, 2026 20:25
@jcrist jcrist added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change cuml-accel Issues related to cuml.accel labels Mar 12, 2026
@github-actions github-actions Bot added the Cython / Python Cython or Python issue label Mar 12, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 12, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Refactor
    • Simplified internal memory resource initialization and validation logic, resulting in more streamlined configuration handling and improved code maintainability.

Walkthrough

The pull request simplifies memory resource handling in the install() function by removing the nested PoolMemoryResource wrapper and directly using ManagedMemoryResource with PrefetchResourceAdaptor. This reduces the required memory resource chain from two levels to one, while maintaining fallback and logging behavior.

Changes

Cohort / File(s) Summary
Memory Resource Simplification
python/cuml/cuml/accel/core.py
Simplified memory resource checks in install() by removing PoolMemoryResource nesting and directly constructing PrefetchResourceAdaptor with ManagedMemoryResource as upstream. Adjusted validation logic to match the simplified chain.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested reviewers

  • dantegd
  • bdice
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the cherry-pick action and references the removal of PoolMemoryResource wrapper, which directly matches the changeset.
Description check ✅ Passed The description directly relates to the changeset by explaining this is a cherry-pick of PR #7878 that removes the PoolMemoryResource wrapper.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can get early access to new features in CodeRabbit.

Enable the early_access setting to enable early access features such as new models, tools, and more.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/cuml/cuml/accel/core.py (1)

184-205: ⚠️ Potential issue | 🟡 Minor

Add regression coverage for the new managed-memory chain.

This changes both the detection predicate and the resource graph that install() installs, but the PR does not update tests. Please add coverage for at least the no-op path when the current resource is already PrefetchResourceAdaptor(ManagedMemoryResource()) and the reconfiguration path from the default CUDA resource, so this backport does not silently drift back to the old wrapper shape.

As per coding guidelines, "Update unit tests when making code changes".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/cuml/cuml/accel/core.py` around lines 184 - 205, Add unit tests
exercising the managed-memory install paths: (1) a no-op path where
rmm.mr.get_current_device_resource() returns a PrefetchResourceAdaptor whose
upstream_mr is a ManagedMemoryResource and calling the install routine (the code
in core.py that checks _is_concurrent_managed_access_supported() and manipulates
rmm) leaves the resource unchanged; and (2) the reconfiguration path where
get_current_device_resource() returns a CudaMemoryResource and the install
routine sets the current device resource to a
PrefetchResourceAdaptor(ManagedMemoryResource()). Use pytest and monkeypatch to
stub rmm.mr.get_current_device_resource and rmm.mr.set_current_device_resource
(or inspect the value passed to set_current_device_resource), call the module
function containing this diff (the install logic that invokes
_is_concurrent_managed_access_supported(), rmm.mr.get_current_device_resource,
and rmm.mr.set_current_device_resource), and assert the expected behaviors (no
call to set_current_device_resource for the no-op case and a call with a
PrefetchResourceAdaptor wrapping a ManagedMemoryResource for the reconfigure
case).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@python/cuml/cuml/accel/core.py`:
- Around line 184-205: Add unit tests exercising the managed-memory install
paths: (1) a no-op path where rmm.mr.get_current_device_resource() returns a
PrefetchResourceAdaptor whose upstream_mr is a ManagedMemoryResource and calling
the install routine (the code in core.py that checks
_is_concurrent_managed_access_supported() and manipulates rmm) leaves the
resource unchanged; and (2) the reconfiguration path where
get_current_device_resource() returns a CudaMemoryResource and the install
routine sets the current device resource to a
PrefetchResourceAdaptor(ManagedMemoryResource()). Use pytest and monkeypatch to
stub rmm.mr.get_current_device_resource and rmm.mr.set_current_device_resource
(or inspect the value passed to set_current_device_resource), call the module
function containing this diff (the install logic that invokes
_is_concurrent_managed_access_supported(), rmm.mr.get_current_device_resource,
and rmm.mr.set_current_device_resource), and assert the expected behaviors (no
call to set_current_device_resource for the no-op case and a call with a
PrefetchResourceAdaptor wrapping a ManagedMemoryResource for the reconfigure
case).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3ce03ed0-4e2e-498f-b395-b41640db2d6a

📥 Commits

Reviewing files that changed from the base of the PR and between 35fc9e3 and 8e567d4.

📒 Files selected for processing (1)
  • python/cuml/cuml/accel/core.py

@jcrist jcrist mentioned this pull request Mar 12, 2026
@jcrist
Copy link
Copy Markdown
Member Author

jcrist commented Mar 12, 2026

/merge

@rapids-bot rapids-bot Bot merged commit 868c4cf into rapidsai:release/26.04 Mar 12, 2026
167 of 170 checks passed
@jcrist jcrist deleted the backport-rmm-fix branch March 12, 2026 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuml-accel Issues related to cuml.accel Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants