Add experimental cuda_async_managed_memory_resource (#2056)

rapids-bot[bot] merged 31 commits into rapidsai:branch-25.12 from
Conversation
There is parallel design work happening in NVIDIA/cccl#5998. I don't want to offer this as a stable API in RMM, given that we are fairly close to being able to use CCCL's MR implementations directly. I will mark this as
TomAugspurger left a comment
Python changes look good, though I had one small question.
```cpp
location.type = cudaMemLocationTypeDevice;
location.id   = rmm::get_current_cuda_device().value();
cudaMemAllocationType type = cudaMemAllocationTypeManaged;
RMM_CUDA_TRY(cudaMemGetDefaultMemPool(&managed_pool_handle, &location, type));
```
Should this use `cudaMemGetMemPool` (the current memory pool for this device) instead of `cudaMemGetDefaultMemPool` (the default memory pool for this device)?

CCCL is using the default memory pool, so we should probably match: https://github.com/NVIDIA/cccl/blob/de213a108b12aa5fdd4b7c8889aec4120734b4f1/cudax/include/cuda/experimental/__memory_resource/managed_memory_resource.cuh#L63
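For context, a minimal sketch of the default-pool lookup being discussed, mirroring the diff above (this assumes the CUDA 13.0+ runtime API signatures and is not code from this PR; error handling is omitted):

```cpp
#include <cuda_runtime_api.h>

// Sketch only: look up the default managed memory pool for a given device,
// as CCCL's managed_memory_resource does. Assumes CUDA 13.0+.
cudaMemPool_t default_managed_pool(int device_id)
{
  cudaMemLocation location{};
  location.type = cudaMemLocationTypeDevice;
  location.id   = device_id;

  cudaMemPool_t pool{};
  cudaMemGetDefaultMemPool(&pool, &location, cudaMemAllocationTypeManaged);
  return pool;
}
```

By contrast, `cudaMemGetMemPool` would return whatever pool is currently set for the device, which a user may have replaced; the default pool is stable regardless of user configuration.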
```cpp
 * @brief Determine at runtime if the CUDA driver/runtime supports the stream-ordered
 * managed memory allocator functions.
 *
 * Stream-ordered managed memory pools were introduced in CUDA 13.0.
```
Yes, using cudart >= 13 on a 12.9 driver is forward compatibility mode. If a user did not have the cuda-compat package installed in this scenario, then everything would have failed already. By not checking the driver version, though, we are effectively assuming that RMM_MIN_ASYNC_MANAGED_ALLOC_CUDA_VERSION is 13.0: if the feature had instead been introduced in e.g. 13.1 or 13.2, we could be missing user-mode driver support as well if the user had only the 13.0 compat driver.

Minor note: there are also edge cases where forward compatibility is not sufficient. I don't see any documentation indicating that async managed allocations are one of them, though.
…managed_memory_resource
…/rmm into cuda_async_managed_memory_resource
…mory resources (#2083)

Precursor to #2056. This refactors the Python/Cython code for memory resources to make it easier to add a new `experimental` namespace. This is a small breaking change in the Cython API, for some Cython names that were unintentionally exposed in multiple modules. [For example](rapidsai/rapidsmpf#575), `device_memory_resource` should be cimported from `rmm.librmm.memory_resource` rather than `rmm.pylibrmm.memory_resource`.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #2083
@wence- @vyasr I have responded to all the threads above and requested another round of review. I'd like to finish this up and get something merged so it can be more readily tested. It is experimental, so I am happy to break things if we decide on a better plan for implementation later on. My current hope is that this feature is never stabilized, and is replaced entirely by the CCCL implementations once they are available.
/merge
Adds a new cuda_async_pinned_memory_resource that provides stream-ordered pinned (page-locked) host memory allocation using CUDA 13.0's cudaMemGetDefaultMemPool API with cudaMemAllocationTypePinned. This parallels the cuda_async_managed_memory_resource added in rapidsai#2056 and addresses part of rapidsai#2054.

Key features:
- Uses default pinned memory pool for stream-ordered allocation
- Accessible from both host and device
- Requires CUDA 13.0+ (matches managed version for consistency)
- Simpler requirements than managed (no concurrent managed access needed)
- Works on WSL2 and other systems where managed memory is not supported

Implementation includes:
- C++ header and implementation in cuda_async_pinned_memory_resource.hpp
- Runtime capability check in runtime_capabilities.hpp
- C++ tests in cuda_async_pinned_mr_tests.cpp
- Python bindings in experimental module
- Python tests in test_cuda_async_pinned_memory_resource.py
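The allocation path described above can be sketched roughly as follows. This is a hedged illustration assuming the CUDA 13.0 runtime API, not the PR's actual implementation; in particular, the location type used for the pinned pool and all error handling are simplified:

```cpp
#include <cstddef>
#include <cuda_runtime_api.h>

// Sketch: stream-ordered pinned (page-locked) host allocation from the
// default pinned pool. Assumes CUDA 13.0+; error handling omitted.
void* allocate_pinned_async(std::size_t bytes, cudaStream_t stream, int device_id)
{
  cudaMemLocation location{};
  location.type = cudaMemLocationTypeDevice;  // simplified; the real resource may use a host location
  location.id   = device_id;

  cudaMemPool_t pool{};
  cudaMemGetDefaultMemPool(&pool, &location, cudaMemAllocationTypePinned);

  void* ptr = nullptr;
  cudaMallocFromPoolAsync(&ptr, bytes, pool, stream);
  return ptr;  // later freed in stream order with cudaFreeAsync(ptr, stream)
}
```

The key property is that both allocation and deallocation are ordered on the stream, so no device-wide synchronization is needed, while the returned memory is host-accessible pinned memory.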
Description
Contributes to #2054.
Some follow-up tasks (after this PR):
- `managed_memory_resource` on CUDA 13 with this?

Checklist