Remove device_memory_resource inheritance from all resources and adaptors#2301

Open
bdice wants to merge 7 commits into rapidsai:staging from bdice:remove-dmr-inheritance

bdice (Collaborator) commented Mar 14, 2026

Summary

  • Remove device_memory_resource inheritance from all memory resources (stateless, stateful, and adaptors)
  • Remove do_allocate / do_deallocate / do_is_equal virtual overrides from all resources
  • Rewrite benchmark factory functions from shared_ptr<device_memory_resource> to any_device_resource
  • Convert simulated_memory_resource from DMR inheritance to CCCL concepts
  • Change copy/move from = delete to = default on cuda_async_memory_resource, cuda_async_managed_memory_resource, sam_headroom_memory_resource, and simulated_memory_resource (required for CCCL resource_ref copyability via shared_resource base)
  • Remove NullUpstream tests and DEVICE_MEMORY_RESOURCE_VIEW_TEST (no longer needed without DMR)
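
As a rough illustration of the copy/move change in the list above, here is a minimal sketch of a resource with no virtual base whose copies share one implementation object. Every name in it (`tracking_impl`, `tracking_resource`, `copies_share_state`) is hypothetical, and it uses host allocation in place of CUDA. It is not how `cuda::mr::shared_resource` or any RMM resource is implemented. It only shows why defaulted copy and move are cheap and safe once the state sits behind a shared handle.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical implementation type that holds the mutable state.
struct tracking_impl {
  std::size_t outstanding_bytes{0};

  void* allocate_sync(std::size_t bytes, std::size_t alignment)
  {
    outstanding_bytes += bytes;
    return ::operator new(bytes, std::align_val_t{alignment});
  }

  void deallocate_sync(void* ptr, std::size_t bytes, std::size_t alignment) noexcept
  {
    outstanding_bytes -= bytes;
    ::operator delete(ptr, std::align_val_t{alignment});
  }
};

// Hypothetical handle type: no virtual base class, defaulted copy/move,
// and all copies refer to the same tracking_impl.
class tracking_resource {
 public:
  tracking_resource() : impl_{std::make_shared<tracking_impl>()} {}
  tracking_resource(tracking_resource const&)            = default;
  tracking_resource(tracking_resource&&)                 = default;
  tracking_resource& operator=(tracking_resource const&) = default;
  tracking_resource& operator=(tracking_resource&&)      = default;

  void* allocate_sync(std::size_t bytes, std::size_t alignment)
  {
    return impl_->allocate_sync(bytes, alignment);
  }

  void deallocate_sync(void* ptr, std::size_t bytes, std::size_t alignment) noexcept
  {
    impl_->deallocate_sync(ptr, bytes, alignment);
  }

  std::size_t outstanding_bytes() const { return impl_->outstanding_bytes; }

  // In this sketch, two handles are equal when they share one implementation.
  friend bool operator==(tracking_resource const& a, tracking_resource const& b)
  {
    return a.impl_ == b.impl_;
  }
  friend bool operator!=(tracking_resource const& a, tracking_resource const& b)
  {
    return !(a == b);
  }

 private:
  std::shared_ptr<tracking_impl> impl_;
};

// Allocates through a copy and checks that the original observes the change.
inline bool copies_share_state()
{
  tracking_resource original;
  tracking_resource copy = original;
  void* ptr              = copy.allocate_sync(128, 64);
  assert(ptr != nullptr);
  bool const shared = (original.outstanding_bytes() == 128) && (original == copy);
  copy.deallocate_sync(ptr, 128, 64);
  return shared && original.outstanding_bytes() == 0;
}
```

Calling `copies_share_state()` returns `true` in this sketch, because the copy and the original point at the same `tracking_impl`.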

Closes #2295
Part of #2011

copy-pr-bot bot commented Mar 14, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

bdice force-pushed the remove-dmr-inheritance branch from 032b3a6 to 856395e March 17, 2026 08:16
bdice added a commit that referenced this pull request Mar 17, 2026
…async_resource_ref (#2300)

## Summary

Replaces `shared_ptr[device_memory_resource]` with per-subclass
`unique_ptr[ConcreteType]` (owning) and
`optional[device_async_resource_ref]` (non-owning reference) across all
Python/Cython bindings. This is a part of #2011.

There are **significant** opportunities to make this Cython code better
over time but I have to get something that removes
`device_memory_resource` from the Python/Cython side before I can finish
migration on the C++ side (#2296). I welcome critique of this design,
and ideas for how it can be improved, particularly from @vyasr @wence-.
I would like to address any suggested improvements in follow-up PRs,
because this changeset is necessary to unblock #2301.

The changes in `cdef class DeviceMemoryResource` are perhaps the most
significant changes here from a design perspective.

The solution I'm going with for now is to keep the
`DeviceMemoryResource` class around, as a base class for the Cython MRs,
and let it handle allocate/deallocate. It owns an
`optional[device_async_resource_ref]` which is used for
allocation/deallocation. It's `optional` so that the class can be
default-constructed (Cython requires nullary constructors), but it
should never be `nullopt` except during initialization.

Then, each MR class owns a `c_obj` like
`unique_ptr[cuda_memory_resource]`. This is `unique_ptr` so it can be
default-constructed for Cython's requirements. I chose `unique_ptr` over
`optional` here to emphasize that this member is the thing that actually
owns the resource. As with the `c_ref`, this should never be `nullptr`
except during initialization. When an MR class is created, it
initializes its `c_obj` and then constructs a `c_ref` (a member
inherited from the `DeviceMemoryResource` base class).

"Special" methods for an MR like getting the statistics counts go
through `deref(self.c_obj)`, and "common" methods like
allocate/deallocate go through `self.c_ref.value()`.
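
The ownership split described above can be sketched in plain C++. This is only an approximation under stated assumptions: `counting_resource`, `resource_ref`, `memory_resource_base`, and `counting_mr` are hypothetical stand-ins for a concrete resource, `device_async_resource_ref`, the Cython `DeviceMemoryResource` base, and a concrete Cython MR class. It uses host allocation and leaves out streams.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <optional>
#include <type_traits>

// Hypothetical concrete resource with one "special" accessor (a counter).
struct counting_resource {
  std::size_t allocation_count{0};

  void* allocate_sync(std::size_t bytes)
  {
    ++allocation_count;
    return ::operator new(bytes);
  }
  void deallocate_sync(void* ptr, std::size_t) noexcept { ::operator delete(ptr); }
};

// Hypothetical non-owning, type-erased reference. Like the real ref type,
// it has no default constructor, which is why the base wraps it in optional.
class resource_ref {
 public:
  template <typename Resource,
            std::enable_if_t<!std::is_same_v<Resource, resource_ref>, int> = 0>
  explicit resource_ref(Resource& resource)
    : object_(&resource),
      allocate_([](void* object, std::size_t bytes) -> void* {
        return static_cast<Resource*>(object)->allocate_sync(bytes);
      }),
      deallocate_([](void* object, void* ptr, std::size_t bytes) {
        static_cast<Resource*>(object)->deallocate_sync(ptr, bytes);
      })
  {
  }

  void* allocate_sync(std::size_t bytes) { return allocate_(object_, bytes); }
  void deallocate_sync(void* ptr, std::size_t bytes) { deallocate_(object_, ptr, bytes); }

 private:
  void* object_;
  void* (*allocate_)(void*, std::size_t);
  void (*deallocate_)(void*, void*, std::size_t);
};

// Base class: "common" methods go through c_ref.
class memory_resource_base {
 public:
  void* allocate(std::size_t bytes) { return c_ref.value().allocate_sync(bytes); }
  void deallocate(void* ptr, std::size_t bytes) { c_ref.value().deallocate_sync(ptr, bytes); }

 protected:
  std::optional<resource_ref> c_ref;  // nullopt only during initialization
};

// Concrete class: owns c_obj, then points the inherited c_ref at it.
class counting_mr : public memory_resource_base {
 public:
  counting_mr() : c_obj{std::make_unique<counting_resource>()} { c_ref.emplace(*c_obj); }

  // "Special" methods go through the owned object directly.
  std::size_t allocation_count() const { return c_obj->allocation_count; }

 private:
  std::unique_ptr<counting_resource> c_obj;
};

inline std::size_t demo_allocation_count()
{
  counting_mr mr;
  void* ptr = mr.allocate(64);
  assert(ptr != nullptr);
  mr.deallocate(ptr, 64);
  return mr.allocation_count();
}
```

In this sketch `demo_allocation_count()` returns 1: the allocation goes through the type-erased `c_ref`, and the counter is read back through the owning `c_obj`.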

### Changes

- **`.pxd` declarations**: Remove `device_memory_resource` class.
Declare `device_async_resource_ref` and a
`make_device_async_resource_ref()` inline C++ template that returns
`optional` to work around Cython generating default-constructed
temporaries for non-default-constructible types. All adaptor
constructors take `device_async_resource_ref` instead of
`device_memory_resource*`.
- **`.pxd` class definitions**: `DeviceMemoryResource` base holds
`optional[device_async_resource_ref] c_ref`; each concrete subclass
holds `unique_ptr[ConcreteType] c_obj`.
- **`.pyx` implementations**: All `__cinit__` methods construct via
`unique_ptr` then set `c_ref` via `make_device_async_resource_ref`.
Typed accessors (`pool_size`, `flush`, etc.) use `deref(self.c_obj)`.
Per-device functions use `set_per_device_resource_ref`.
- **`device_buffer.pyx`**: Passes `self.mr.c_ref.value()` instead of
`self.mr.get_mr()`.

Closes #2294
…tors

Remove the device_memory_resource virtual base class inheritance from all
production memory resources, adaptors, and stream_ordered_memory_resource.

Resources now derive publicly from cuda::mr::shared_resource<Impl> (for
stateful/adaptor types) or stand alone with direct CCCL concept methods
(for stateless types). The legacy do_allocate/do_deallocate/do_is_equal
virtual overrides and pointer-based per-device-resource APIs are removed.

stream_ordered_memory_resource provides allocate/deallocate/allocate_sync/
deallocate_sync directly instead of through the DMR virtual dispatch.

All 103 C++ tests and 1165 Python tests pass.

bdice force-pushed the remove-dmr-inheritance branch from 856395e to a06396a March 17, 2026 23:46
bdice moved this to In Progress in RMM Project Board Mar 18, 2026
bdice self-assigned this Mar 18, 2026
bdice added the breaking and improvement labels Mar 18, 2026
bdice marked this pull request as ready for review March 18, 2026 00:01
bdice requested review from a team as code owners March 18, 2026 00:01
bdice requested review from ttnghia and vyasr and removed request for a team March 18, 2026 00:01

::rmm::detail::cccl_resource_ref> and
not is_specialization_of_v<std::remove_cv_t<OtherResourceType>,
::rmm::detail::cccl_async_resource_ref> and
not shared_resource_cast<OtherResourceType>::value and

bdice (Collaborator, Author) commented Mar 18, 2026

This isn't something we want to keep. I don't know how to work around the constraint satisfaction issues in NVIDIA/cccl#8037.

Minimal repro: https://godbolt.org/z/ddfn1sE8Y

bdice (Collaborator, Author)

Reviewers: I discussed this issue with @ericniebler. We are going to try NVIDIA/cccl#8121 as a workaround. For now, I want to get the current series of open PRs merged (#2301, #2324, #2325), and then try to reduce the workarounds that have been introduced.

  void* ptr{nullptr};
  if (i != 0) { std::this_thread::sleep_for(std::chrono::milliseconds{100}); }
- EXPECT_NO_THROW(ptr = mr.allocate(stream, allocation_size));
+ EXPECT_NO_THROW(ptr = mr.allocate(stream, allocation_size, rmm::CUDA_ALLOCATION_ALIGNMENT));

bdice (Collaborator, Author) commented Mar 18, 2026

An explicit alignment argument is required because of the issues described in NVIDIA/cccl#8063. Unlike other CCCL MR classes, shared_resource<T> has no 2-argument allocate(stream, size) overload. It provides only allocate(stream, size, alignment), and the alignment parameter has no default value.
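
To show what this looks like at a call site, here is a small sketch of a resource whose `allocate` has no default alignment, plus a helper that fills it in. Everything in it is an assumption made for illustration: `stream_view`, `strict_resource`, `default_alignment`, and `allocate_default_aligned` are hypothetical names, the value 256 is a stand-in for `rmm::CUDA_ALLOCATION_ALIGNMENT`, and memory comes from the host. No such helper is part of this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for a CUDA stream handle.
struct stream_view {};

// Assumed stand-in for rmm::CUDA_ALLOCATION_ALIGNMENT.
constexpr std::size_t default_alignment{256};

// Hypothetical resource exposing only the 3-argument allocate, with no
// default value for the alignment parameter.
struct strict_resource {
  void* allocate(stream_view, std::size_t bytes, std::size_t alignment)
  {
    return ::operator new(bytes, std::align_val_t{alignment});
  }

  void deallocate(stream_view, void* ptr, std::size_t, std::size_t alignment) noexcept
  {
    ::operator delete(ptr, std::align_val_t{alignment});
  }
};

// Hypothetical helper that supplies the alignment, so call sites that do not
// care about alignment can stay short.
template <typename Resource>
void* allocate_default_aligned(Resource& mr, stream_view stream, std::size_t bytes)
{
  return mr.allocate(stream, bytes, default_alignment);
}

// Returns true when the helper yields a pointer aligned to default_alignment.
inline bool helper_returns_aligned_pointer()
{
  strict_resource mr;
  stream_view stream;
  void* ptr = allocate_default_aligned(mr, stream, 1024);
  assert(ptr != nullptr);
  bool const aligned = reinterpret_cast<std::uintptr_t>(ptr) % default_alignment == 0;
  mr.deallocate(stream, ptr, 1024, default_alignment);
  return aligned;
}
```

The test change in the hunk above takes the more direct route and passes the alignment explicitly at each call.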

bdice requested a review from a team as a code owner March 19, 2026 18:11
Rewrite benchmark factory functions from shared_ptr<device_memory_resource>
to any_device_resource, convert simulated_memory_resource from DMR
inheritance to CCCL concepts, and change copy/move from = delete to
= default on cuda_async_memory_resource, cuda_async_managed_memory_resource,
sam_headroom_memory_resource, and simulated_memory_resource to satisfy
CCCL resource_ref copyability requirements.

GCC 13 incorrectly resolves the injected class name to the
fully-specialized type inside a class template body, even in
template-template parameter position. This causes a hard error
(not SFINAE) in is_specialization_of_v when checking against
cccl_resource_ref and cccl_async_resource_ref in their own
generic-resource constructors.

Replace is_specialization_of_v<T, cccl_*_resource_ref> with
dedicated is_cccl_*_resource_ref_v<T> traits defined outside
the class bodies, where the forward declarations are unambiguously
templates. This is correct on both GCC 13 and 14.
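
The shape of that workaround can be sketched as follows. The names `tagged_ref`, `is_tagged_ref_v`, `sync_tag`, `async_tag`, and `host_resource` are hypothetical, not RMM's actual types or traits, and the snippet does not try to reproduce the GCC 13 behavior. It only shows a dedicated trait that is specialized against the forward declaration, outside the class body, and then used to constrain the generic constructor.

```cpp
#include <type_traits>

// Forward declaration: at this point the name can only refer to the template.
template <typename Tag>
class tagged_ref;

// Dedicated trait, defined outside the class body, so the injected class
// name is never involved in the check.
template <typename T>
struct is_tagged_ref : std::false_type {};

template <typename Tag>
struct is_tagged_ref<tagged_ref<Tag>> : std::true_type {};

template <typename T>
inline constexpr bool is_tagged_ref_v =
  is_tagged_ref<std::remove_cv_t<std::remove_reference_t<T>>>::value;

// Hypothetical tags and a hypothetical resource type.
struct sync_tag {};
struct async_tag {};
struct host_resource {};

template <typename Tag>
class tagged_ref {
 public:
  // Generic-resource constructor. It is disabled for any tagged_ref
  // specialization so that it cannot compete with copy construction.
  template <typename Other, std::enable_if_t<!is_tagged_ref_v<Other>, int> = 0>
  tagged_ref(Other& other) noexcept : object_{&other}
  {
  }

  tagged_ref(tagged_ref const&) noexcept = default;

  void const* get() const noexcept { return object_; }

 private:
  void const* object_;
};

static_assert(is_tagged_ref_v<tagged_ref<sync_tag>>);
static_assert(is_tagged_ref_v<tagged_ref<async_tag> const&>);
static_assert(!is_tagged_ref_v<host_resource>);
static_assert(std::is_constructible_v<tagged_ref<sync_tag>, host_resource&>);
static_assert(!std::is_constructible_v<tagged_ref<sync_tag>, tagged_ref<async_tag>&>);
```

All checks here happen at compile time, so a translation unit containing this sketch compiles only if the trait and the constraint behave as described.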
Labels

breaking, improvement

Projects

Status: In Progress

2 participants