Remove device_memory_resource inheritance from all resources and adaptors#2301
Remove device_memory_resource inheritance from all resources and adaptors#2301bdice wants to merge 7 commits intorapidsai:stagingfrom
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
032b3a6 to
856395e
Compare
…async_resource_ref (#2300) ## Summary Replaces `shared_ptr[device_memory_resource]` with per-subclass `unique_ptr[ConcreteType]` (owning) and `optional[device_async_resource_ref]` (non-owning reference) across all Python/Cython bindings. This is a part of #2011. There are **significant** opportunities to make this Cython code better over time but I have to get something that removes `device_memory_resource` from the Python/Cython side before I can finish migration on the C++ side (#2296). I welcome critique of this design, and ideas for how it can be improved, particularly from @vyasr @wence-. I would like to address any suggested improvements in follow-up PRs, because this changeset is necessary to unblock #2301. The changes in `cdef class DeviceMemoryResource` are perhaps the most significant changes here from a design perspective. The solution I'm going with for now is to keep the `DeviceMemoryResource` class around, as a base class for the Cython MRs, and let it handle allocate/deallocate. It owns a `optional[device_async_resource_ref]` which is used for allocation/deallocation. It's `optional` so that the class can be default-constructed (Cython requires nullary constructors), but it should never be `nullopt` except during initialization. Then, each MR class owns a `c_obj` like `unique_ptr[cuda_memory_resource]`. This is `unique_ptr` so it can be default-constructed for Cython's requirements. I chose `unique_ptr` over `optional` here to emphasize that this member is the thing that actually owns the resource. As with the `c_ref`, this should never be `nullptr` except during initialization. When an MR class is created, it initializes its `c_obj` and then constructs a `c_ref` (a member inherited from the `DeviceMemoryResource` base class). "Special" methods for an MR like getting the statistics counts go through `deref(self.c_obj)`, and "common" methods like allocate/deallocate go through `self.c_ref.value()`. ### Changes - **`.pxd` declarations**: Remove `device_memory_resource` class. Declare `device_async_resource_ref` and a `make_device_async_resource_ref()` inline C++ template that returns `optional` to work around Cython generating default-constructed temporaries for non-default-constructible types. All adaptor constructors take `device_async_resource_ref` instead of `device_memory_resource*`. - **`.pxd` class definitions**: `DeviceMemoryResource` base holds `optional[device_async_resource_ref] c_ref`; each concrete subclass holds `unique_ptr[ConcreteType] c_obj`. - **`.pyx` implementations**: All `__cinit__` methods construct via `unique_ptr` then set `c_ref` via `make_device_async_resource_ref`. Typed accessors (`pool_size`, `flush`, etc.) use `deref(self.c_obj)`. Per-device functions use `set_per_device_resource_ref`. - **`device_buffer.pyx`**: Passes `self.mr.c_ref.value()` instead of `self.mr.get_mr()`. Closes #2294
…tors Remove the device_memory_resource virtual base class inheritance from all production memory resources, adaptors, and stream_ordered_memory_resource. Resources now derive publicly from cuda::mr::shared_resource<Impl> (for stateful/adaptor types) or stand alone with direct CCCL concept methods (for stateless types). The legacy do_allocate/do_deallocate/do_is_equal virtual overrides and pointer-based per-device-resource APIs are removed. stream_ordered_memory_resource provides allocate/deallocate/allocate_sync/ deallocate_sync directly instead of through the DMR virtual dispatch. All 103 C++ tests and 1165 Python tests pass.
856395e to
a06396a
Compare
| ::rmm::detail::cccl_resource_ref> and | ||
| not is_specialization_of_v<std::remove_cv_t<OtherResourceType>, | ||
| ::rmm::detail::cccl_async_resource_ref> and | ||
| not shared_resource_cast<OtherResourceType>::value and |
There was a problem hiding this comment.
This isn't something we want to keep. I don't know how to work around the constraint satisfaction issues in NVIDIA/cccl#8037.
Minimal repro: https://godbolt.org/z/ddfn1sE8Y
There was a problem hiding this comment.
Reviewers: I discussed this issue with @ericniebler. We are going to try NVIDIA/cccl#8121 as a workaround. For now, I want to get the current series of open PRs merged (#2301, #2324, #2325), and then try to reduce the workarounds that have been introduced.
| void* ptr{nullptr}; | ||
| if (i != 0) { std::this_thread::sleep_for(std::chrono::milliseconds{100}); } | ||
| EXPECT_NO_THROW(ptr = mr.allocate(stream, allocation_size)); | ||
| EXPECT_NO_THROW(ptr = mr.allocate(stream, allocation_size, rmm::CUDA_ALLOCATION_ALIGNMENT)); |
There was a problem hiding this comment.
An explicit alignment argument is required because of the issues described in NVIDIA/cccl#8063. There is no 2-argument allocate(stream, size) overload for shared_resource<T> like there is for other CCCL MR classes. There is only allocation(stream, size, alignment) for shared_resource<T>, and it has no default alignment value.
Rewrite benchmark factory functions from shared_ptr<device_memory_resource> to any_device_resource, convert simulated_memory_resource from DMR inheritance to CCCL concepts, and change copy/move from = delete to = default on cuda_async_memory_resource, cuda_async_managed_memory_resource, sam_headroom_memory_resource, and simulated_memory_resource to satisfy CCCL resource_ref copyability requirements.
563d7d3 to
8385650
Compare
GCC 13 incorrectly resolves the injected class name to the fully-specialized type inside a class template body, even in template-template parameter position. This causes a hard error (not SFINAE) in is_specialization_of_v when checking against cccl_resource_ref and cccl_async_resource_ref in their own generic-resource constructors. Replace is_specialization_of_v<T, cccl_*_resource_ref> with dedicated is_cccl_*_resource_ref_v<T> traits defined outside the class bodies, where the forward declarations are unambiguously templates. This is correct on both GCC 13 and 14.
Summary
device_memory_resourceinheritance from all memory resources (stateless, stateful, and adaptors)do_allocate/do_deallocate/do_is_equalvirtual overrides from all resourcesshared_ptr<device_memory_resource>toany_device_resourcesimulated_memory_resourcefrom DMR inheritance to CCCL concepts= deleteto= defaultoncuda_async_memory_resource,cuda_async_managed_memory_resource,sam_headroom_memory_resource, andsimulated_memory_resource(required for CCCLresource_refcopyability viashared_resourcebase)Closes #2295
Part of #2011