Delete cccl_adaptors.hpp and use raw CCCL resource_ref types#2325
Draft
bdice wants to merge 9 commits into rapidsai:staging
…tors
Remove the device_memory_resource virtual base class inheritance from all production memory resources, adaptors, and stream_ordered_memory_resource. Resources now derive publicly from cuda::mr::shared_resource<Impl> (for stateful/adaptor types) or stand alone with direct CCCL concept methods (for stateless types). The legacy do_allocate/do_deallocate/do_is_equal virtual overrides and pointer-based per-device-resource APIs are removed. stream_ordered_memory_resource provides allocate/deallocate/allocate_sync/deallocate_sync directly instead of through the DMR virtual dispatch. All 103 C++ tests and 1165 Python tests pass.
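The shared-ownership layout described in this commit can be sketched in plain C++ without CUDA. This is only an illustration of the idea, not the RMM or CCCL code: `shared_handle`, `counting_impl`, and `counting_resource` are hypothetical names, `std::shared_ptr` stands in for whatever cuda::mr::shared_resource<Impl> does internally, and `std::malloc` stands in for device allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <utility>

// Hypothetical stand-in for a shared-ownership base: every copy of the
// handle refers to the same reference-counted Impl object.
template <class Impl>
class shared_handle {
 public:
  explicit shared_handle(std::shared_ptr<Impl> impl) : impl_(std::move(impl)) {}
  Impl& get() const { return *impl_; }

 private:
  std::shared_ptr<Impl> impl_;
};

// Hypothetical implementation class that owns the mutable state.
struct counting_impl {
  std::size_t outstanding{0};
  void* allocate_sync(std::size_t bytes)
  {
    ++outstanding;
    return std::malloc(bytes);  // host allocation used purely for illustration
  }
  void deallocate_sync(void* ptr) noexcept
  {
    --outstanding;
    std::free(ptr);
  }
};

// The public resource type has no virtual base; it exposes the allocation
// methods directly and forwards them to the shared Impl.
class counting_resource : public shared_handle<counting_impl> {
 public:
  counting_resource() : shared_handle<counting_impl>(std::make_shared<counting_impl>()) {}
  void* allocate_sync(std::size_t bytes) { return get().allocate_sync(bytes); }
  void deallocate_sync(void* ptr) noexcept { get().deallocate_sync(ptr); }
  std::size_t outstanding() const { return get().outstanding; }

  // Two handles compare equal when they share the same Impl.
  friend bool operator==(counting_resource const& a, counting_resource const& b)
  {
    return &a.get() == &b.get();
  }
};

// Usage: a copy observes allocations made through the original.
inline void shared_state_demo()
{
  counting_resource a;
  counting_resource b = a;  // copies the handle, not the state
  void* ptr = a.allocate_sync(64);
  assert(b.outstanding() == 1);
  b.deallocate_sync(ptr);
  assert(a.outstanding() == 0);
}
```

Because copies share one Impl, defaulted copy and move operations on the public type keep the state alive for as long as any handle exists.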
Rewrite benchmark factory functions from shared_ptr<device_memory_resource> to any_device_resource, convert simulated_memory_resource from DMR inheritance to CCCL concepts, and change copy/move from = delete to = default on cuda_async_memory_resource, cuda_async_managed_memory_resource, sam_headroom_memory_resource, and simulated_memory_resource to satisfy CCCL resource_ref copyability requirements.
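To illustrate why a value-semantic, type-erased wrapper asks the wrapped resource to be copyable, here is a minimal sketch of a factory that returns such a wrapper by value. It assumes nothing about how CCCL implements any_resource or resource_ref: `any_resource_sketch`, `malloc_resource_sketch`, and `make_malloc_sketch` are hypothetical, and host `std::malloc` replaces device allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical owning, type-erased wrapper with value semantics.
class any_resource_sketch {
  struct iface {
    virtual ~iface() = default;
    virtual void* allocate_sync(std::size_t bytes) = 0;
    virtual void deallocate_sync(void* ptr) noexcept = 0;
    virtual std::unique_ptr<iface> clone() const = 0;
  };

  template <class R>
  struct model final : iface {
    explicit model(R r) : resource(std::move(r)) {}
    void* allocate_sync(std::size_t bytes) override { return resource.allocate_sync(bytes); }
    void deallocate_sync(void* ptr) noexcept override { resource.deallocate_sync(ptr); }
    // Copying the wrapper copies the wrapped resource, hence the requirement.
    std::unique_ptr<iface> clone() const override { return std::make_unique<model>(resource); }
    R resource;
  };

  std::unique_ptr<iface> self_;

 public:
  template <class R,
            std::enable_if_t<!std::is_same_v<std::decay_t<R>, any_resource_sketch>, int> = 0>
  any_resource_sketch(R resource) : self_(std::make_unique<model<R>>(std::move(resource)))
  {
    static_assert(std::is_copy_constructible_v<R>, "wrapped resource must be copyable");
  }

  any_resource_sketch(any_resource_sketch const& other) : self_(other.self_->clone()) {}
  any_resource_sketch(any_resource_sketch&&) noexcept = default;

  void* allocate_sync(std::size_t bytes) { return self_->allocate_sync(bytes); }
  void deallocate_sync(void* ptr) noexcept { self_->deallocate_sync(ptr); }
};

// Hypothetical stateless resource used by the factory below.
struct malloc_resource_sketch {
  void* allocate_sync(std::size_t bytes) { return std::malloc(bytes); }
  void deallocate_sync(void* ptr) noexcept { std::free(ptr); }
};

// Factory returning the wrapper by value instead of a shared_ptr to a base class.
inline any_resource_sketch make_malloc_sketch()
{
  return any_resource_sketch{malloc_resource_sketch{}};
}

// Usage: the wrapper itself is copyable because the wrapped resource is.
inline void factory_demo()
{
  any_resource_sketch mr   = make_malloc_sketch();
  any_resource_sketch copy = mr;
  void* ptr                = copy.allocate_sync(32);
  assert(ptr != nullptr);
  mr.deallocate_sync(ptr);
}
```

A resource whose copy constructor is deleted would trip the static_assert in this sketch, which is the kind of failure the switch from = delete to = default avoids.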
Delete device_memory_resource.hpp and device_memory_resource_view.hpp.
Strip DMR bridge code from cccl_adaptors.hpp, keeping shared_resource_cast
wrappers. Inline do_allocate/do_deallocate into allocate/deallocate in
stream_ordered_memory_resource. Convert benchmarks from shared_ptr<DMR> to
any_device_resource. Rewrite test mocks to satisfy CCCL concepts directly,
with copyable forwarding wrappers to work around basic_any type-erasure
limitations with GMock types. Replace reinterpret_cast stream constructions
with cuda_stream_view{}.
…onstraint cycle
Replace device_async_resource_ref constructor parameters with cuda::mr::any_resource<device_accessible> across all adaptor impl classes. Add template constructors (constrained with !is_same_v<decay_t<T>, AdaptorType>) to public adaptor headers for single-arg-capable constructors, breaking the recursive is_constructible cycle that CCCL #8037 causes. Multi-arg constructors that cannot be confused with copy/move use plain any_resource params with out-of-line definitions. Update Python/Cython bindings with any_device_resource type alias and _to_any_resource() wrapper at all call sites to work around Cython's inability to call C++ template constructors directly.
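The constraint named in this commit is a standard C++ idiom, sketched below outside of RMM. The sketch does not reproduce the CCCL #8037 cycle itself; it only shows how a forwarding constructor is kept out of overload resolution when the argument is the adaptor's own type. `resource_sketch` and `adaptor_sketch` are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical upstream resource; only an id so the effect is observable.
struct resource_sketch {
  int id;
};

class adaptor_sketch {
 public:
  // Single-argument template constructor. The constraint removes it from
  // overload resolution whenever T decays to adaptor_sketch, so it can never
  // be selected in place of the copy or move constructor.
  template <class T,
            std::enable_if_t<!std::is_same_v<std::decay_t<T>, adaptor_sketch>, int> = 0>
  explicit adaptor_sketch(T&& upstream) : upstream_id_(upstream.id)
  {
  }

  // User-provided copy constructor that records that a copy happened.
  adaptor_sketch(adaptor_sketch const& other) : upstream_id_(other.upstream_id_), copied_(true) {}

  int upstream_id() const { return upstream_id_; }
  bool copied() const { return copied_; }

 private:
  int upstream_id_;
  bool copied_{false};
};

// Usage: copying from a non-const lvalue picks the copy constructor. Without
// the constraint, T&& would bind to adaptor_sketch& as a better match.
inline void constrained_ctor_demo()
{
  adaptor_sketch a{resource_sketch{7}};
  adaptor_sketch b{a};
  assert(!a.copied());
  assert(b.copied());
  assert(b.upstream_id() == 7);
}
```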
…urce_adaptor to store any_resource members
Replace device_async_resource_ref members with cuda::mr::any_resource<device_accessible> in polymorphic_allocator, thrust_allocator, and device_check_resource_adaptor. This eliminates the CCCL #8037 recursive constraint cycle for these classes. polymorphic_allocator uses a template constructor with SFINAE (is_polymorphic_allocator_v) because it is a class template with a 1-arg constructor, matching the pattern used by the adaptor classes.
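For a class template, an is_same check against one specialization does not cover the other specializations, which is one plausible reason for a dedicated trait. The sketch below shows such a trait in isolation; it is an assumption about the general shape of the technique, not the actual definition of is_polymorphic_allocator_v. `allocator_sketch`, `is_allocator_sketch_v`, and `resource_sketch` are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

template <class T>
class allocator_sketch;

// Trait that is true for every specialization of allocator_sketch.
template <class T>
struct is_allocator_sketch : std::false_type {};
template <class T>
struct is_allocator_sketch<allocator_sketch<T>> : std::true_type {};
template <class T>
inline constexpr bool is_allocator_sketch_v = is_allocator_sketch<std::decay_t<T>>::value;

// Hypothetical resource; only an id so the effect is observable.
struct resource_sketch {
  int id;
};

template <class T>
class allocator_sketch {
 public:
  // One-argument constructor from a resource, disabled for any allocator_sketch<U>.
  template <class R, std::enable_if_t<!is_allocator_sketch_v<R>, int> = 0>
  allocator_sketch(R&& resource) : resource_id_(resource.id)
  {
  }

  // Rebinding constructor from another specialization.
  template <class U>
  allocator_sketch(allocator_sketch<U> const& other) noexcept : resource_id_(other.resource_id())
  {
  }

  int resource_id() const { return resource_id_; }

 private:
  int resource_id_;
};

static_assert(is_allocator_sketch_v<allocator_sketch<int>&>, "specializations are detected");
static_assert(!is_allocator_sketch_v<resource_sketch>, "other types are not");

// Usage: rebinding goes through the allocator_sketch<U> constructor and never
// through the resource constructor.
inline void rebind_demo()
{
  allocator_sketch<int> a{resource_sketch{3}};
  allocator_sketch<double> b(a);
  assert(b.resource_id() == 3);
}
```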
Replace RMM's wrapper types (cccl_resource_ref, cccl_async_resource_ref)
with direct aliases to CCCL's resource_ref and synchronous_resource_ref.
This eliminates the 469-line adaptor layer that was originally needed to
work around shared_resource type-erasure issues.
The wrapper was no longer needed once the CCCL #8037 recursive constraint
cycle was broken via template SFINAE constructors (previous commit).
Additional changes required for compilation without the wrapper:
- per_device_resource: static_cast<any_device_resource>(ref) replaced
  with any_device_resource{ref} (wrapper had operator any_resource)
- cuda_async_memory_resource, cuda_async_managed_memory_resource,
  sam_headroom_memory_resource: copy/move changed from = delete to
  = default (CCCL resource_ref requires copyability; shared_resource
  base already provides correct reference-counted semantics)
- device_check_resource_adaptor (test): template SFINAE constructor
  to break the same CCCL #8037 cycle
…hmark
The any_resource internals use __host__ __device__ functions that call shared_resource copy/move constructors which are __host__ only. This is safe because the benchmark factory functions are only called from host code.
This was referenced Mar 20, 2026
Summary
- polymorphic_allocator, thrust_allocator, and device_check_resource_adaptor now store any_resource members instead of non-owning resource_ref
- Delete cccl_adaptors.hpp and replace RMM's resource_ref type aliases with direct CCCL types
- … multi_stream_allocations_bench.cu

Closes #2323
Part of #2011