Delete cccl_adaptors.hpp and use raw CCCL resource_ref types #2325

Draft
bdice wants to merge 9 commits into rapidsai:staging from bdice:delete-cccl-adaptors

bdice (Collaborator) commented Mar 20, 2026

⚠️ This PR builds on top of #2324. It should be reviewed and merged after that PR lands.

Summary

  • Add SFINAE constraints to the single-argument constructors of all memory resources to work around the CCCL #8037 recursive constraint cycle (49 files)
  • Change polymorphic_allocator, thrust_allocator, and device_check_resource_adaptor to store any_resource members instead of non-owning resource_ref members
  • Delete cccl_adaptors.hpp and replace RMM's resource_ref type aliases with direct CCCL types
  • Suppress nvcc host/device diagnostic in multi_stream_allocations_bench.cu

Closes #2323
Part of #2011

bdice added 9 commits March 17, 2026 23:44
…tors

Remove the device_memory_resource virtual base class inheritance from all
production memory resources, adaptors, and stream_ordered_memory_resource.

Resources now derive publicly from cuda::mr::shared_resource<Impl> (for
stateful/adaptor types) or stand alone with direct CCCL concept methods
(for stateless types). The legacy do_allocate/do_deallocate/do_is_equal
virtual overrides and pointer-based per-device-resource APIs are removed.
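
To illustrate the "stand alone with direct concept methods" style described above, here is a minimal sketch of a stateless resource with no virtual base class. Everything in it is hypothetical: `toy_host_resource` and `round_trip` are illustration-only names, the method set only loosely mirrors the `allocate_sync`/`deallocate_sync` naming mentioned in this PR, and it does not claim to satisfy the real CCCL resource concepts.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stateless resource: no virtual base, only the member functions
// that generic code calls. This is NOT the real CCCL resource concept.
struct toy_host_resource {
  void* allocate_sync(std::size_t bytes,
                      std::size_t /*alignment*/ = alignof(std::max_align_t))
  {
    void* ptr = std::malloc(bytes);
    if (ptr == nullptr) { throw std::bad_alloc{}; }
    return ptr;
  }

  void deallocate_sync(void* ptr,
                       std::size_t /*bytes*/,
                       std::size_t /*alignment*/ = alignof(std::max_align_t)) noexcept
  {
    std::free(ptr);
  }

  // A stateless resource has nothing to distinguish two instances.
  friend bool operator==(toy_host_resource const&, toy_host_resource const&) noexcept
  {
    return true;
  }
  friend bool operator!=(toy_host_resource const&, toy_host_resource const&) noexcept
  {
    return false;
  }
};

// Generic code binds to the member functions at compile time instead of
// dispatching through do_allocate/do_deallocate virtual overrides.
template <typename Resource>
bool round_trip(Resource& mr, std::size_t bytes)
{
  void* ptr     = mr.allocate_sync(bytes);
  bool const ok = (ptr != nullptr);
  mr.deallocate_sync(ptr, bytes);
  return ok;
}
```

Any type with the same member functions can be passed to `round_trip`, which is the property that lets a virtual base class be dropped.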

stream_ordered_memory_resource provides allocate/deallocate/allocate_sync/
deallocate_sync directly instead of through the DMR virtual dispatch.

All 103 C++ tests and 1165 Python tests pass.

Rewrite benchmark factory functions from shared_ptr<device_memory_resource>
to any_device_resource, convert simulated_memory_resource from DMR
inheritance to CCCL concepts, and change copy/move from = delete to
= default on cuda_async_memory_resource, cuda_async_managed_memory_resource,
sam_headroom_memory_resource, and simulated_memory_resource to satisfy
CCCL resource_ref copyability requirements.

Delete device_memory_resource.hpp and device_memory_resource_view.hpp.
Strip DMR bridge code from cccl_adaptors.hpp, keeping shared_resource_cast
wrappers. Inline do_allocate/do_deallocate into allocate/deallocate in
stream_ordered_memory_resource. Convert benchmarks from shared_ptr<DMR> to
any_device_resource. Rewrite test mocks to satisfy CCCL concepts directly,
with copyable forwarding wrappers to work around basic_any type-erasure
limitations with GMock types. Replace reinterpret_cast stream constructions
with cuda_stream_view{}.

…onstraint cycle

Replace device_async_resource_ref constructor parameters with
cuda::mr::any_resource<device_accessible> across all adaptor impl
classes. Add template constructors (constrained with
!is_same_v<decay_t<T>, AdaptorType>) to public adaptor headers for
single-arg-capable constructors, breaking the recursive
is_constructible cycle that CCCL #8037 causes. Multi-arg constructors
that cannot be confused with copy/move use plain any_resource params
with out-of-line definitions.
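
The constrained single-argument constructor pattern described above can be sketched with plain C++17 and no CCCL dependency. All names here (`toy_any_resource`, `toy_upstream`, `toy_logging_adaptor`) are hypothetical stand-ins, not RMM or CCCL types. The sketch does not reproduce the recursive is_constructible cycle itself; it only shows how the `!is_same_v<decay_t<T>, AdaptorType>` constraint keeps the template constructor out of copy/move overload resolution.

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for an owning, type-erased resource wrapper.
// It accepts anything with a name() member; it is NOT cuda::mr::any_resource.
class toy_any_resource {
 public:
  template <typename T,
            std::enable_if_t<!std::is_same_v<std::decay_t<T>, toy_any_resource>, int> = 0>
  toy_any_resource(T&& resource) : name_{std::make_shared<std::string>(resource.name())}
  {
  }

  std::string const& name() const { return *name_; }

 private:
  std::shared_ptr<std::string> name_;
};

// Hypothetical upstream resource.
struct toy_upstream {
  std::string name() const { return "upstream"; }
};

class toy_logging_adaptor {
 public:
  // Single-argument template constructor. The constraint removes this overload
  // when Upstream is the adaptor itself, so copying or moving an adaptor uses
  // the implicit copy/move constructor instead of wrapping the adaptor again.
  template <
    typename Upstream,
    std::enable_if_t<!std::is_same_v<std::decay_t<Upstream>, toy_logging_adaptor>, int> = 0>
  explicit toy_logging_adaptor(Upstream&& upstream) : upstream_{std::forward<Upstream>(upstream)}
  {
  }

  std::string name() const { return "logging(" + upstream_.name() + ")"; }

 private:
  toy_any_resource upstream_;
};

static_assert(std::is_copy_constructible_v<toy_logging_adaptor>,
              "copying must resolve to the implicit copy constructor");
```

Without the constraint, copying a non-const lvalue adaptor would prefer the forwarding constructor (an exact match on `toy_logging_adaptor&`) and wrap the adaptor in itself; with it, the copy keeps the same upstream.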

Update Python/Cython bindings with any_device_resource type alias and
_to_any_resource() wrapper at all call sites to work around Cython's
inability to call C++ template constructors directly.

…urce_adaptor to store any_resource members

Replace device_async_resource_ref members with
cuda::mr::any_resource<device_accessible> in polymorphic_allocator,
thrust_allocator, and device_check_resource_adaptor. This eliminates
the CCCL #8037 recursive constraint cycle for these classes.

polymorphic_allocator uses a template constructor with SFINAE
(is_polymorphic_allocator_v) because it is a class template with a
1-arg constructor, matching the pattern used by the adaptor classes.
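
A rough sketch of the trait-based variant for a class template follows. It assumes, as an interpretation rather than something stated in the commit, that a trait is used instead of a plain is_same check so that every specialization of the allocator is excluded from the resource-taking constructor, not just the current one. `toy_resource`, `toy_allocator`, and `is_toy_allocator_v` are hypothetical names; only the idea of an `is_polymorphic_allocator_v`-style trait comes from the text above.

```cpp
#include <type_traits>

// Hypothetical resource type carrying an identifier.
struct toy_resource {
  int id;
};

template <typename T>
class toy_allocator;

// Trait that is true for every specialization of toy_allocator.
template <typename U>
struct is_toy_allocator : std::false_type {};
template <typename U>
struct is_toy_allocator<toy_allocator<U>> : std::true_type {};
template <typename U>
inline constexpr bool is_toy_allocator_v = is_toy_allocator<std::decay_t<U>>::value;

template <typename T>
class toy_allocator {
 public:
  using value_type = T;

  // Resource-taking constructor, disabled for any toy_allocator<U> argument so
  // that copies and rebinding conversions never land here.
  template <typename Resource, std::enable_if_t<!is_toy_allocator_v<Resource>, int> = 0>
  toy_allocator(Resource&& resource) : resource_id_{resource.id}
  {
  }

  // Rebinding conversion, as allocator-aware containers expect.
  template <typename U>
  toy_allocator(toy_allocator<U> const& other) noexcept : resource_id_{other.resource_id()}
  {
  }

  int resource_id() const noexcept { return resource_id_; }

 private:
  int resource_id_;
};
```

With the trait in place, `toy_allocator<double>{some_int_allocator}` selects the rebinding constructor; without it, the forwarding constructor would be the better match and fail to compile on `resource.id`.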
Replace RMM's wrapper types (cccl_resource_ref, cccl_async_resource_ref)
with direct aliases to CCCL's resource_ref and synchronous_resource_ref.
This eliminates the 469-line adaptor layer that was originally needed to
work around shared_resource type-erasure issues.

The wrapper was no longer needed once the CCCL #8037 recursive constraint
cycle was broken via template SFINAE constructors (previous commit).

Additional changes required for compilation without the wrapper:
- per_device_resource: static_cast<any_device_resource>(ref) replaced
  with any_device_resource{ref} (wrapper had operator any_resource)
- cuda_async_memory_resource, cuda_async_managed_memory_resource,
  sam_headroom_memory_resource: copy/move changed from = delete to
  = default (CCCL resource_ref requires copyability; shared_resource
  base already provides correct reference-counted semantics)
- device_check_resource_adaptor (test): template SFINAE constructor
  to break the same CCCL #8037 cycle
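
The copy/move change in the list above relies on copies sharing one reference-counted implementation. The sketch below shows that idea with `std::shared_ptr` only; `toy_pool_impl` and `toy_pool_resource` are hypothetical names, and the real `shared_resource` base may differ in its details.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical implementation object holding the resource's state.
struct toy_pool_impl {
  std::size_t bytes_in_use{0};
};

// Hypothetical resource handle. Copies are cheap and refer to the same state,
// which is why defaulting copy/move is safe here.
class toy_pool_resource {
 public:
  toy_pool_resource() : impl_{std::make_shared<toy_pool_impl>()} {}

  // Defaulted instead of deleted: each copy is another handle to one impl.
  toy_pool_resource(toy_pool_resource const&)            = default;
  toy_pool_resource(toy_pool_resource&&)                 = default;
  toy_pool_resource& operator=(toy_pool_resource const&) = default;
  toy_pool_resource& operator=(toy_pool_resource&&)      = default;

  void record_allocation(std::size_t bytes) { impl_->bytes_in_use += bytes; }
  std::size_t bytes_in_use() const { return impl_->bytes_in_use; }
  long handle_count() const { return impl_.use_count(); }

  // Two handles are equal when they refer to the same implementation.
  friend bool operator==(toy_pool_resource const& lhs, toy_pool_resource const& rhs) noexcept
  {
    return lhs.impl_ == rhs.impl_;
  }

 private:
  std::shared_ptr<toy_pool_impl> impl_;
};

static_assert(std::is_copy_constructible_v<toy_pool_resource>,
              "a non-owning reference wrapper would need the resource to be copyable");
```

Because state lives behind the shared pointer, an allocation recorded through one handle is visible through every copy, and the implementation is destroyed only when the last handle goes away.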
…hmark

The any_resource internals use __host__ __device__ functions that call
shared_resource copy/move constructors which are __host__ only. This is
safe because the benchmark factory functions are only called from host code.

copy-pr-bot bot commented Mar 20, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
