Fix Symbol Visibility for Header-Only RMM Classes
Problem
When cuDF 26.02 and nvcomp 5.1 are used together, several tests (COMPRESSION_TEST, ORC_TEST, PARQUET_TEST, HYBRID_SCAN_TEST, LARGE_STRINGS_TEST) segfault because of an ODR (One Definition Rule) violation. When multiple libraries compile RMM's header-only code from different RMM versions, the resulting symbols can conflict at runtime.
The root cause of the problem is that RMM's header-only and template classes use namespace RMM_NAMESPACE, which expands to __attribute__((visibility("default"))) rmm. This forces symbols to be publicly exported even when downstream libraries compile with -fvisibility=hidden.
When both libraries are loaded:
- Both libraries export vtables for rmm::mr::pool_memory_resource<pinned_host_memory_resource>
- The dynamic linker merges these weak symbols
- ABI incompatibility between versions causes segfaults
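The mechanism described above can be sketched in isolation. This is a minimal illustration, not RMM's actual code: `DEMO_EXPORT`, `DEMO_NAMESPACE`, `pool_resource`, `pinned_upstream`, and `use_pool` are hypothetical stand-ins, and the macro is assumed to mirror the expansion of RMM_NAMESPACE quoted in this issue. It assumes a GCC- or Clang-compatible compiler.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for RMM_EXPORT / RMM_NAMESPACE (assumed shape).
#define DEMO_EXPORT __attribute__((visibility("default")))
#define DEMO_NAMESPACE DEMO_EXPORT demo

// Everything declared in this namespace body gets default visibility,
// regardless of -fvisibility=hidden on the command line.
namespace DEMO_NAMESPACE {

// Hypothetical upstream type, standing in for a pinned host resource.
struct pinned_upstream {
  static constexpr std::size_t alignment = 256;
};

// Header-only template with a virtual function, so each instantiation
// has a vtable that is emitted by whichever library instantiates it.
template <typename Upstream>
struct pool_resource {
  virtual ~pool_resource() = default;
  virtual std::size_t alignment() const { return Upstream::alignment; }
};

}  // namespace demo

// A downstream library instantiating the template. The vtable for
// pool_resource<pinned_upstream> is emitted here as a weak symbol.
std::size_t use_pool()
{
  demo::pool_resource<demo::pinned_upstream> pool;
  return pool.alignment();
}
```

If this file is built as a shared library with something like `g++ -std=c++17 -fPIC -shared -fvisibility=hidden`, inspecting the result with `nm -D -C` would be expected to show the vtable for `demo::pool_resource<demo::pinned_upstream>` in the dynamic symbol table, while `use_pool` itself stays hidden. Two libraries built this way against different versions of the template would each export that symbol.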
Background
This plan builds on prior work establishing RMM's visibility model:
- Issue [FEA] limit symbol visibility in DSOs in the Python package #1645: Feature request to limit symbol visibility in DSOs to reduce conflicts and library size
- PR Hide visibility of non-public symbols #1644: Implemented hidden visibility for non-public symbols via -fvisibility=hidden on Cython targets
- Issue [BUG] Cannot catch RMM exceptions thrown across DSO boundaries #1652: Identified that exception types need default visibility to be caught across DSO boundaries
- PR Explicitly mark RMM headers with RMM_EXPORT #1654: Added RMM_NAMESPACE/RMM_EXPORT to ensure proper cross-DSO exception handling
- GCC Wiki on Visibility: Documents that typeinfo for exceptions and classes used with dynamic_cast must have consistent default visibility across all DSOs
The key insight from the GCC wiki: if even ONE compilation unit defines a type with hidden visibility, the linker marks its typeinfo as hidden permanently, causing silent failures where exceptions cannot be caught across DSO boundaries.
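As a rough sketch of what this means for an exception type, the following marks a class with default visibility so that its typeinfo stays shared even when the rest of the library is built with -fvisibility=hidden. `DEMO_EXPORT`, `demo::out_of_memory`, `allocate_or_throw`, and `caught_out_of_memory` are hypothetical names for illustration only. Both functions live in one file here, whereas the real concern is a throw in one DSO and a catch in another.

```cpp
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for RMM_EXPORT (assumed shape).
#define DEMO_EXPORT __attribute__((visibility("default")))

namespace demo {

// Exception type explicitly given default visibility. Every compilation
// unit that sees this definition agrees on the visibility of its typeinfo.
struct DEMO_EXPORT out_of_memory : std::runtime_error {
  using std::runtime_error::runtime_error;
};

}  // namespace demo

// Imagine this function living in library A.
void allocate_or_throw(bool fail)
{
  if (fail) { throw demo::out_of_memory("demo: out of memory"); }
}

// Imagine this function living in library B. The catch clause matches
// by typeinfo, which is why the visibility must be consistent.
bool caught_out_of_memory(bool fail)
{
  try {
    allocate_or_throw(fail);
  } catch (demo::out_of_memory const&) {
    return true;
  }
  return false;
}
```

Calling `caught_out_of_memory(true)` returns `true` and `caught_out_of_memory(false)` returns `false`. In a single binary this always works. The attribute matters once the throw and the catch are split across separately compiled shared libraries.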
Solution
Change template and header-only classes that are instantiated in downstream code from namespace RMM_NAMESPACE to plain namespace rmm. Their symbols then take whatever visibility the downstream library's compile flags specify (for example, hidden under -fvisibility=hidden), which prevents unwanted symbol exports.
However, certain classes MUST retain RMM_NAMESPACE for correct cross-DSO behavior:
- Exception types (for cross-DSO catching)
- Base classes with virtual functions used polymorphically across DSOs
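The split between the two kinds of namespace blocks might look roughly like this. It is a sketch under assumptions, not the actual RMM headers: `DEMO_EXPORT`, `DEMO_NAMESPACE`, `memory_resource`, `pool_resource`, `pinned_upstream`, and `use_pool` are hypothetical names, and a GCC- or Clang-compatible compiler is assumed.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for RMM_EXPORT / RMM_NAMESPACE (assumed shape).
#define DEMO_EXPORT __attribute__((visibility("default")))
#define DEMO_NAMESPACE DEMO_EXPORT demo

// Stays in the annotated namespace: a polymorphic base class whose
// vtable and typeinfo must be shared across DSOs.
namespace DEMO_NAMESPACE {

struct memory_resource {
  virtual ~memory_resource() = default;
  virtual std::size_t alignment() const = 0;
};

}  // namespace demo

// Moved to the plain namespace: header-only and template classes.
// Their visibility now follows the compile flags of whichever
// library instantiates them.
namespace demo {

struct pinned_upstream {
  static constexpr std::size_t alignment = 256;
};

template <typename Upstream>
struct pool_resource final : memory_resource {
  std::size_t alignment() const override { return Upstream::alignment; }
};

}  // namespace demo

// Downstream code instantiating the template.
std::size_t use_pool()
{
  demo::pool_resource<demo::pinned_upstream> pool;
  return pool.alignment();
}
```

Built with -fvisibility=hidden, the vtable for `demo::pool_resource<demo::pinned_upstream>` would be expected to drop out of the library's dynamic symbol table, while the `demo::memory_resource` vtable and typeinfo remain exported. The attribute applies per namespace body, so reopening `demo` without it is enough to change the visibility of the declarations inside.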
Verified Fix
Changing just two files fixes all failing tests:
- rmm/mr/pool_memory_resource.hpp
- rmm/mr/detail/stream_ordered_memory_resource.hpp
After this change, the vtables for pool_memory_resource and stream_ordered_memory_resource are hidden from libcudf.so's dynamic symbol table, eliminating the symbol collision.
Implementation Plan
Files that MUST use RMM_NAMESPACE
1. Files with symbols compiled into librmm.so
| File | Reason |
|---|---|
| rmm/aligned.hpp | Has aligned.cpp |
| rmm/cuda_device.hpp | Has cuda_device.cpp |
| rmm/cuda_stream.hpp | Has cuda_stream.cpp |
| rmm/cuda_stream_pool.hpp | Has cuda_stream_pool.cpp |
| rmm/cuda_stream_view.hpp | Has cuda_stream_view.cpp |
| rmm/device_buffer.hpp | Has device_buffer.cpp |
| rmm/exec_policy.hpp | Has exec_policy.cpp |
| rmm/logger.hpp | Has logger.cpp |
| rmm/prefetch.hpp | Has prefetch.cpp |
2. Exception types (required for cross-DSO catching)
| File | Reason |
|---|---|
| rmm/error.hpp | Defines rmm::bad_alloc, rmm::out_of_memory, rmm::logic_error; must be catchable across DSO boundaries |
| rmm/detail/error.hpp | Defines rmm::cuda_error; must be catchable across DSO boundaries |
3. Base classes with vtables used polymorphically across DSOs
| File | Reason |
|---|---|
| rmm/mr/device_memory_resource.hpp | Base class; derived types in downstream DSOs must share vtable/typeinfo |
| rmm/mr/host_memory_resource.hpp | Base class; derived types in downstream DSOs must share vtable/typeinfo |
Files to change from RMM_NAMESPACE to namespace rmm
These are header-only/template classes instantiated in downstream libraries. Using plain namespace rmm allows downstream libraries to control their visibility.
Template memory resources (16 files):
- rmm/mr/pool_memory_resource.hpp
- rmm/mr/arena_memory_resource.hpp
- rmm/mr/aligned_resource_adaptor.hpp
- rmm/mr/binning_memory_resource.hpp
- rmm/mr/failure_callback_resource_adaptor.hpp
- rmm/mr/fixed_size_memory_resource.hpp
- rmm/mr/limiting_resource_adaptor.hpp
- rmm/mr/logging_resource_adaptor.hpp
- rmm/mr/owning_wrapper.hpp
- rmm/mr/polymorphic_allocator.hpp
- rmm/mr/prefetch_resource_adaptor.hpp
- rmm/mr/statistics_resource_adaptor.hpp
- rmm/mr/thread_safe_resource_adaptor.hpp
- rmm/mr/thrust_allocator_adaptor.hpp
- rmm/mr/tracking_resource_adaptor.hpp
- rmm/mr/is_resource_adaptor.hpp
Header-only memory resource implementations (9 files):
- rmm/mr/cuda_memory_resource.hpp
- rmm/mr/managed_memory_resource.hpp
- rmm/mr/pinned_host_memory_resource.hpp
- rmm/mr/cuda_async_memory_resource.hpp
- rmm/mr/cuda_async_view_memory_resource.hpp
- rmm/mr/cuda_async_managed_memory_resource.hpp
- rmm/mr/callback_memory_resource.hpp
- rmm/mr/sam_headroom_memory_resource.hpp
- rmm/mr/system_memory_resource.hpp
Template device containers (4 files):
- rmm/device_uvector.hpp
- rmm/device_scalar.hpp
- rmm/device_vector.hpp
- rmm/resource_ref.hpp
Detail headers:
- rmm/mr/detail/stream_ordered_memory_resource.hpp
- rmm/mr/detail/arena.hpp
- rmm/mr/detail/coalescing_free_list.hpp
- rmm/mr/detail/fixed_size_free_list.hpp
- rmm/mr/detail/free_list.hpp
- rmm/mr/detail/device_memory_resource_view.hpp
- rmm/detail/aligned.hpp
- rmm/detail/cccl_adaptors.hpp
- rmm/detail/cuda_memory_resource.hpp
- rmm/detail/format.hpp
- rmm/detail/runtime_capabilities.hpp
- rmm/detail/stack_trace.hpp
Other header-only files:
- rmm/mr/per_device_resource.hpp
Guiding Principle
Use RMM_NAMESPACE for symbols that meet ANY of these criteria:
- Compiled into librmm.so (have a corresponding .cpp file)
- Exception types that may be thrown and caught across DSO boundaries
- Base classes with virtual functions used polymorphically across DSO boundaries
Use plain namespace rmm for:
- All template classes (instantiated in downstream libraries)
- Header-only derived classes (compiled into downstream libraries)
- All detail/internal headers, except those that define exception types (rmm/detail/error.hpp)
This ensures:
- Downstream libraries can control symbol visibility via their own compile flags
- ODR violations are prevented when multiple libraries use different RMM versions
- Exceptions can still be caught across DSO boundaries
- Polymorphic base classes work correctly across DSOs