
Fix symbol visibility for header-only RMM classes to prevent ODR violations #2219

@bdice

Fix Symbol Visibility for Header-Only RMM Classes

Problem

When cuDF 26.02 and nvcomp 5.1 are used together, several tests (COMPRESSION_TEST, ORC_TEST, PARQUET_TEST, HYBRID_SCAN_TEST, LARGE_STRINGS_TEST) segfault due to an ODR (One Definition Rule) violation. When multiple libraries compile different versions of RMM's header-only code, the resulting symbols can conflict at runtime.

The root cause of the problem is that RMM's header-only and template classes use namespace RMM_NAMESPACE, which expands to __attribute__((visibility("default"))) rmm. This forces symbols to be publicly exported even when downstream libraries compile with -fvisibility=hidden.
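To make the expansion concrete, here is a minimal sketch of the pattern described above. The `DEMO_EXPORT` and `DEMO_NAMESPACE` macros and the `exported_resource` type are hypothetical stand-ins, not RMM's actual definitions, and the attribute syntax assumes GCC or Clang.

```cpp
// Hypothetical stand-ins for RMM's export macros (all DEMO_* names are illustrative).
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_EXPORT __attribute__((visibility("default")))
#else
#define DEMO_EXPORT
#endif
#define DEMO_NAMESPACE DEMO_EXPORT demo

// Expands to: namespace __attribute__((visibility("default"))) demo { ... }
namespace DEMO_NAMESPACE {

// Declared inside the attributed namespace, so this class gets default
// visibility. Any shared library that emits its vtable and typeinfo exports
// them, even when that library is built with -fvisibility=hidden.
struct exported_resource {
  virtual ~exported_resource() = default;
  virtual int version() const { return 1; }
};

}  // namespace DEMO_NAMESPACE
```

Because the attribute is attached to the namespace, a downstream library has no compile flag that overrides it for the types declared inside.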

When both libraries are loaded:

  1. Both libraries export vtables for rmm::mr::pool_memory_resource<pinned_host_memory_resource>
  2. The dynamic linker resolves these weak symbols to a single definition, so one library ends up using the other's vtable
  3. ABI incompatibility between versions causes segfaults

Background

This plan builds on prior work establishing RMM's visibility model.

The key insight from the GCC wiki: if even ONE compilation unit defines a type with hidden visibility, the linker marks its typeinfo as hidden permanently, causing silent failures where exceptions cannot be caught across DSO boundaries.
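As a rough illustration of why exception types need explicit default visibility, the sketch below annotates a hypothetical exception class directly. The names `DEMO_EXPORT`, `out_of_memory_like`, `allocate_or_throw`, and `caught_locally` are assumptions made for this example. A single translation unit cannot reproduce the cross-DSO failure, so this only shows the annotation and the catch site it is meant to keep working.

```cpp
#include <stdexcept>

// Hypothetical export macro; assumes GCC or Clang attribute syntax.
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_EXPORT __attribute__((visibility("default")))
#else
#define DEMO_EXPORT
#endif

namespace demo {

// Explicitly exported so every DSO that sees this type agrees that its
// typeinfo is visible, regardless of -fvisibility=hidden.
struct DEMO_EXPORT out_of_memory_like : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Stands in for a function living in one shared library.
inline void allocate_or_throw(bool fail)
{
  if (fail) { throw out_of_memory_like("simulated allocation failure"); }
}

// Stands in for a caller living in another shared library.
inline bool caught_locally(bool fail)
{
  try {
    allocate_or_throw(fail);
  } catch (out_of_memory_like const&) {
    return true;  // matching relies on both sides sharing one typeinfo
  }
  return false;
}

}  // namespace demo
```

If the thrower and the catcher lived in different shared libraries and either one had compiled the type with hidden visibility, the `catch` clause could fail to match even though the source code is identical.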

Solution

Change template and header-only classes that are instantiated in downstream code from namespace RMM_NAMESPACE to plain namespace rmm. This allows symbols to inherit the default visibility from the downstream library's compile flags, preventing unwanted symbol exports.

However, certain classes MUST retain RMM_NAMESPACE for correct cross-DSO behavior:

  • Exception types (for cross-DSO catching)
  • Base classes with virtual functions used polymorphically across DSOs
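The split described above might look like the following sketch, in which a polymorphic base stays in the attributed namespace while a template resource reopens the same namespace without the attribute. All names here (`DEMO_NAMESPACE`, `device_resource_base`, `pool_like_resource`, `dummy_upstream`) are hypothetical and much simpler than RMM's real classes.

```cpp
#include <cstddef>

// Hypothetical stand-ins for RMM's export macros (GCC/Clang syntax assumed).
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_EXPORT __attribute__((visibility("default")))
#else
#define DEMO_EXPORT
#endif
#define DEMO_NAMESPACE DEMO_EXPORT demo

// Polymorphic base: stays in the exported namespace so every DSO shares
// one vtable and typeinfo for it.
namespace DEMO_NAMESPACE {
class device_resource_base {
 public:
  virtual ~device_resource_base() = default;
  virtual std::size_t bytes_reserved() const = 0;
};
}  // namespace DEMO_NAMESPACE

// Template resource: plain namespace, so its visibility follows the flags of
// whichever library instantiates it (hidden under -fvisibility=hidden).
namespace demo {
template <typename Upstream>
class pool_like_resource final : public device_resource_base {
 public:
  explicit pool_like_resource(std::size_t initial_bytes) : reserved_{initial_bytes} {}
  std::size_t bytes_reserved() const override { return reserved_; }

 private:
  std::size_t reserved_;
};

struct dummy_upstream {};  // placeholder upstream type for the example
}  // namespace demo
```

A downstream library can still pass `pool_like_resource<dummy_upstream>` around through a `device_resource_base` reference, while the instantiation's own vtable stays out of that library's dynamic symbol table when it is built with hidden visibility.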

Verified Fix

Changing just two files fixes all failing tests:

  • rmm/mr/pool_memory_resource.hpp
  • rmm/mr/detail/stream_ordered_memory_resource.hpp

After this change, the vtables for pool_memory_resource and stream_ordered_memory_resource are hidden from libcudf.so's dynamic symbol table, eliminating the symbol collision.

Implementation Plan

Files that MUST use RMM_NAMESPACE

1. Files with symbols compiled into librmm.so

| File | Reason |
| --- | --- |
| rmm/aligned.hpp | Has aligned.cpp |
| rmm/cuda_device.hpp | Has cuda_device.cpp |
| rmm/cuda_stream.hpp | Has cuda_stream.cpp |
| rmm/cuda_stream_pool.hpp | Has cuda_stream_pool.cpp |
| rmm/cuda_stream_view.hpp | Has cuda_stream_view.cpp |
| rmm/device_buffer.hpp | Has device_buffer.cpp |
| rmm/exec_policy.hpp | Has exec_policy.cpp |
| rmm/logger.hpp | Has logger.cpp |
| rmm/prefetch.hpp | Has prefetch.cpp |

2. Exception types (required for cross-DSO catching)

| File | Reason |
| --- | --- |
| rmm/error.hpp | Defines rmm::bad_alloc, rmm::out_of_memory, rmm::logic_error - must be catchable across DSO boundaries |
| rmm/detail/error.hpp | Defines rmm::cuda_error - must be catchable across DSO boundaries |

3. Base classes with vtables used polymorphically across DSOs

| File | Reason |
| --- | --- |
| rmm/mr/device_memory_resource.hpp | Base class - derived types in downstream DSOs must share vtable/typeinfo |
| rmm/mr/host_memory_resource.hpp | Base class - derived types in downstream DSOs must share vtable/typeinfo |

Files to change from RMM_NAMESPACE to namespace rmm

These are header-only/template classes instantiated in downstream libraries. Using plain namespace rmm allows downstream libraries to control their visibility.

Template memory resources (16 files):

  • rmm/mr/pool_memory_resource.hpp
  • rmm/mr/arena_memory_resource.hpp
  • rmm/mr/aligned_resource_adaptor.hpp
  • rmm/mr/binning_memory_resource.hpp
  • rmm/mr/failure_callback_resource_adaptor.hpp
  • rmm/mr/fixed_size_memory_resource.hpp
  • rmm/mr/limiting_resource_adaptor.hpp
  • rmm/mr/logging_resource_adaptor.hpp
  • rmm/mr/owning_wrapper.hpp
  • rmm/mr/polymorphic_allocator.hpp
  • rmm/mr/prefetch_resource_adaptor.hpp
  • rmm/mr/statistics_resource_adaptor.hpp
  • rmm/mr/thread_safe_resource_adaptor.hpp
  • rmm/mr/thrust_allocator_adaptor.hpp
  • rmm/mr/tracking_resource_adaptor.hpp
  • rmm/mr/is_resource_adaptor.hpp

Header-only memory resource implementations (9 files):

  • rmm/mr/cuda_memory_resource.hpp
  • rmm/mr/managed_memory_resource.hpp
  • rmm/mr/pinned_host_memory_resource.hpp
  • rmm/mr/cuda_async_memory_resource.hpp
  • rmm/mr/cuda_async_view_memory_resource.hpp
  • rmm/mr/cuda_async_managed_memory_resource.hpp
  • rmm/mr/callback_memory_resource.hpp
  • rmm/mr/sam_headroom_memory_resource.hpp
  • rmm/mr/system_memory_resource.hpp

Template device containers (4 files):

  • rmm/device_uvector.hpp
  • rmm/device_scalar.hpp
  • rmm/device_vector.hpp
  • rmm/resource_ref.hpp

Detail headers:

  • rmm/mr/detail/stream_ordered_memory_resource.hpp
  • rmm/mr/detail/arena.hpp
  • rmm/mr/detail/coalescing_free_list.hpp
  • rmm/mr/detail/fixed_size_free_list.hpp
  • rmm/mr/detail/free_list.hpp
  • rmm/mr/detail/device_memory_resource_view.hpp
  • rmm/detail/aligned.hpp
  • rmm/detail/cccl_adaptors.hpp
  • rmm/detail/cuda_memory_resource.hpp
  • rmm/detail/format.hpp
  • rmm/detail/runtime_capabilities.hpp
  • rmm/detail/stack_trace.hpp

Other header-only files:

  • rmm/mr/per_device_resource.hpp

Guiding Principle

Use RMM_NAMESPACE for symbols that meet ANY of these criteria:

  1. Compiled into librmm.so (have a corresponding .cpp file)
  2. Exception types that may be thrown and caught across DSO boundaries
  3. Base classes with virtual functions used polymorphically across DSO boundaries

Use plain namespace rmm for:

  1. All template classes (instantiated in downstream libraries)
  2. Header-only derived classes (compiled into downstream libraries)
  3. All detail/internal headers, except those that define exception types (rmm/detail/error.hpp)

This ensures:

  • Downstream libraries can control symbol visibility via their own compile flags
  • ODR violations are prevented when multiple libraries use different RMM versions
  • Exceptions can still be caught across DSO boundaries
  • Polymorphic base classes work correctly across DSOs

Metadata

Labels: bug

Status: In Progress