Remove exposure on column construction and unwrap buffers on pylibcudf conversion#20980

Merged
rapids-bot[bot] merged 10 commits into rapidsai:main from vyasr:feat/pylibcudf_unwrapping
Jan 8, 2026


@vyasr
Contributor

@vyasr vyasr commented Jan 8, 2026

Description

This PR removes the ability to construct columns and buffers that are already exposed, a state that can never actually arise in the current cudf data model. Removing it lets us simplify the various column constructors and standardize the validation process.

Relatedly, this PR ensures that conversion from a cudf ColumnBase to a pylibcudf Column unwraps Buffers, so that the pylibcudf representation is not subject to cudf's Buffer semantics. That change should allow us to fully decouple the internal representation of pylibcudf Columns inside cudf from how they are exposed to public APIs. It also ensures that we do not break CoW and spilling functionality by making Buffer copies that we shouldn't.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

vyasr and others added 9 commits January 6, 2026 20:52
Remove the `copy` parameter from DataFrame.to_pylibcudf(),
Series.to_pylibcudf(), and Index.to_pylibcudf() since it was never
implemented and always raised NotImplementedError.

Updated docstrings to clarify that these methods always perform zero-copy
operations and return views of the existing data.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add a deep copy method for pylibcudf.Table that creates a new table with
deep copies of all columns. The implementation follows the same pattern as
Column.copy(), accepting optional stream and memory resource parameters.

Changes:
- Added Table.copy() method in table.pyx
- Added method declaration in table.pxd
- Added type stub in table.pyi
- Added test_table_copy() to verify deep copy behavior

The copy method iterates over all columns and calls copy() on each,
ensuring complete independence between the original and copied tables.
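
As a rough illustration of the per-column deep copy described above, here is a minimal pure-Python sketch. `ToyColumn` and `ToyTable` are hypothetical stand-ins and are not the pylibcudf classes; Python lists replace device memory, and the optional stream and memory resource parameters mentioned in the commit are left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyColumn:
    """Hypothetical stand-in for a column; a Python list replaces device memory."""

    data: list

    def copy(self) -> ToyColumn:
        # Deep copy: build a new list so the copy owns its own storage.
        return ToyColumn(list(self.data))


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table holding a list of columns."""

    columns: list

    def copy(self) -> ToyTable:
        # Copy column by column so nothing is shared with the original table.
        return ToyTable([col.copy() for col in self.columns])


original = ToyTable([ToyColumn([1, 2, 3]), ToyColumn([4, 5, 6])])
duplicate = original.copy()
duplicate.columns[0].data[0] = 99  # mutate the copy only
```

Mutating `duplicate` leaves `original` untouched, which is the independence property that a test such as `test_table_copy()` would presumably check.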

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
With the buffer unwrapping changes in to_pylibcudf(), buffers are no longer
truly "exposed" during construction. The pylibcudf source objects maintain
references to the original buffer owners, ensuring data stays alive even if
cudf's wrapper is deleted during a spill.

Changes:
- Remove exposed parameter from BufferOwner.__init__() and from_device_memory()
- Remove exposed parameter from SpillableBufferOwner.from_device_memory()
- Remove exposed parameter from as_buffer() utility function
- Remove exposed parameter from all Column type constructors
- Remove data_ptr_exposed parameter from ColumnBase.from_pylibcudf()
- Remove data_ptr_exposed parameter from ColumnBase.from_cuda_array_interface()
- Remove data_ptr_exposed=True from high-level API call sites

Buffers are now only marked as exposed through implicit detection when:
- Raw ptr property is accessed outside an access context
- __cuda_array_interface__["data"][0] is accessed
- scope="external" is used in an access context

This ensures correct spillable buffer behavior and copy-on-write semantics.
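
The three implicit triggers listed above can be modelled with a small pure-Python sketch. `ToyBufferOwner`, its `access()` context manager, and the `"internal"` scope name are assumptions made for illustration and do not reproduce cudf's BufferOwner; only the `scope="external"` spelling comes from the commit message.

```python
from contextlib import contextmanager


class ToyBufferOwner:
    """Hypothetical owner of a block of memory; not cudf's BufferOwner."""

    def __init__(self, address: int, size: int):
        self._address = address
        self._size = size
        self.exposed = False    # no constructor argument can set this
        self._access_depth = 0  # > 0 while inside an access context

    @contextmanager
    def access(self, scope: str = "internal"):
        # An external scope means the pointer escapes our control.
        if scope == "external":
            self.exposed = True
        self._access_depth += 1
        try:
            yield self
        finally:
            self._access_depth -= 1

    @property
    def ptr(self) -> int:
        # Reading the raw pointer outside any access context exposes it.
        if self._access_depth == 0:
            self.exposed = True
        return self._address

    @property
    def __cuda_array_interface__(self) -> dict:
        # Handing the pointer to external code always counts as exposure.
        with self.access(scope="external"):
            return {
                "data": (self.ptr, False),
                "shape": (self._size,),
                "typestr": "|u1",
                "version": 3,
            }


internal = ToyBufferOwner(0x1000, 16)
with internal.access():
    _ = internal.ptr  # read inside an access context: stays unexposed

leaked = ToyBufferOwner(0x2000, 16)
_ = leaked.ptr  # read outside any access context: now exposed
```

In this toy model `exposed` can only flip as a side effect of how the pointer is read, which mirrors the intent of removing the explicit `exposed` parameters.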

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…_args

Replaced individual __init__ overrides in column subclasses with a unified
_validate_args classmethod pattern. This eliminates code duplication and
centralizes validation logic.

Changes:
- Added ColumnBase._validate_args() classmethod for plc_column/dtype validation
- Removed __init__ overrides from NumericalColumn, DatetimeColumn, TimeDeltaColumn,
  DecimalBaseColumn, ListColumn, StructColumn, and IntervalColumn
- StringColumn retains minimal __init__ for instance attribute initialization
- Removed deprecated _validate_dtype_instance methods
- Each column type now overrides _validate_args() for type-specific validation

Benefits:
- Reduces code duplication across 8+ column types
- Centralizes validation logic in one place
- Maintains all existing validation behavior
- Easier to maintain and extend
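
A minimal sketch of the classmethod-hook pattern described above, assuming a single shared constructor that delegates to `_validate_args`. The `Toy*` classes, the `payload` argument, and the string dtypes are hypothetical placeholders; the real method validates a plc_column and a dtype and may differ in signature.

```python
class ToyColumnBase:
    """Hypothetical base class; `payload` and `dtype` are placeholder arguments."""

    def __init__(self, payload, dtype):
        # The single shared constructor delegates all checks to the hook.
        payload, dtype = self._validate_args(payload, dtype)
        self.payload = payload
        self.dtype = dtype

    @classmethod
    def _validate_args(cls, payload, dtype):
        # Checks common to every column type.
        if not isinstance(dtype, str):
            raise TypeError(f"dtype must be a str, got {type(dtype).__name__}")
        return payload, dtype


class ToyNumericalColumn(ToyColumnBase):
    @classmethod
    def _validate_args(cls, payload, dtype):
        # Run the shared checks first, then the type-specific ones.
        payload, dtype = super()._validate_args(payload, dtype)
        if dtype not in {"int64", "float64"}:
            raise ValueError(f"{dtype!r} is not a numerical dtype")
        return payload, dtype


col = ToyNumericalColumn([1, 2], "int64")  # passes both layers of validation
```

Subclasses need no `__init__` of their own here: overriding the hook is enough to add type-specific checks while reusing the base-class ones.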

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replaced manual caching of _start_offset and _end_offset with
@functools.cached_property, eliminating the need for a custom __init__
method in StringColumn.

Changes:
- Converted start_offset and end_offset from manual @property with
  conditional caching to @cached_property
- Removed __init__ override that initialized _start_offset and _end_offset
- Removed class-level type annotations for _start_offset and _end_offset
- Simplified offset computation logic without manual cache checks

Benefits:
- Eliminates the last __init__ override in column subclasses
- Reduces boilerplate code for caching
- More Pythonic use of standard library features
- Automatic caching behavior without manual state management
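
For readers unfamiliar with `functools.cached_property`, the following sketch shows the caching behavior this commit relies on. `ToyStringColumn`, its `offsets` list, and the `offset_reads` counter are hypothetical and exist only to make the caching observable; the toy keeps an `__init__` for that reason, unlike the real StringColumn after this change.

```python
from functools import cached_property


class ToyStringColumn:
    """Hypothetical column whose offsets list stands in for an offsets buffer."""

    def __init__(self, offsets):
        self.offsets = offsets
        self.offset_reads = 0  # counts how often an offset is actually computed

    @cached_property
    def start_offset(self) -> int:
        # Computed on first access, then stored in the instance __dict__.
        self.offset_reads += 1
        return self.offsets[0]

    @cached_property
    def end_offset(self) -> int:
        self.offset_reads += 1
        return self.offsets[-1]


col = ToyStringColumn([2, 5, 9])
first = col.start_offset
second = col.start_offset  # served from the cache, no recomputation
```

No `_start_offset` or `_end_offset` attribute has to be initialized up front, because the decorator stores each computed value on the instance the first time it is requested.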

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Wrap transpose() with access_columns() to prevent buffers from being
  incorrectly marked as exposed during pylibcudf calls
- Update __cuda_array_interface__ to use scope="external" to properly
  mark buffers as exposed when external code accesses the pointer
- Update test_df_transpose to expect buffers NOT to be exposed after
  normal DataFrame operations (only when explicitly accessed)
- Update test_series_zero_copy_cow_on expectations: shallow copies
  share memory with source, so both change when external array changes
- Remove exposed parameter from as_buffer() call in test_get_ptr

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@vyasr self-assigned this Jan 8, 2026
@vyasr added the "improvement" label (Improvement / enhancement to an existing function) Jan 8, 2026
@vyasr requested a review from a team as a code owner January 8, 2026 00:02
@vyasr added the "non-breaking" label (Non-breaking change) Jan 8, 2026
@vyasr requested review from bdice and galipremsagar January 8, 2026 00:02
@github-actions bot added the "Python" (Affects Python cuDF API.) and "pylibcudf" (Issues specific to the pylibcudf package) labels Jan 8, 2026
@GPUtester moved this to In Progress in cuDF Python Jan 8, 2026
@mroeschke
Contributor

/merge

@rapids-bot rapids-bot Bot merged commit 059ee7f into rapidsai:main Jan 8, 2026
529 of 536 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in cuDF Python Jan 8, 2026
@vyasr vyasr deleted the feat/pylibcudf_unwrapping branch January 8, 2026 05:55



4 participants