Quality of Life improvements for Blob writes #18635

@voonhous

Task Description

What needs to be done:

Let users write a BLOB column without spelling out the unused sibling field.
Today every row needs the full 3-field {type, data, reference} struct, even
when the row carries only inline bytes (or only an external reference).
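To make the current requirement concrete, here is a minimal sketch that models a BLOB value as a plain Python dict. The field names `type`, `data`, and `reference` come from this issue; the helper `is_canonical_blob` and the sample bytes are hypothetical and are not part of any Hudi API.

```python
# Field names are taken from the issue text; everything else is illustrative.
BLOB_FIELDS = ("type", "data", "reference")


def is_canonical_blob(value: dict) -> bool:
    """Return True only if the value spells out exactly the three BLOB fields."""
    return set(value) == set(BLOB_FIELDS)


# What an INLINE row has to look like today: 'reference' is present but null.
inline_today = {"type": "INLINE", "data": b"\x01\x02", "reference": None}

# What users would like to write instead: no unused sibling field.
inline_short = {"type": "INLINE", "data": b"\x01\x02"}

print(is_canonical_blob(inline_today))  # True
print(is_canonical_blob(inline_short))  # False
```

The second value is the shape this task wants the writer to accept.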

  • INLINE writes should accept {type, data}.
  • OUT_OF_LINE writes should accept {type, reference}.
  • The missing field is padded with null at the writer entry point; canonical
    3-field input passes through unchanged.
  • Padding should also work when the BLOB is nested inside a struct, an array,
    or a map.
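The padding rules above could be sketched as a small recursive function. This is only an illustration under stated assumptions, not the actual writer code: rows are plain Python dicts, and the toy schema encoding (`"blob"` marker, `{"struct": ...}`, `{"array": ...}`, `{"map": ...}`) as well as the names `pad_blob` and `pad` are hypothetical.

```python
BLOB = "blob"  # hypothetical marker for a BLOB-typed position in a toy schema
BLOB_FIELDS = ("type", "data", "reference")  # field names from the issue


def pad_blob(value):
    """Fill whichever BLOB field is missing with None; keep the rest as-is."""
    if value is None:
        return None
    return {name: value.get(name) for name in BLOB_FIELDS}


def pad(value, schema):
    """Walk a value alongside its toy schema and pad every BLOB found."""
    if value is None:
        return None
    if schema == BLOB:
        return pad_blob(value)
    if isinstance(schema, dict):
        if "struct" in schema:
            fields = schema["struct"]
            return {k: pad(v, fields.get(k)) for k, v in value.items()}
        if "array" in schema:
            return [pad(v, schema["array"]) for v in value]
        if "map" in schema:
            return {k: pad(v, schema["map"]) for k, v in value.items()}
    return value  # primitive leaf: nothing to pad


schema = {"struct": {
    "id": "long",
    "img": BLOB,                    # top-level BLOB column
    "attachments": {"array": BLOB}, # BLOB nested in an array
    "by_name": {"map": BLOB},       # BLOB nested as a map value
}}
row = {
    "id": 1,
    "img": {"type": "INLINE", "data": b"x"},
    "attachments": [{"type": "OUT_OF_LINE",
                     "reference": {"external_path": "/tmp/a.bin", "offset": 0,
                                   "length": 3, "managed": False}}],
    "by_name": {"k": {"type": "INLINE", "data": b"y"}},
}
padded = pad(row, schema)
```

Running `pad` on an already padded row returns an equal row, which mirrors the requirement that canonical 3-field input is left unchanged.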

Why this task is needed:

The current named_struct(... 'reference', cast(null as struct<external_path:string,offset:bigint,length:bigint,managed:boolean>)) boilerplate
(or its DataFrame equivalent) is noisy and easy to get wrong, especially for
INLINE blobs, where the reference field has no meaning. It is the first thing
people hit when they try to write a blob, so it shapes their first impression
of the feature's ergonomics.

Task Type

Code improvement/refactoring

Labels

type:devtask (Development tasks and maintenance work)
