
Support arbitrary column names #957

@liwensun


Overview

This issue tracks interest, feature requests, and progress on support for arbitrary column names in Delta Lake, which is part of implementing “column renaming and dropping” as outlined on the Delta OSS 2022 H1 roadmap here.

Delta tables use Parquet as the underlying file format. Right now, a Delta column must be stored in the underlying Parquet files under the same name, so users can't name a Delta column using characters that Parquet disallows. This limitation is inconvenient for Delta users who want to directly ingest data whose column names contain special characters; for example, column names with spaces are common in CSV. The end goal of this issue is to lift the Delta column naming restrictions inherited from Parquet.
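To make the limitation concrete, here is a minimal sketch of a pre-ingestion check that flags column names likely to be rejected. The character set below is an assumption based on what Spark's Parquet schema check has historically refused in field names; it is not taken from this issue, and `find_invalid_columns` is a hypothetical helper, not a Delta Lake API.

```python
# Assumption: the characters Spark's Parquet writer has historically
# rejected in field names are space , ; { } ( ) newline tab and =.
INVALID_PARQUET_CHARS = set(" ,;{}()\n\t=")


def find_invalid_columns(names):
    """Return a dict mapping each offending column name to the sorted
    list of disallowed characters it contains (hypothetical helper)."""
    problems = {}
    for name in names:
        bad = sorted(set(name) & INVALID_PARQUET_CHARS)
        if bad:
            problems[name] = bad
    return problems


if __name__ == "__main__":
    # A header row as it might appear in a CSV file.
    csv_header = ["id", "first name", "price (usd)", "total=sum"]
    for column, chars in find_invalid_columns(csv_header).items():
        print(f"{column!r} contains disallowed characters: {chars}")
```

Today the usual workaround is to rename such columns before writing; the goal of this issue is to make that step unnecessary.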

Requirements

Users can name Delta columns using characters disallowed by Parquet, without having to care which column names the underlying Parquet files use. When a Delta column name contains such characters, all existing Delta operations and APIs should behave exactly as they do for any other column.
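One way to picture this requirement is a layer that keeps the user-facing (logical) name separate from the name written to Parquet (physical). The sketch below is purely illustrative and assumes a generated-name scheme; `assign_physical_names`, `to_physical`, and `to_logical` are hypothetical helpers and are not taken from the linked design sketch.

```python
import uuid


def assign_physical_names(logical_names):
    """Map each logical column name to a generated name that uses only
    lowercase letters, digits, and underscores (hypothetical scheme)."""
    return {name: f"col_{uuid.uuid4().hex}" for name in logical_names}


def to_physical(row, mapping):
    """Rewrite a row keyed by logical names into one keyed by physical names."""
    return {mapping[name]: value for name, value in row.items()}


def to_logical(row, mapping):
    """Rewrite a row keyed by physical names back to logical names."""
    reverse = {physical: logical for logical, physical in mapping.items()}
    return {reverse[name]: value for name, value in row.items()}


if __name__ == "__main__":
    mapping = assign_physical_names(["first name", "price (usd)"])
    row = {"first name": "Ada", "price (usd)": 9.5}
    stored = to_physical(row, mapping)   # what the file layer would see
    print(to_logical(stored, mapping))   # what the user would see
```

Under such a scheme the file layer only ever sees safe names, while reads and writes are expressed in the names the user chose.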

Please see the detailed requirements here.

Design Sketch

Please see the detailed design sketch here.


Labels: enhancement (New feature or request)
