Strongly-typed reading of Parquet data #34

@alamb

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-4314

See the proposal I made on csun's repository (https://github.com/sunchao/parquet-rs/issues/205) for more details.

This aims to let the user opt in to strong typing and substantial performance improvements (2x-7x, see https://github.com/sunchao/parquet-rs/issues/205#issuecomment-446016254) by optionally specifying the type of the records that they are iterating over.
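
To make the idea concrete, here is a minimal, self-contained sketch of the difference between untyped and strongly-typed record iteration. It does not use the parquet crate and is not the API proposed in the linked issue: `Column`, `Table`, `Field`, `Record`, `User`, `sample_table` and `untyped_rows` are all hypothetical names invented for illustration, and the in-memory `Table` merely stands in for decoded column data. The untyped path wraps every value in an enum per row, while the typed path checks the column types once and then builds plain structs.

```rust
// Hypothetical stand-in for decoded column data (not the parquet crate's types).
enum Column {
    Int64(Vec<i64>),
    Utf8(Vec<String>),
}

struct Table {
    columns: Vec<(String, Column)>,
}

// Dynamically typed value, as an untyped row iterator would hand out.
#[derive(Debug, PartialEq)]
enum Field {
    Long(i64),
    Str(String),
}

impl Table {
    fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |(_, c)| match c {
            Column::Int64(v) => v.len(),
            Column::Utf8(v) => v.len(),
        })
    }

    fn column(&self, name: &str) -> Result<&Column, String> {
        self.columns
            .iter()
            .find(|(n, _)| n.as_str() == name)
            .map(|(_, c)| c)
            .ok_or_else(|| format!("missing column `{name}`"))
    }

    fn int64(&self, name: &str) -> Result<&[i64], String> {
        match self.column(name)? {
            Column::Int64(v) => Ok(v.as_slice()),
            _ => Err(format!("column `{name}` is not Int64")),
        }
    }

    fn utf8(&self, name: &str) -> Result<&[String], String> {
        match self.column(name)? {
            Column::Utf8(v) => Ok(v.as_slice()),
            _ => Err(format!("column `{name}` is not Utf8")),
        }
    }
}

// Untyped path: every value is wrapped in `Field`, for every row.
fn untyped_rows(table: &Table) -> Vec<Vec<Field>> {
    (0..table.num_rows())
        .map(|i| {
            table
                .columns
                .iter()
                .map(|(_, col)| match col {
                    Column::Int64(v) => Field::Long(v[i]),
                    Column::Utf8(v) => Field::Str(v[i].clone()),
                })
                .collect()
        })
        .collect()
}

// Hypothetical trait: a record type that knows which columns it needs.
trait Record: Sized {
    fn read_all(table: &Table) -> Result<Vec<Self>, String>;
}

#[derive(Debug, PartialEq)]
struct User {
    id: i64,
    name: String,
}

impl Record for User {
    // Typed path: column types are checked once, then rows are plain structs.
    fn read_all(table: &Table) -> Result<Vec<Self>, String> {
        let ids = table.int64("id")?;
        let names = table.utf8("name")?;
        Ok(ids
            .iter()
            .zip(names)
            .map(|(&id, name)| User { id, name: name.clone() })
            .collect())
    }
}

fn sample_table() -> Table {
    Table {
        columns: vec![
            ("id".to_string(), Column::Int64(vec![1, 2])),
            ("name".to_string(), Column::Utf8(vec!["ada".to_string(), "grace".to_string()])),
        ],
    }
}

fn main() {
    let table = sample_table();
    println!("untyped: {:?}", untyped_rows(&table));
    match User::read_all(&table) {
        Ok(users) => println!("typed: {:?}", users),
        Err(e) => eprintln!("schema mismatch: {e}"),
    }
}
```

In this sketch a schema mismatch surfaces as a single `Err` before any row is produced, instead of as a per-field check at every access, which is one plausible source of the kind of speedup described above.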

It is currently a work in progress. All pre-existing tests succeed, bar those in src/record/api.rs which are commented out as they require reworking. Where relevant, pre-existing tests and benchmarks have been duplicated to make new strongly-typed tests and benchmarks, which all also succeed. I've tried to maintain pre-existing APIs where possible. Some changes have been made to better align with prior art in the Rust ecosystem.

Any feedback while I continue working on it is very welcome! Looking forward to hopefully seeing this merged when it's ready.

Labels: parquet (Changes to the parquet crate)