[C++][Parquet] Encryption test files are generated with invalid repetition levels #45073

@adamreeve

Describe the bug, including details regarding any error messages, version, and platform.

As part of adding Parquet encryption to arrow-rs (apache/arrow-rs#6637), @rok and I found that arrow-rs could not read the example files in parquet-testing due to invalid repetition levels. arrow-rs complains that:

Parquet error: first repetition level of batch must be 0

This is due to the int64 list column data being written with the repetition levels flipped: 0 should indicate the start of a new list, but 1 is used:

repetition_level = 1; // start of a new record
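
For reference, here is a minimal sketch of the convention described above for a single-level list column: the first entry of each row gets repetition level 0 and later entries in the same row get 1. The helper names (ListRepetitionLevels, StartsWithNewRecord, CountRecords) are hypothetical and are not part of the Arrow or parquet-cpp API; the sketch also assumes only one level of nesting and ignores definition levels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: repetition levels for a list<int64> column with a
// single level of nesting (maximum repetition level 1). Each inner vector
// is one row. An empty list still occupies one level slot, at level 0.
std::vector<std::int16_t> ListRepetitionLevels(
    const std::vector<std::vector<std::int64_t>>& rows) {
  std::vector<std::int16_t> levels;
  for (const auto& row : rows) {
    if (row.empty()) {
      levels.push_back(0);  // empty list: one slot that starts a new record
      continue;
    }
    for (std::size_t i = 0; i < row.size(); ++i) {
      // 0 starts a new record; 1 continues the current list.
      levels.push_back(static_cast<std::int16_t>(i == 0 ? 0 : 1));
    }
  }
  return levels;
}

// Hypothetical check mirroring the arrow-rs error message: the first
// repetition level must be 0.
bool StartsWithNewRecord(const std::vector<std::int16_t>& levels) {
  return levels.empty() || levels.front() == 0;
}

// Number of records in a level sequence: every 0 opens exactly one record.
std::size_t CountRecords(const std::vector<std::int16_t>& levels) {
  assert(StartsWithNewRecord(levels));
  std::size_t count = 0;
  for (std::int16_t level : levels) {
    if (level == 0) ++count;
  }
  return count;
}
```

For the two rows {1, 2} and {3, 4}, this sketch produces the levels 0, 1, 0, 1. The flipped sequence 1, 0, 1, 0 fails the first-level check, which matches the condition arrow-rs reports.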

Related to this, is it also a bug that Arrow would read these files without complaining? If I test reading one of these files into Arrow format with PyArrow, the first leaf value is skipped.

Component(s)

C++, Parquet
