Describe the bug, including details regarding any error messages, version, and platform.
As part of adding Parquet encryption to arrow-rs (apache/arrow-rs#6637), @rok and I found that arrow-rs could not read the example files in parquet-testing due to invalid repetition levels. arrow-rs complains that:
Parquet error: first repetition level of batch must be 0
This happens because the int64 list column data is written with the repetition levels flipped. A repetition level of 0 should indicate the start of a new list, but 1 is used instead:
    repetition_level = 1; // start of a new record
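To make the expected convention concrete, here is a minimal pure-Python sketch of how repetition levels for a single-level list column are assigned and then used to reassemble records. It is not the parquet-cpp or arrow-rs implementation: the helper names `repetition_levels` and `split_records` are hypothetical, and the sketch assumes every list is non-empty and non-null so that definition levels can be ignored.

```python
from typing import List


def repetition_levels(lists: List[List[int]]) -> List[int]:
    """Assign repetition levels for a column of non-empty int64 lists.

    Simplifying assumption: no empty or null lists, so definition
    levels are not modelled here.
    """
    levels = []
    for values in lists:
        for i in range(len(values)):
            # 0 starts a new record (list); 1 continues the current list.
            levels.append(0 if i == 0 else 1)
    return levels


def split_records(values: List[int], levels: List[int]) -> List[List[int]]:
    """Reassemble lists from leaf values and their repetition levels."""
    if levels and levels[0] != 0:
        # Mirrors the check reported above for arrow-rs.
        raise ValueError("first repetition level of batch must be 0")
    records: List[List[int]] = []
    for value, level in zip(values, levels):
        if level == 0:
            records.append([value])
        else:
            records[-1].append(value)
    return records


data = [[1, 2, 3], [4, 5]]
leaf_values = [v for row in data for v in row]

correct = repetition_levels(data)
flipped = [1 - level for level in correct]  # what the example files contain

print(correct)                               # [0, 1, 1, 0, 1]
print(split_records(leaf_values, correct))   # [[1, 2, 3], [4, 5]]
print(flipped)                               # [1, 0, 0, 1, 0]
```

With the flipped levels, `split_records(leaf_values, flipped)` raises the `ValueError`, because the first level in the batch is 1 rather than 0.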
Related to this, is it also a bug that Arrow reads these files without complaining? When I read one of these files into Arrow format with PyArrow, the first leaf value is skipped.
Component(s)
C++, Parquet