[C++][Parquet][CI] Improve Parquet fuzzing seed corpus

### Describe the enhancement requested

Currently, our Parquet fuzzing seed corpus consists of a grand total of 1 file, generated here:
https://github.com/apache/arrow/blob/fb202ee66d73572f46035c5b2f21ac22f74ba951/cpp/src/parquet/arrow/generate_fuzz_corpus.cc

We should probably generate more files (and/or more batch columns) and/or enable more features:
* vary data page version
* vary compression codec
* vary encodings (e.g. delta binary, byte stream split...)
* enable page checksums (<s>and verify them on reading</s>: that's actually a bad idea as it would prevent exercising the actual decoding most of the time)
* enable statistics (and load them on reading)
* enable page indices
* enable bloom filters once https://github.com/apache/arrow/pull/37400 is merged

We should also add more datatypes, at least Boolean and FixedSizeBinary, possibly also Decimal128 and Decimal256.
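
To make the proposed variations more concrete, here is a minimal sketch of how the seed generator could enumerate combinations of the writer options listed above. It uses only the standard library. `SeedConfig` and `EnumerateSeedConfigs` are hypothetical names that do not exist in the Arrow codebase, and the option values are plain strings chosen for illustration. Each entry would still have to be translated into actual Parquet writer settings, which this sketch does not attempt.

```cpp
#include <string>
#include <vector>

// Hypothetical description of the writer settings for one seed file.
// None of these names come from the Arrow/Parquet codebase.
struct SeedConfig {
  int data_page_version;   // 1 or 2
  std::string codec;       // illustrative label, e.g. "ZSTD"
  std::string encoding;    // illustrative label, e.g. "PLAIN"
  bool page_checksum;      // whether page checksums are written
};

// Enumerate the full cross product of the option values below.
// Assumption: every combination is worth a seed file. In practice some
// encodings only apply to certain physical types, so a real generator
// would need to filter or pick encodings per column.
std::vector<SeedConfig> EnumerateSeedConfigs() {
  const std::vector<int> versions = {1, 2};
  const std::vector<std::string> codecs = {"UNCOMPRESSED", "SNAPPY", "ZSTD"};
  const std::vector<std::string> encodings = {
      "PLAIN", "DELTA_BINARY_PACKED", "BYTE_STREAM_SPLIT"};
  const std::vector<bool> checksums = {false, true};

  std::vector<SeedConfig> configs;
  for (int version : versions) {
    for (const auto& codec : codecs) {
      for (const auto& encoding : encodings) {
        for (bool checksum : checksums) {
          configs.push_back(SeedConfig{version, codec, encoding, checksum});
        }
      }
    }
  }
  return configs;
}
```

With these example values the cross product yields 2 × 3 × 3 × 2 = 36 configurations. Since the count grows multiplicatively with every added option (statistics, page indices, bloom filters), sampling a subset of combinations may be preferable to generating all of them.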

### Component(s)

C++, Continuous Integration, Parquet
