Description
Describe the bug
The Arrow writer is unable to correctly write nested structs when the struct is non-nullable.
I've noticed this behaviour before, but couldn't quite reproduce it easily.
To Reproduce
If we have the test case below (in parquet/src/arrow/arrow_writer.rs):
    #[test]
    fn arrow_writer_complex_mixed() {
        // define schema
        let offset_field = Field::new("offset", DataType::Int32, true);
        let partition_field = Field::new("partition", DataType::Int64, true);
        let topic_field = Field::new("topic", DataType::Utf8, true);
        let schema = Schema::new(vec![
            Field::new(
                "some_nested_object",
                DataType::Struct(vec![
                    offset_field.clone(),
                    partition_field.clone(),
                    topic_field.clone(),
                ]),
                false, // NOTE: this being false results in the array not being written correctly
            ),
        ]);

        // create some data
        let offset = Int32Array::from(vec![1, 2, 3, 4, 5]);
        let partition = Int64Array::from(vec![Some(1), None, None, Some(4), Some(5)]);
        let topic = StringArray::from(vec![Some("A"), None, Some("A"), Some(""), None]);
        let some_nested_object = StructArray::from(vec![
            (offset_field, Arc::new(offset) as ArrayRef),
            (partition_field, Arc::new(partition) as ArrayRef),
            (topic_field, Arc::new(topic) as ArrayRef),
        ]);

        // build a record batch
        let batch = RecordBatch::try_new(
            Arc::new(schema),
            vec![Arc::new(some_nested_object)],
        )
        .unwrap();

        roundtrip("test_arrow_writer_complex_mixed.parquet", batch);
    }

We get a failure:
thread 'arrow::arrow_writer::tests::arrow_writer_complex_mixed' panicked at 'assertion failed: `(left == right)`
left: `1`,
right: `0`', parquet/src/util/bit_util.rs:332:9
test arrow::arrow_writer::tests::arrow_writer_complex_mixed ... FAILED

When the struct is nullable, the file is written correctly.
Expected behavior
The batch should be written without errors.
Additional context
From inspecting the levels generated for the passing and failing scenarios, they look identical (https://www.diffchecker.com/89qWByeI). The bug appears to be in how levels are generated for non-nullable structs.
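For reference, in Parquet's nested encoding the definition levels should not be the same in the two scenarios: a nullable leaf inside a nullable struct has a maximum definition level of 2, while the same leaf inside a non-nullable struct has a maximum of 1. The sketch below only illustrates that rule using the `partition` column from the test case. It is not the writer's actual code, and `definition_levels` is a hypothetical helper that handles a single struct level with no lists.

```rust
/// Hypothetical helper (not part of the parquet crate): computes the
/// definition levels of a nullable leaf nested in exactly one struct.
fn definition_levels(
    struct_nullable: bool,
    struct_valid: &[bool],
    leaf_valid: &[bool],
) -> Vec<i16> {
    // A nullable struct contributes one level; a non-nullable struct contributes none.
    let struct_level: i16 = if struct_nullable { 1 } else { 0 };
    // The leaf is nullable, so it always contributes one more level.
    let max_level = struct_level + 1;

    struct_valid
        .iter()
        .zip(leaf_valid)
        .map(|(&s, &l)| {
            if struct_nullable && !s {
                0 // the struct itself is null
            } else if !l {
                struct_level // struct is defined, leaf is null
            } else {
                max_level // leaf value is present
            }
        })
        .collect()
}

fn main() {
    // Validity of the `partition` column from the test case above.
    let partition = [Some(1i64), None, None, Some(4), Some(5)];
    let leaf_valid: Vec<bool> = partition.iter().map(|v| v.is_some()).collect();
    // The struct array in the test case has no null slots.
    let struct_valid = [true; 5];

    let nullable_struct = definition_levels(true, &struct_valid, &leaf_valid);
    let required_struct = definition_levels(false, &struct_valid, &leaf_valid);

    println!("nullable struct: {:?}", nullable_struct);
    println!("required struct: {:?}", required_struct);
}
```

Under these assumptions the two scenarios differ by exactly one level per row, so identical level output for both schemas would point at the level generation rather than at the data.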