Add Builder interface for adding Arrays to record batches #210

@alamb

Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-12411

Use case:

While writing tests (both in IOx and in DataFusion) where I need a single RecordBatch, I often find myself doing something like this:

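        use std::sync::Arc;
        use arrow::array::{ArrayRef, Float64Array, Int64Array};
        use arrow::datatypes::{DataType as ArrowDataType, Field as ArrowField, Schema};
        use arrow::record_batch::RecordBatch;

        // The column types are declared once here, in the schema...
        let schema = Arc::new(Schema::new(vec![
            ArrowField::new("float_field", ArrowDataType::Float64, true),
            ArrowField::new("time", ArrowDataType::Int64, true),
        ]));

        // ...and a second time here, in the concrete array types.
        let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
        let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));

        let batch = RecordBatch::try_new(schema, vec![float_array, timestamp_array])
            .expect("created new record batch");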

This is annoying because the information that float_field is a float is encoded both in the Schema and in the Float64Array.

I would much rather be able to construct RecordBatches in a builder style, to avoid the redundancy and reduce the amount of typing:


        let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
        let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));

        // Proposed API: each name is paired with its own array
        let batch = RecordBatch::empty()
          .append("float_field", float_array).unwrap()
          .append("time", timestamp_array).unwrap();

The proposal is to add a method to RecordBatch like:

impl RecordBatch {
...
  fn append(self, field_name: &str, field_values: ArrayRef) -> Result<Self>
}

This would append a field named field_name to the current schema, returning an error if field_name was already present.

The nullability of the field would be set based on the actual null count of field_values.
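To make the intended semantics concrete, here is a minimal, self-contained model of the proposed `append`. It deliberately does not use the arrow crate: `Column`, `Field`, `Batch`, and `demo` are simplified stand-ins invented for this sketch, and the length check is an assumption that the proposal above does not spell out. It only illustrates the two rules described above (duplicate names are rejected, nullability follows the null count).

```rust
// Simplified stand-ins for illustration only; these are NOT the arrow crate's types.
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Float64(Vec<Option<f64>>),
    Int64(Vec<Option<i64>>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Float64(v) => v.len(),
            Column::Int64(v) => v.len(),
        }
    }

    // Number of missing (None) values in the column.
    fn null_count(&self) -> usize {
        match self {
            Column::Float64(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::Int64(v) => v.iter().filter(|x| x.is_none()).count(),
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

#[derive(Debug, Default)]
struct Batch {
    fields: Vec<Field>,
    columns: Vec<Column>,
}

impl Batch {
    fn empty() -> Self {
        Batch::default()
    }

    // Models the proposed `append(self, field_name, field_values) -> Result<Self>`.
    fn append(mut self, field_name: &str, field_values: Column) -> Result<Self, String> {
        // Rule from the proposal: a repeated field name is an error.
        if self.fields.iter().any(|f| f.name == field_name) {
            return Err(format!("field '{}' is already present", field_name));
        }
        // Assumption (not stated in the proposal): all columns must have equal length.
        if let Some(first) = self.columns.first() {
            if first.len() != field_values.len() {
                return Err(format!(
                    "expected {} rows, got {}",
                    first.len(),
                    field_values.len()
                ));
            }
        }
        // Rule from the proposal: nullability comes from the actual null count.
        let nullable = field_values.null_count() > 0;
        self.fields.push(Field { name: field_name.to_string(), nullable });
        self.columns.push(field_values);
        Ok(self)
    }
}

// Builds a two-column batch in the builder style shown above.
fn demo() -> Result<Batch, String> {
    Batch::empty()
        .append("float_field", Column::Float64(vec![Some(10.1), Some(20.1)]))?
        .append("time", Column::Int64(vec![Some(1000), None]))
}
```

In this model the name and the data are given once per column, so there is no separate schema to keep in sync; `float_field` ends up non-nullable and `time` nullable because only the latter contains a missing value.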

Labels: arrow (Changes to the arrow crate), enhancement (Any new improvement worthy of an entry in the changelog)
