Open
Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
The current array_slice implementation assumes the from/to/stride arguments come from arrays, whose values may vary per row. However, I'd assume that in most cases these arguments are actually scalar values, in which case we can take a fast path.
Describe the solution you'd like
For example, here:
datafusion/datafusion/functions-nested/src/extract.rs
Lines 456 to 470 in 9238779

    fn general_array_slice<O: OffsetSizeTrait>(
        array: &GenericListArray<O>,
        from_array: &Int64Array,
        to_array: &Int64Array,
        stride: Option<&Int64Array>,
    ) -> Result<ArrayRef>
    where
        i64: TryInto<O>,
    {
        let values = array.values();
        let original_data = values.to_data();
        let capacity = Capacities::Array(original_data.len());
        let mut mutable =
            MutableArrayData::with_capacities(vec![&original_data], true, capacity);

If we know from the scalar from/to/stride values that the slice is either empty or contiguous, we can avoid recreating the child array, because no shuffling is needed. Only when the stride is greater than 1, or when it's a reverse slice, would we actually need to shuffle the child array data.
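To make the idea concrete, here is a rough, std-only sketch of how the per-row decision could look once from/to/stride are known scalars. `SlicePlan` and `plan_row` are hypothetical names, not existing DataFusion APIs, and the index semantics (1-based, inclusive `to`, negative indices counting from the end, out-of-range bounds clamped) are a simplifying assumption for illustration rather than a statement of what array_slice does today.

```rust
/// Hypothetical classification of a slice whose arguments are scalars.
#[derive(Debug, PartialEq)]
enum SlicePlan {
    /// No element of the row is selected.
    Empty,
    /// Elements `start..end` of the row (0-based, `end` exclusive), in order.
    Contiguous { start: usize, end: usize },
    /// Any stride other than 1 (including reverse slices): fall back to the
    /// general path that gathers child values element by element.
    General,
}

/// Decide how to slice one row of length `len`.
///
/// Assumption for this sketch: `from`/`to` are 1-based and inclusive,
/// negative values count from the end (-1 is the last element), and
/// out-of-range bounds are clamped to the row.
fn plan_row(len: usize, from: i64, to: i64, stride: i64) -> SlicePlan {
    if stride != 1 {
        return SlicePlan::General;
    }
    let len = len as i64;
    // Map negative indices to their 1-based positive equivalents.
    let from = if from < 0 { len + from + 1 } else { from };
    let to = if to < 0 { len + to + 1 } else { to };
    // Clamp both bounds to the valid 1-based range [1, len].
    let first = from.max(1);
    let last = to.min(len);
    if first > last {
        SlicePlan::Empty
    } else {
        SlicePlan::Contiguous {
            start: (first - 1) as usize,
            end: last as usize,
        }
    }
}
```

Under these assumptions, `plan_row(5, 2, 4, 1)` yields `Contiguous { start: 1, end: 4 }`, while `plan_row(5, 1, 5, 2)` and `plan_row(5, 5, 1, -1)` both yield `General`. Since the stride check does not depend on the row, a single up-front test on the scalar stride would be enough to choose between the fast path and the existing general path.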
Describe alternatives you've considered
No response
Additional context
It probably makes more sense to do this after the refactoring done by #18432.