Add merge and merge_n kernels
#8753
Changes from 5 commits
eab6202
462cd3e
eefc171
dc7602a
8068238
fd3105c
66c8fa0
4286c72
1d947df
59a733a
10af559
ac68821
347e3df
9bb40cc
7d8a078
641eac2
f4dcb6c
2b51143
7e006a7
295a6d5
8ef3130
40e68d2
ec71911
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,366 @@ | ||||||
| // Licensed to the Apache Software Foundation (ASF) under one | ||||||
| // or more contributor license agreements. See the NOTICE file | ||||||
| // distributed with this work for additional information | ||||||
| // regarding copyright ownership. The ASF licenses this file | ||||||
| // to you under the Apache License, Version 2.0 (the | ||||||
| // "License"); you may not use this file except in compliance | ||||||
| // with the License. You may obtain a copy of the License at | ||||||
| // | ||||||
| // http://www.apache.org/licenses/LICENSE-2.0 | ||||||
| // | ||||||
| // Unless required by applicable law or agreed to in writing, | ||||||
| // software distributed under the License is distributed on an | ||||||
| // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||||||
| // KIND, either express or implied. See the License for the | ||||||
| // specific language governing permissions and limitations | ||||||
| // under the License. | ||||||
|
|
||||||
| //! [`merge`] and [`merge_n`]: Combine values from two or more arrays | ||||||
|
|
||||||
| use crate::filter::SlicesIterator; | ||||||
| use arrow_array::{Array, ArrayRef, BooleanArray, Datum, make_array, new_empty_array}; | ||||||
| use arrow_data::ArrayData; | ||||||
| use arrow_data::transform::MutableArrayData; | ||||||
| use arrow_schema::ArrowError; | ||||||
|
|
||||||
| /// An index for the [merge] function. | ||||||
| /// | ||||||
| /// This trait allows the indices argument for [merge] to be stored using a more | ||||||
| /// compact representation than `usize` when the input arrays are small. | ||||||
| /// If the number of input arrays is less than 256 for instance, the indices can be stored as `u8`. | ||||||
| /// | ||||||
| /// Implementations must ensure that all values which return `None` from [MergeIndex::index] are | ||||||
| /// considered equal by the [PartialEq] and [Eq] implementations. | ||||||
| pub trait MergeIndex: PartialEq + Eq + Copy { | ||||||
| /// Returns the index value as an `Option<usize>`. | ||||||
| /// | ||||||
| /// `None` values returned by this function indicate holes in the index array and will result | ||||||
| /// in null values in the array created by [merge]. | ||||||
| fn index(&self) -> Option<usize>; | ||||||
| } | ||||||
|
|
||||||
| impl MergeIndex for usize { | ||||||
| fn index(&self) -> Option<usize> { | ||||||
| Some(*self) | ||||||
| } | ||||||
| } | ||||||
|
|
||||||
| impl MergeIndex for Option<usize> { | ||||||
| fn index(&self) -> Option<usize> { | ||||||
| *self | ||||||
| } | ||||||
| } | ||||||
|
|
||||||
| /// Merges elements by index from a list of [`Array`], creating a new [`Array`] from | ||||||
| /// those values. | ||||||
| /// | ||||||
| /// Each element in `indices` is the index of an array in `values`. The `indices` array is processed | ||||||
| /// sequentially. The first occurrence of index value `n` will be mapped to the first | ||||||
| /// value of the array at index `n`, the second occurrence to the second value, and so on. | ||||||
| /// An index value where `MergeIndex::index` returns `None` is interpreted as a null value. | ||||||
| /// | ||||||
| /// # Implementation notes | ||||||
| /// | ||||||
| /// This algorithm is similar in nature to both [zip](crate::zip::zip) and | ||||||
| /// [interleave](crate::interleave::interleave), but there are some important differences. | ||||||
| /// | ||||||
| /// In contrast to [zip](crate::zip::zip), this function supports multiple input arrays. Instead of | ||||||
| /// a boolean selection vector, an index array is used to take values from the input arrays, and a | ||||||
| /// special marker value can be used to indicate null values. | ||||||
| /// | ||||||
| /// In contrast to [interleave](crate::interleave::interleave), this function does not use pairs of | ||||||
| /// indices. The values in `indices` serve the same purpose as the first value in the pairs passed | ||||||
| /// to `interleave`. | ||||||
| /// The index in the array is implicit and is derived from the number of times a particular array | ||||||
| /// index occurs. | ||||||
| /// The more constrained indexing mechanism used by this algorithm makes it easier to copy values | ||||||
| /// in contiguous slices. In the example below, the two subsequent elements from array `2` can be | ||||||
| /// copied in a single operation from the source array instead of copying them one by one. | ||||||
| /// Long spans of null values are also especially cheap because they do not need to be represented | ||||||
| /// in an input array. | ||||||
| /// | ||||||
| /// # Safety | ||||||
|
||||||
| /// # Safety | |
| /// # Panics |
Fixed
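For readers following the doc comment in the diff, the indexing rule can be modelled over plain `Vec`s. This is only a sketch of the described semantics, not the PR's implementation: `merge_model` and `CompactIndex` are hypothetical names, the `u8::MAX` null marker is an assumption, and the real kernel operates on Arrow arrays rather than vectors.

```rust
/// Same shape as the `MergeIndex` trait shown in the diff.
pub trait MergeIndex: PartialEq + Eq + Copy {
    fn index(&self) -> Option<usize>;
}

impl MergeIndex for Option<usize> {
    fn index(&self) -> Option<usize> {
        *self
    }
}

/// Hypothetical compact index: one byte per entry, with `u8::MAX` reserved
/// as the null marker (an assumption made for this sketch only).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct CompactIndex(pub u8);

impl MergeIndex for CompactIndex {
    fn index(&self) -> Option<usize> {
        if self.0 == u8::MAX {
            None
        } else {
            Some(self.0 as usize)
        }
    }
}

/// Simplified model of the merge rule: the k-th occurrence of index `n`
/// yields the k-th value of `values[n]`; a `None` index yields a null.
/// Panics if an index is out of range or an input runs out of values.
pub fn merge_model<T: Clone, I: MergeIndex>(values: &[Vec<T>], indices: &[I]) -> Vec<Option<T>> {
    // One read cursor per input; it advances each time that input is selected.
    let mut cursors = vec![0usize; values.len()];
    indices
        .iter()
        .map(|ix| {
            ix.index().map(|n| {
                let v = values[n][cursors[n]].clone();
                cursors[n] += 1;
                v
            })
        })
        .collect()
}
```

Because the position inside each input is implicit, the caller only supplies which input to read from next, which is what distinguishes this from the index pairs taken by `interleave`.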
There is no check for empty values array.
Check added along with unit tests
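The thread does not show what the added check does, so the following is only one possible policy, sketched over plain `Vec`s: reject an empty `values` list as soon as any index selects an input, and report out-of-range or exhausted inputs as errors instead of panicking. `merge_checked` and its `String` error type are hypothetical and stand in for whatever the PR actually returns.

```rust
/// Hypothetical validated variant of the merge rule over plain vectors.
/// The error policy here is an assumption, not the behaviour of the PR.
pub fn merge_checked<T: Clone>(
    values: &[Vec<T>],
    indices: &[Option<usize>],
) -> Result<Vec<Option<T>>, String> {
    // With no inputs, only an all-null index list can be satisfied.
    if values.is_empty() && indices.iter().any(|ix| ix.is_some()) {
        return Err("`values` is empty but `indices` selects an input".to_string());
    }

    // One read cursor per input, advanced each time that input is selected.
    let mut cursors = vec![0usize; values.len()];
    let mut out = Vec::with_capacity(indices.len());
    for ix in indices {
        match *ix {
            None => out.push(None),
            Some(n) => {
                let input = values
                    .get(n)
                    .ok_or_else(|| format!("index {n} is out of range"))?;
                let v = input
                    .get(cursors[n])
                    .ok_or_else(|| format!("input {n} has no values left"))?;
                out.push(Some(v.clone()));
                cursors[n] += 1;
            }
        }
    }
    Ok(out)
}
```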
Outdated
| let mut mutable = MutableArrayData::new(vec![&truthy, &falsy], false, truthy.len()); | |
| let mut mutable = MutableArrayData::new(vec![&truthy, &falsy], false, mask.len()); |
Fixed
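The reasoning behind the capacity suggestion can be shown with a small model over slices. It assumes, by analogy with the `merge_n` doc comment, that the two-input `merge` consumes `truthy` and `falsy` sequentially and emits one element per mask entry, so `truthy` can be shorter than the output. `merge_two` is a hypothetical name, and `Vec` stands in for `MutableArrayData`.

```rust
/// Simplified two-input model: `true` takes the next value from `truthy`,
/// `false` takes the next value from `falsy`.
/// Panics if either input has fewer values than the mask requires.
pub fn merge_two<T: Clone>(mask: &[bool], truthy: &[T], falsy: &[T]) -> Vec<T> {
    // The output has one element per mask entry, so `mask.len()` is the
    // right capacity hint; `truthy.len()` would generally be too small.
    let mut out = Vec::with_capacity(mask.len());
    let (mut t, mut f) = (0usize, 0usize);

    let mut start = 0usize;
    while start < mask.len() {
        // Find the end of the run of identical mask values.
        let flag = mask[start];
        let mut end = start + 1;
        while end < mask.len() && mask[end] == flag {
            end += 1;
        }
        let run = end - start;

        // Copy the whole run as one contiguous slice.
        if flag {
            out.extend_from_slice(&truthy[t..t + run]);
            t += run;
        } else {
            out.extend_from_slice(&falsy[f..f + run]);
            f += run;
        }
        start = end;
    }
    out
}
```

Copying run by run mirrors the contiguous-slice point made in the implementation notes: consecutive selections of the same input become a single slice copy.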