-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Support reverse for ListView #18424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support reverse for ListView #18424
Conversation
| } | ||
|
|
||
| // Take values from underlying array in the reversed order | ||
| let indices_array: ArrayRef = match size_of::<O>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this, but I wanted to 1) avoid unwraps 2) don't use 64 if 32 was sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to use O::IS_LARGE but still wondering about the approach
| )?)) | ||
| } | ||
|
|
||
| fn list_view_reverse<O: OffsetSizeTrait + TryFrom<i64>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to admit I did not spend much time looking at general_array_reverse, maybe I should have... It constructs a MutableArrayData and operates on it, while this uses take. There might be a good reason why general_array_reverse doesn't use take?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using take makes sense to me here. If you have time you could try both approaches and run a benchmark?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I will try that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I tried the benchmark here #18425 with array_len of 10k (it's 100k on the branch), using MutableData is way worse 🫨 Baseline is the code on this PR, the code from this snippet is here.
array_reverse time: [44.858 µs 44.865 µs 44.874 µs]
change: [+545.75% +547.46% +549.17%] (p = 0.00 < 0.05)
Performance has regressed.
This indicates it might be worth using take instead of MutableData on the regular array one too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to go with take in that case
This comment was marked as outdated.
This comment was marked as outdated.
b93da9c to
1c0343d
Compare
1c0343d to
0175167
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, have some suggestions for refactoring.
I think its unavoidable to reshuffle/materialize the underlying child array given my understanding of ListView arrays.
- Reference: #18350 (comment)
| // Materialize values from underlying array with take | ||
| let indices_array: ArrayRef = if O::IS_LARGE { | ||
| Arc::new(arrow::array::UInt64Array::from( | ||
| indices | ||
| .iter() | ||
| .map(|i| i.as_usize() as u64) | ||
| .collect::<Vec<_>>(), | ||
| )) | ||
| } else { | ||
| Arc::new(UInt32Array::from( | ||
| indices | ||
| .iter() | ||
| .map(|i| i.as_usize() as u32) | ||
| .collect::<Vec<_>>(), | ||
| )) | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I do wonder if there is a better way to do this, will keep thinking 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this feels cumbersome!
| )?)) | ||
| } | ||
|
|
||
| fn list_view_reverse<O: OffsetSizeTrait + TryFrom<i64>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using take makes sense to me here. If you have time you could try both approaches and run a benchmark?
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> N/A ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> When reviewing #18424 I noticed some refactoring that could be applied to existing array reverse implementation. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> See my comments for the refactors & justifications. Existing tests. ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> No. <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
| // even if the original array had elements out of order. | ||
| let mut indices: Vec<O> = Vec::with_capacity(values.len()); | ||
| let mut new_sizes = Vec::with_capacity(sizes.len()); | ||
| let mut new_offsets: Vec<O> = Vec::with_capacity(offsets.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW you can convert the existing indices to a Vec for in place update as well -- I tried to document that a bit more in apache/arrow-rs#8771
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 👍
I think @alamb might have confused what this reverse function is doing
| )?)) | ||
| } | ||
|
|
||
| fn list_view_reverse<O: OffsetSizeTrait + TryFrom<i64>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to go with take in that case
| } | ||
|
|
||
| #[test] | ||
| fn test_reverse_list_view_empty() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also have a test case where its all nulls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! Done in 7d39f69
# Which issue does this PR close? - related to apache/datafusion#18424 # Rationale for this change It may not be obvious how to convert certain Arrow arrays to/from Vec without copying for manipulation, so let's add an example # What changes are included in this PR? 1. Add note about zero copy arrays 2. Add examples of modifying a primitive array using zero-copy conversion to/from Vec # Are these changes tested? By CI # Are there any user-facing changes? Docs only, no functional change --------- Co-authored-by: Vegard Stikbakke <[email protected]>
|
Thanks @vegarsti |
## Which issue does this PR close? - Closes apache#18350. ## Rationale for this change We want to be able to reverse a ListView. ## What changes are included in this PR? - Downcast `&dyn Array` to `ListView`: `as_list_view_array` - Downcast `&dyn Array` to `LargeListView`: `as_large_list_view_array` - Branches in `array_reverse_inner` to reverse `ListView` and `LargeListView` - Main logic in `list_view_reverse` which materializes a new values array using `take` ## Are these changes tested? Yes
There's no benchmarks for `array_reverse`. I used this while working on #18424 to confirm `take` was faster than MutableData for ListView. That might be the case for other List types as well, which are currently using `MutableData`. The benchmark can be run with `cargo bench --bench array_reverse`.
Which issue does this PR close?
reversefor ListView #18350.Rationale for this change
We want to be able to reverse a ListView.
What changes are included in this PR?
&dyn ArraytoListView:as_list_view_array&dyn ArraytoLargeListView:as_large_list_view_arrayarray_reverse_innerto reverseListViewandLargeListViewlist_view_reversewhich materializes a new values array usingtakeAre these changes tested?
Yes