arrow-ord: add support for nested types to partition
#7131
cc @tustvold since it seems like you've worked in this code recently
alamb left a comment
Thank you @asubiotto -- this looks great to me.
I think it just needs one more test and it would be good to merge
arrow-ord/src/partition.rs
Outdated
I wonder if using `eq` would be faster 🤔
https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.eq.html
Do you mean for non-nested types? Like `distinct`, `eq` doesn't support nested types. Given that they both shell out to `compare_op`, I don't think there should be much of a perf difference between `distinct` and `eq` + mapping nulls to booleans (which would be necessary).
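To make the "mapping nulls to booleans" point concrete, here is a minimal scalar sketch in plain Rust. It does not call arrow-rs; `eq_scalar`, `distinct_scalar`, and `distinct_via_eq` are hypothetical names that only model the null semantics (an `eq`-style comparison yields null when either side is null, while a `distinct`-style comparison always yields a boolean).

```rust
/// Hypothetical scalar model of an `eq`-style comparison:
/// the result is null (None) whenever either input is null.
fn eq_scalar(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// Hypothetical scalar model of a `distinct`-style comparison:
/// null is treated as an ordinary value, so the result is never null.
fn distinct_scalar(a: Option<i64>, b: Option<i64>) -> bool {
    a != b
}

/// Emulating `distinct` on top of `eq` needs an extra step that maps the
/// null results back to booleans by looking at the validity of both inputs.
fn distinct_via_eq(a: Option<i64>, b: Option<i64>) -> bool {
    match eq_scalar(a, b) {
        // Both sides valid: distinct is just the negation of equality.
        Some(equal) => !equal,
        // At least one side null: distinct only if exactly one side is null.
        None => a.is_some() != b.is_some(),
    }
}
```

The extra `None` branch is the null-mapping work that an `eq`-based approach would have to do on top of the comparison itself.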
This support is currently incorrectly assumed by `BoundedWindowAggExec`, so partitioning on a nested type (e.g. struct) causes a nested comparison failure on execution. This commit adds a check to use `distinct` on non-nested types and falls back to using `make_comparator` on nested types.
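The two-path approach described in the commit message can be sketched roughly as follows. This is a self-contained model in plain Rust, not the arrow-rs implementation: `Column`, `adjacent_distinct`, and `partition_ranges` are hypothetical names, the flat branch stands in for the `distinct` path, and the closure in the nested branch stands in for a comparator such as the one `make_comparator` would return.

```rust
use std::cmp::Ordering;
use std::ops::Range;

/// Hypothetical stand-in for a column (not an arrow-rs type): either a flat
/// primitive column or a nested struct column with two child fields.
enum Column {
    Int(Vec<Option<i64>>),
    Struct(Vec<Option<(i64, String)>>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Int(v) => v.len(),
            Column::Struct(v) => v.len(),
        }
    }
}

/// For each adjacent pair of rows, report whether row `i + 1` differs from row `i`.
fn adjacent_distinct(col: &Column) -> Vec<bool> {
    match col {
        // Non-nested path: direct element-wise comparison where null equals null.
        Column::Int(v) => v.windows(2).map(|w| w[0] != w[1]).collect(),
        // Nested path: fall back to an index-based comparator over whole rows.
        Column::Struct(v) => {
            let cmp = |i: usize, j: usize| -> Ordering { v[i].cmp(&v[j]) };
            (0..v.len().saturating_sub(1))
                .map(|i| cmp(i, i + 1) != Ordering::Equal)
                .collect()
        }
    }
}

/// Turn the per-pair flags into contiguous ranges of equal rows.
fn partition_ranges(col: &Column) -> Vec<Range<usize>> {
    let n = col.len();
    if n == 0 {
        return Vec::new();
    }
    let mut ranges = Vec::new();
    let mut start = 0;
    for (i, differs) in adjacent_distinct(col).into_iter().enumerate() {
        if differs {
            // A boundary sits between row `i` and row `i + 1`.
            ranges.push(start..i + 1);
            start = i + 1;
        }
    }
    ranges.push(start..n);
    ranges
}
```

In this model, a struct column such as `[Some((1, "a")), Some((1, "a")), Some((1, "b")), None]` is split wherever the comparator reports adjacent rows as unequal, just as the flat path splits on differing adjacent values.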
07a5285 to 536082c
Thanks for the review!
alamb left a comment
Thanks again @asubiotto
This support is currently incorrectly assumed by `BoundedWindowAggExec`, so partitioning on a nested type (e.g. struct) causes a nested comparison failure on execution.
This commit adds a check to use `distinct` on non-nested types and falls back to using `make_comparator` on nested types.
Which issue does this PR close?
Rationale for this change
Please see #7130 for more in depth explanation and alternatives considered.
What changes are included in this PR?
If statement to use the old path on non-nested types and a fallback path to use `make_comparator` to check for value distinctness.
Are there any user-facing changes?
Previously failing use cases are now supported.