fix equal_to in PrimitiveGroupValueBuilder
#12758
@@ -91,15 +91,28 @@ impl<T: ArrowPrimitiveType, const NULLABLE: bool> GroupColumn
     for PrimitiveGroupValueBuilder<T, NULLABLE>
 {
     fn equal_to(&self, lhs_row: usize, array: &ArrayRef, rhs_row: usize) -> bool {
-        // Perf: skip null check (by short circuit) if input is not ullable
-        let null_match = if NULLABLE {
-            self.nulls.is_null(lhs_row) == array.is_null(rhs_row)
-        } else {
-            true
-        };
+        // Perf: skip null check (by short circuit) if input is not nullable
+        if NULLABLE {
+            // In nullable path, we should check if both `exist row` and `input row`
+            // are null/not null
+            let is_exist_null = self.nulls.is_null(lhs_row);
+            let null_match = self.nulls.is_null(lhs_row) == array.is_null(rhs_row);
Suggested change:
-            let null_match = self.nulls.is_null(lhs_row) == array.is_null(rhs_row);
+            let null_match = is_exist_null == array.is_null(rhs_row);
Thanks! Fixed.
I wonder if there is some way to write a reproducer in an end to end test (as in .slt as well) 🤔
🤔 I am trying, but I still have no idea how to do it.
I found it through the fuzz tests in #12667. To be honest, I am still confused about why a null row would have a non-default value...
I got it!
It may be due to the `compute::take` function used to generate the random dataset.
https://github.com/Rachelint/arrow-datafusion/blob/9ad971be0c6c808e77a74cdfc571a33732a0838a/test-utils/src/array_gen/primitive.rs#L48-L62
Let's see take's implementation:
    // In `take_primitive`:
    let values_buf = take_native(values.values(), indices);
    let nulls = take_nulls(values.nulls(), indices);

    // In `take_native`:
    match indices.nulls().filter(|n| n.null_count() > 0) {
        Some(n) => indices
            .values()
            .iter()
            .enumerate()
            .map(|(idx, index)| match values.get(index.as_usize()) {
                Some(v) => *v,
                None => match n.is_null(idx) {
                    true => T::default(),
                    false => panic!("Out-of-bounds index {index:?}"),
                },
            })
            .collect(),
It still tries to take the value from `values` rather than using the default value, even when the row in `indices` is null...
This logic actually seems unreasonable?
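To make the effect concrete, here is a minimal, dependency-free sketch of the behaviour described above. It is a model, not the arrow-rs code: `take_native_model` and `demo_take` are hypothetical names, the index array is represented as a raw index slice plus a parallel `bool` null mask, and the source values are assumed to contain no nulls.

```rust
/// Simplified model of gathering values by index when some indices are null.
/// `index_values[i]` is the raw index stored in slot `i`; `index_is_null[i]`
/// says whether that slot is null. Hypothetical helper, not the arrow-rs API.
fn take_native_model<T: Copy + Default>(
    values: &[T],
    index_values: &[usize],
    index_is_null: &[bool],
) -> Vec<T> {
    index_values
        .iter()
        .enumerate()
        .map(|(slot, &index)| match values.get(index) {
            // In-bounds index: the value is copied even if the slot is null.
            Some(v) => *v,
            // Out-of-bounds index: tolerated only for null slots.
            None => {
                if index_is_null[slot] {
                    T::default()
                } else {
                    panic!("Out-of-bounds index {}", index)
                }
            }
        })
        .collect()
}

/// Hypothetical demo data: slot 1 is a null index whose raw value is 2,
/// slot 2 is a null index whose raw value is out of bounds.
fn demo_take() -> (Vec<i64>, Vec<bool>) {
    let values = [10_i64, 20, 30];
    let index_values = [0_usize, 2, 7];
    let index_is_null = [false, true, true];
    let out = take_native_model(&values, &index_values, &index_is_null);
    // With no nulls in `values`, output validity mirrors the index validity.
    (out, index_is_null.to_vec())
}
```

In this model, slot 1 of the output is null but its value buffer holds 30 rather than 0, which is the "null row with a non-default value" situation discussed in this thread.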
It may be a bit hard to reproduce in an end-to-end test.
A null row with a non-default value can only exist in some special cases,
such as being generated through `take` as mentioned above, or, as I remember, where we use it to improve filter performance in the avg accumulator.
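As a rough illustration of why such rows matter for `equal_to`, the sketch below compares a naive check with a null-aware one. Everything here is a simplified stand-in: `Column`, `equal_to_naive`, and `equal_to_null_aware` are hypothetical names, and the naive version assumes the old code combined `null_match` with a plain value comparison (the diff shown in this thread is truncated, so that is an assumption).

```rust
/// Simplified stand-in for a nullable primitive column: a values buffer plus
/// a parallel null mask. Hypothetical type, not DataFusion's builder.
struct Column {
    values: Vec<i64>,
    is_null: Vec<bool>,
}

/// Naive check (assumed shape of the old logic): validity must match AND the
/// value buffers must match, even when both rows are null.
fn equal_to_naive(lhs: &Column, lhs_row: usize, rhs: &Column, rhs_row: usize) -> bool {
    let null_match = lhs.is_null[lhs_row] == rhs.is_null[rhs_row];
    null_match && lhs.values[lhs_row] == rhs.values[rhs_row]
}

/// Null-aware check: if either row is null, decide on validity alone and
/// never look at the value buffers, whose contents are arbitrary for nulls.
fn equal_to_null_aware(lhs: &Column, lhs_row: usize, rhs: &Column, rhs_row: usize) -> bool {
    let lhs_null = lhs.is_null[lhs_row];
    let rhs_null = rhs.is_null[rhs_row];
    if lhs_null || rhs_null {
        // Rows match only when both are null.
        return lhs_null && rhs_null;
    }
    // Both rows are valid, so comparing the stored values is meaningful.
    lhs.values[lhs_row] == rhs.values[rhs_row]
}
```

With an existing null row stored as `0` and an incoming null row whose buffer holds `30`, the naive check reports a mismatch while the null-aware check treats the two nulls as equal.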
I was thinking about this last night and I think the byte buffer below has the same problem. I will make a follow on PR
Indeed it does -- #12770 to fix