docs: improve the documentation for Aggregate code #12617
Changes from 1 commit
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -37,12 +37,13 @@ use std::vec; | |
|
|
||
| use datafusion_physical_expr_common::binary_map::{OutputType, INITIAL_BUFFER_CAPACITY}; | ||
|
|
||
| /// Trait for group values column-wise row comparison | ||
| /// Trait for storing a single column of group values in [`GroupValuesColumn`] | ||
| /// | ||
| /// Implementations of this trait store a in-progress collection of group values | ||
| /// (similar to various builders in Arrow-rs) that allow for quick comparison to | ||
| /// incoming rows. | ||
| /// | ||
| /// [`GroupValuesColumn`]: crate::aggregates::group_values::column_wise::GroupValuesColumn | ||
| pub trait ArrayRowEq: Send + Sync { | ||
| /// Returns equal if the row stored in this builder at `lhs_row` is equal to | ||
| /// the row in `array` at `rhs_row` | ||
|
|
@@ -60,11 +61,13 @@ pub trait ArrayRowEq: Send + Sync { | |
| fn take_n(&mut self, n: usize) -> ArrayRef; | ||
| } | ||
|
|
||
| /// An implementation of [`ArrayRowEq`] for primitive types. | ||
|
||
| pub struct PrimitiveGroupValueBuilder<T: ArrowPrimitiveType> { | ||
| group_values: Vec<T::Native>, | ||
| nulls: Vec<bool>, | ||
|
Contributor
Should this be a `BooleanArray`, so that the null checks are faster and live in one place?
Contributor
Author
You are foreshadowing :) This is an excellent point. @jayzhan211 and I are working on exactly this topic. #12623
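To make the trade-off in this thread concrete, here is a minimal sketch of a bit-packed null mask next to the `Vec<bool>` in the diff, which spends one byte per row. `PackedNulls` and all of its methods are hypothetical names invented for illustration; this is not the type used in DataFusion or arrow-rs, and it is not necessarily what #12623 does. The sketch sets a bit for a null row, which is an assumption of this example.

```rust
/// Hypothetical bit-packed null mask: one bit per row instead of one
/// `bool` (one byte) per row. Illustration only.
#[derive(Debug, Default)]
pub struct PackedNulls {
    /// Bit `i % 64` of word `i / 64` is set when row `i` is null.
    words: Vec<u64>,
    /// Number of rows appended so far.
    len: usize,
}

impl PackedNulls {
    pub fn new() -> Self {
        Self::default()
    }

    /// Append the null flag for one row.
    pub fn push(&mut self, is_null: bool) {
        let (word, bit) = (self.len / 64, self.len % 64);
        if bit == 0 {
            // Starting a new 64-row word.
            self.words.push(0);
        }
        if is_null {
            self.words[word] |= 1u64 << bit;
        }
        self.len += 1;
    }

    /// Returns true if row `row` was appended as null.
    pub fn is_null(&self, row: usize) -> bool {
        assert!(row < self.len, "row out of bounds");
        (self.words[row / 64] >> (row % 64)) & 1 == 1
    }

    /// Checks 64 rows per comparison, which is the "fast non-null path"
    /// idea behind the `has_null` field in the diff.
    pub fn has_null(&self) -> bool {
        self.words.iter().any(|w| *w != 0)
    }

    pub fn len(&self) -> usize {
        self.len
    }
}
```

With this layout, a scan for "any null" touches one word per 64 rows, and the mask lives in a single structure instead of being spread across per-row booleans.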
||
| // whether the array contains at least one null, for fast non-null path | ||
| /// whether the array contains at least one null, for fast non-null path | ||
| has_null: bool, | ||
| /// Can the input array contain nulls? | ||
| nullable: bool, | ||
| } | ||
|
|
||
|
|
@@ -154,13 +157,14 @@ impl<T: ArrowPrimitiveType> ArrayRowEq for PrimitiveGroupValueBuilder<T> { | |
| } | ||
| } | ||
|
|
||
| /// An implementation of [`ArrayRowEq`] for binary and utf8 types. | ||
|
||
| pub struct ByteGroupValueBuilder<O> | ||
| where | ||
| O: OffsetSizeTrait, | ||
| { | ||
| output_type: OutputType, | ||
| buffer: BufferBuilder<u8>, | ||
| /// Offsets into `buffer` for each distinct value. These offsets as used | ||
| /// Offsets into `buffer` for each distinct value. These offsets as used | ||
| /// directly to create the final `GenericBinaryArray`. The `i`th string is | ||
| /// stored in the range `offsets[i]..offsets[i+1]` in `buffer`. Null values | ||
| /// are stored as a zero length string. | ||
|
|
||
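The offsets layout described in the doc comment above can be sketched with plain `Vec`s. `ByteValues` and its methods are hypothetical stand-ins for illustration; the real `ByteGroupValueBuilder` uses `BufferBuilder<u8>` and an offset type `O`, and its null tracking may differ from the `Vec<bool>` assumed here.

```rust
/// Hypothetical, simplified stand-in for the `buffer` + `offsets` layout.
pub struct ByteValues {
    /// All values concatenated back to back.
    buffer: Vec<u8>,
    /// `offsets[i]..offsets[i + 1]` is the byte range of value `i`.
    /// Always holds one more entry than there are values.
    offsets: Vec<usize>,
    /// `nulls[i]` is true when value `i` is null (an assumption of this sketch).
    nulls: Vec<bool>,
}

impl ByteValues {
    pub fn new() -> Self {
        Self { buffer: Vec::new(), offsets: vec![0], nulls: Vec::new() }
    }

    /// Append a value; a null appends no bytes, so it occupies a
    /// zero-length range in `buffer`.
    pub fn append(&mut self, value: Option<&[u8]>) {
        if let Some(bytes) = value {
            self.buffer.extend_from_slice(bytes);
        }
        self.offsets.push(self.buffer.len());
        self.nulls.push(value.is_none());
    }

    /// Read value `i` back, or `None` if it was appended as null.
    pub fn value(&self, i: usize) -> Option<&[u8]> {
        if self.nulls[i] {
            None
        } else {
            Some(&self.buffer[self.offsets[i]..self.offsets[i + 1]])
        }
    }

    pub fn offsets(&self) -> &[usize] {
        &self.offsets
    }
}
```

Appending `"a"`, a null, and `"ccc"` in that order leaves the null with an empty range between its two neighbors, so the null mask is what distinguishes it from an empty string.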
I'm not very familiar with this, hence the question: what are group values?
Are they the group keys, or the exact values attached to a specific key?
Consider the following example: (1, 'a') is one of the group values, and so are (2, 'b') and (3, 'c').
In the Rows implementation, we convert (1, 'a') to a row and compare against it. In the Column implementation, we compare column by column: first 1, then 'a' in this case.
This is a great example. I added it to the docs.
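The difference between the two approaches described in this thread can be sketched as follows. All names here (`encode_row`, `RowGroups`, `ColumnGroups`, `from_keys`, `find`) are hypothetical, the byte encoding is a toy one, and the linear scan stands in for whatever lookup structure the real code uses. The only point is where the comparison happens: on one encoded row, or on each column in turn.

```rust
/// Row-style: encode every key column of a row into one byte string.
/// Toy encoding for illustration; not the arrow-rs row format.
fn encode_row(a: i64, b: &str) -> Vec<u8> {
    let mut row = a.to_be_bytes().to_vec();
    row.extend_from_slice(b.as_bytes());
    row
}

/// Group values stored as whole encoded rows.
pub struct RowGroups {
    rows: Vec<Vec<u8>>,
}

impl RowGroups {
    pub fn from_keys(keys: &[(i64, &str)]) -> Self {
        Self { rows: keys.iter().map(|(a, b)| encode_row(*a, b)).collect() }
    }

    /// Convert the incoming key to a row, then compare whole rows.
    pub fn find(&self, a: i64, b: &str) -> Option<usize> {
        let probe = encode_row(a, b);
        self.rows.iter().position(|r| *r == probe)
    }
}

/// Group values stored as one vector per key column.
pub struct ColumnGroups {
    col_a: Vec<i64>,
    col_b: Vec<String>,
}

impl ColumnGroups {
    pub fn from_keys(keys: &[(i64, &str)]) -> Self {
        Self {
            col_a: keys.iter().map(|(a, _)| *a).collect(),
            col_b: keys.iter().map(|(_, b)| b.to_string()).collect(),
        }
    }

    /// Compare the first column, and only if it matches, the second.
    pub fn find(&self, a: i64, b: &str) -> Option<usize> {
        (0..self.col_a.len()).find(|&i| self.col_a[i] == a && self.col_b[i] == b)
    }
}
```

For the group values (1, 'a'), (2, 'b'), (3, 'c'), both versions locate (2, 'b') at index 1 and report no match for (1, 'b'). The column version never builds an intermediate row for the incoming key.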