-
Notifications
You must be signed in to change notification settings - Fork 0
Rework RowSelectionCursor to use enums #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -26,6 +26,7 @@ use crate::arrow::arrow_reader::{ | |
| use crate::errors::{ParquetError, Result}; | ||
| use arrow_array::Array; | ||
| use arrow_select::filter::prep_null_mask_filter; | ||
| use std::collections::VecDeque; | ||
|
|
||
| /// A builder for [`ReadPlan`] | ||
| #[derive(Clone, Debug)] | ||
|
|
@@ -89,6 +90,8 @@ impl ReadPlanBuilder { | |
| } | ||
|
|
||
| /// Returns the preferred [`RowSelectionStrategy`] for materialising the current selection. | ||
| /// | ||
| /// Guarantees to return either `Selectors` or `Mask`, never `Auto`. | ||
| pub fn preferred_selection_strategy(&self) -> RowSelectionStrategy { | ||
| match self.selection_strategy { | ||
| RowSelectionStrategy::Selectors => RowSelectionStrategy::Selectors, | ||
|
|
@@ -167,25 +170,35 @@ impl ReadPlanBuilder { | |
| if !self.selects_any() { | ||
| self.selection = Some(RowSelection::from(vec![])); | ||
| } | ||
| let selection_strategy = match self.selection_strategy { | ||
| RowSelectionStrategy::Auto { .. } => self.preferred_selection_strategy(), | ||
| strategy => strategy, | ||
| }; | ||
|
|
||
| // Preferred strategy must not be Auto | ||
| let selection_strategy = self.preferred_selection_strategy(); | ||
|
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. self.preferred_strategy already returns the specified strategy when not auto, so I don't think there is any reason to repeat the logic here
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good find! It looks like I updated the code too many times, causing some duplication.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, thank you for bearing with us over the review process |
||
|
|
||
| let Self { | ||
| batch_size, | ||
| selection, | ||
| selection_strategy: _, | ||
| } = self; | ||
|
|
||
| let selection = selection.map(|s| { | ||
| let trimmed = s.trim(); | ||
| let selectors: Vec<RowSelector> = trimmed.into(); | ||
| RowSelectionCursor::new(selectors, selection_strategy) | ||
| }); | ||
| let selection = selection.map(|s| s.trim()); | ||
|
|
||
| let row_selection_cursor = selection | ||
| .map(|s| { | ||
| let trimmed = s.trim(); | ||
| let selectors: Vec<RowSelector> = trimmed.into(); | ||
| match selection_strategy { | ||
| RowSelectionStrategy::Mask => { | ||
| RowSelectionCursor::new_mask_from_selectors(selectors) | ||
| } | ||
| RowSelectionStrategy::Selectors => RowSelectionCursor::new_selectors(selectors), | ||
| RowSelectionStrategy::Auto { .. } => unreachable!(), | ||
|
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am trying to figure out some way to encode the fact that the RowSelection will never be Auto. I am trying out some things in follow on PRs
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've thought about that, one approach might be to add a new enum here, like called
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I played around with it -- what I came up with was to split out the policy from the actual resolved strategy. PR incoming
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
| } | ||
| }) | ||
| .unwrap_or(RowSelectionCursor::new_all()); | ||
|
|
||
| ReadPlan { | ||
| batch_size, | ||
| selection, | ||
| row_selection_cursor, | ||
| } | ||
| } | ||
| } | ||
|
|
@@ -283,13 +296,23 @@ pub struct ReadPlan { | |
| /// The number of rows to read in each batch | ||
| batch_size: usize, | ||
| /// Row ranges to be selected from the data source | ||
| selection: Option<RowSelectionCursor>, | ||
| row_selection_cursor: RowSelectionCursor, | ||
|
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Rather than Option... I added a |
||
| } | ||
|
|
||
| impl ReadPlan { | ||
| /// Returns a mutable reference to the selection, if any | ||
| pub fn selection_mut(&mut self) -> Option<&mut RowSelectionCursor> { | ||
| self.selection.as_mut() | ||
| /// Returns a mutable reference to the selection selectors, if any | ||
| #[deprecated(since = "57.1.0", note = "Use `row_selection_cursor_mut` instead")] | ||
|
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also implemented my suggestion here To avoid a backwards incompatible change
Owner
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you for the proposal of new API. |
||
| pub fn selection_mut(&mut self) -> Option<&mut VecDeque<RowSelector>> { | ||
| if let RowSelectionCursor::Selectors(selectors_cursor) = &mut self.row_selection_cursor { | ||
| Some(selectors_cursor.selectors_mut()) | ||
| } else { | ||
| None | ||
| } | ||
| } | ||
|
|
||
| /// Returns a mutable reference to the row selection cursor | ||
| pub fn row_selection_cursor_mut(&mut self) -> &mut RowSelectionCursor { | ||
| &mut self.row_selection_cursor | ||
| } | ||
|
|
||
| /// Return the number of rows to read in each output batch | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you review this with "whitespace blind" diff it is easier to see what changed
https://github.com/hhhizzz/arrow-rs/pull/8/files?w=1
Basically the three cases are now handled via three enum variants, and the code is a match on
row_selection_cursor_mutrather thanSome(row_selection)andselection_cursor.is_mask_backed()