-
Notifications
You must be signed in to change notification settings - Fork 0
Rework RowSelectionCursor to use enums #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework RowSelectionCursor to use enums #8
Conversation
| let mask = selection_cursor | ||
| .mask_values_for(&mask_chunk) | ||
| .ok_or_else(|| general_err!("row selection mask out of bounds"))?; | ||
| match self.read_plan.row_selection_cursor_mut() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you review this with "whitespace blind" diff it is easier to see what changed
https://github.com/hhhizzz/arrow-rs/pull/8/files?w=1
Basically the three cases are now handled via three enum variants, and the code is a match on row_selection_cursor_mut rather than Some(row_selection) and selection_cursor.is_mask_backed()
| }; | ||
|
|
||
| // Preferred strategy must not be Auto | ||
| let selection_strategy = self.preferred_selection_strategy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.preferred_strategy already returns the specified strategy when not auto, so I don't think there is any reason to repeat the logic here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find! It looks like I updated the code too many times, causing some duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, thank you for bearing with us over the review process
| RowSelectionCursor::new_mask_from_selectors(selectors) | ||
| } | ||
| RowSelectionStrategy::Selectors => RowSelectionCursor::new_selectors(selectors), | ||
| RowSelectionStrategy::Auto { .. } => unreachable!(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to figure out some way to encode the fact that the RowSelection will never be Auto. I am trying out some things in follow on PRs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've thought about that, one approach might be to add a new enum here, like called RowSelectionBackendType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played around with it -- what I came up with was to split out the policy from the actual resolved strategy. PR incoming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| batch_size: usize, | ||
| /// Row ranges to be selected from the data source | ||
| selection: Option<RowSelectionCursor>, | ||
| row_selection_cursor: RowSelectionCursor, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than Option... I added a RowSelectionCursor::all variant for the case that all rows are selected
| /// | ||
| /// This keeps per-reader state such as the current position and delegates the | ||
| /// actual storage strategy to the internal `RowSelectionBacking`. | ||
| /// This is best for dense selections where there are many small skips |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main change -- made two new cursor types for the different backings, which I think makes it easier to understand what is going on
The RowSelectionCursor enum simply encodes what will happen, rather than having to check None and RowSelectorCursor::is_mask_backed
| pub fn selection_mut(&mut self) -> Option<&mut RowSelectionCursor> { | ||
| self.selection.as_mut() | ||
| /// Returns a mutable reference to the selection selectors, if any | ||
| #[deprecated(since = "57.1.0", note = "Use `row_selection_cursor_mut` instead")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also implemented my suggestion here
To avoid a backwards incompatible change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the proposal of new API.
|
@alamb Thanks, that definitely makes the code clearer. I learned a lot from it! |
I was reviewing the use of
RowSelectionCursorand there were several places where it was being used as either a Mask or a Selection and couldn't be one or the other and had runtime assertsI thought it would be clearer if we encoded all the possible types as an enum directly so I tried it out and it worked well. Here is my proposal
FYI @hhhizzz