alamb commented Nov 10, 2025

I was reviewing the use of RowSelectionCursor and found several places where it had to be either a Mask or a Selection, and could not be the other, which was enforced only by runtime asserts.

I thought it would be clearer to encode all the possible types directly as an enum, so I tried it out and it worked well. Here is my proposal.

FYI @hhhizzz

let mask = selection_cursor
.mask_values_for(&mask_chunk)
.ok_or_else(|| general_err!("row selection mask out of bounds"))?;
match self.read_plan.row_selection_cursor_mut() {
Author

If you review this with a "whitespace blind" diff, it is easier to see what changed:

https://github.com/hhhizzz/arrow-rs/pull/8/files?w=1

Basically, the three cases are now handled by three enum variants, and the code is a single match on row_selection_cursor_mut rather than a check for Some(row_selection) followed by selection_cursor.is_mask_backed().
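
For readers skimming the thread, here is a minimal sketch of that shape. It is not the arrow-rs code: `MaskCursor`, `SelectorsCursor`, their fields, and `count_selected` are hypothetical stand-ins, and only the idea of three variants handled by one `match` comes from this PR.

```rust
use std::collections::VecDeque;

/// Hypothetical mask-backed cursor: one bool per row (true = selected).
struct MaskCursor {
    mask: Vec<bool>,
    position: usize,
}

/// Hypothetical selector-backed cursor: runs of (row_count, skip).
struct SelectorsCursor {
    selectors: VecDeque<(usize, bool)>,
}

/// The three cases encoded directly as variants, so callers need neither an
/// `Option` check nor an `is_mask_backed()` check.
enum RowSelectionCursor {
    Mask(MaskCursor),
    Selectors(SelectorsCursor),
    All,
}

/// Advances the cursor over the next `rows` input rows and returns how many
/// of them are selected. (Illustrative helper, not part of the real API.)
fn count_selected(cursor: &mut RowSelectionCursor, rows: usize) -> usize {
    match cursor {
        RowSelectionCursor::Mask(c) => {
            let end = (c.position + rows).min(c.mask.len());
            let selected = c.mask[c.position..end].iter().filter(|&&b| b).count();
            c.position = end;
            selected
        }
        RowSelectionCursor::Selectors(c) => {
            let mut remaining = rows;
            let mut selected = 0;
            while remaining > 0 {
                let Some((count, skip)) = c.selectors.pop_front() else {
                    break;
                };
                let take = count.min(remaining);
                if !skip {
                    selected += take;
                }
                remaining -= take;
                // Put back the unconsumed tail of a partially used run.
                if take < count {
                    c.selectors.push_front((count - take, skip));
                }
            }
            selected
        }
        // No selection at all: every row is selected.
        RowSelectionCursor::All => rows,
    }
}
```

Because the `match` is exhaustive, adding a fourth backing later would be a compile error at every call site rather than a runtime assert.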

};

// Preferred strategy must not be Auto
let selection_strategy = self.preferred_selection_strategy();
Author

self.preferred_strategy already returns the specified strategy when not auto, so I don't think there is any reason to repeat the logic here

Owner

Good find! It looks like I updated the code too many times, causing some duplication.

Author

Yeah, thank you for bearing with us over the review process

RowSelectionCursor::new_mask_from_selectors(selectors)
}
RowSelectionStrategy::Selectors => RowSelectionCursor::new_selectors(selectors),
RowSelectionStrategy::Auto { .. } => unreachable!(),
Author

I am trying to figure out some way to encode the fact that the RowSelection will never be Auto. I am trying out some things in follow-on PRs.

Owner

I've thought about that. One approach might be to add a new enum here, perhaps called RowSelectionBackendType.

Author

I played around with it -- what I came up with was to split out the policy from the actual resolved strategy. PR incoming
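
One way to picture that split is sketched below. This is only a guess at the general idea, not the contents of the follow-on PR: the names `SelectionPolicy`, `ResolvedStrategy`, `resolve`, and `describe`, the `threshold` field, and the "short runs prefer a mask" heuristic are all assumptions made for illustration.

```rust
/// What the caller asks for (hypothetical name). May defer the decision.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SelectionPolicy {
    Selectors,
    Mask,
    /// Let the reader decide; `threshold` is an assumed tuning knob.
    Auto { threshold: usize },
}

/// What the reader actually uses (hypothetical name). There is no `Auto`
/// here, so code consuming it never needs an `unreachable!()` arm.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ResolvedStrategy {
    Selectors,
    Mask,
}

impl SelectionPolicy {
    /// Turns a policy into a concrete strategy. The heuristic is an
    /// assumption: short average runs favor a mask, long runs favor selectors.
    fn resolve(self, average_run_len: usize) -> ResolvedStrategy {
        match self {
            SelectionPolicy::Selectors => ResolvedStrategy::Selectors,
            SelectionPolicy::Mask => ResolvedStrategy::Mask,
            SelectionPolicy::Auto { threshold } => {
                if average_run_len < threshold {
                    ResolvedStrategy::Mask
                } else {
                    ResolvedStrategy::Selectors
                }
            }
        }
    }
}

/// Downstream code matches on the resolved type only, exhaustively.
fn describe(strategy: ResolvedStrategy) -> &'static str {
    match strategy {
        ResolvedStrategy::Mask => "mask-backed cursor",
        ResolvedStrategy::Selectors => "selector-backed cursor",
    }
}
```

The point of the design is that "never Auto" becomes a property of the type rather than a comment next to an `unreachable!()`.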

Author

#9

batch_size: usize,
/// Row ranges to be selected from the data source
selection: Option<RowSelectionCursor>,
row_selection_cursor: RowSelectionCursor,
Author

Rather than using an Option, I added a RowSelectionCursor::all variant for the case where all rows are selected.

///
/// This keeps per-reader state such as the current position and delegates the
/// actual storage strategy to the internal `RowSelectionBacking`.
/// This is best for dense selections where there are many small skips
Author

This is the main change: I made two new cursor types for the different backings, which I think makes it easier to understand what is going on.

The RowSelectionCursor enum simply encodes what will happen, rather than requiring callers to check for None and call RowSelectionCursor::is_mask_backed.

pub fn selection_mut(&mut self) -> Option<&mut RowSelectionCursor> {
self.selection.as_mut()
/// Returns a mutable reference to the selection selectors, if any
#[deprecated(since = "57.1.0", note = "Use `row_selection_cursor_mut` instead")]
Author

I also implemented my suggestion here, to avoid a backwards-incompatible change.
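
The general pattern looks roughly like the sketch below. The `#[deprecated]` attribute and the two method names are taken from the diff above; everything else (the simplified `RowSelectionCursor` variants, the return type of the old accessor, `ReadPlan::new`, and `legacy_caller`) is a hypothetical simplification, not the real arrow-rs definition.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified cursor; the real type carries more state.
pub enum RowSelectionCursor {
    Selectors(VecDeque<usize>),
    Mask(Vec<bool>),
    All,
}

pub struct ReadPlan {
    row_selection_cursor: RowSelectionCursor,
}

impl ReadPlan {
    pub fn new(row_selection_cursor: RowSelectionCursor) -> Self {
        Self { row_selection_cursor }
    }

    /// New accessor: there is always a cursor, so no `Option` is needed.
    pub fn row_selection_cursor_mut(&mut self) -> &mut RowSelectionCursor {
        &mut self.row_selection_cursor
    }

    /// Old accessor, kept so existing callers still compile (with a warning).
    /// The return type here is an assumption made for this sketch.
    #[deprecated(since = "57.1.0", note = "Use `row_selection_cursor_mut` instead")]
    pub fn selection_mut(&mut self) -> Option<&mut VecDeque<usize>> {
        match &mut self.row_selection_cursor {
            RowSelectionCursor::Selectors(selectors) => Some(selectors),
            _ => None,
        }
    }
}

/// Stand-in for downstream code that has not migrated yet.
#[allow(deprecated)]
fn legacy_caller(plan: &mut ReadPlan) -> bool {
    plan.selection_mut().is_some()
}
```

Keeping the old method as a thin wrapper means the removal can happen in a later major release instead of breaking callers immediately.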

Owner

Thank you for proposing the new API.

Owner

hhhizzz commented Nov 10, 2025

@alamb Thanks, that definitely makes the code clearer. I learned a lot from it!

hhhizzz merged commit 59ee569 into hhhizzz:rowselectionempty Nov 10, 2025
17 checks passed
Author

alamb commented Nov 10, 2025

> @alamb Thanks, that definitely makes the code clearer. I learned a lot from it!

Thank you @hhhizzz -- I have one more coming up...

(do you sleep 💤 😆 )
