-
Notifications
You must be signed in to change notification settings - Fork 483
refactor: rename RowIdSelection to RowAddrSelection #5263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Try reopen PR to fix CI |
130a0c7 to
f10a1ce
Compare
f10a1ce to
cb9e613
Compare
| /// fragment k is selected. If there is a pair (k, Partial(v)) then the | ||
| /// fragment k has the selected rows in v. | ||
| inner: BTreeMap<u32, RowIdSelection>, | ||
| inner: BTreeMap<u32, RowAddrSelection>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this appropriate? The docs above say that it can be row ids. Do stable row ids not use this structure at all?
lance/rust/lance-core/src/utils/mask.rs
Lines 365 to 368 in cb9e613
| /// These row ids may either be stable-style (where they can be an incrementing | |
| /// u64 sequence) or address style, where they are a fragment id and a row offset. | |
| /// When address style, this supports setting entire fragments as selected, | |
| /// without needing to enumerate all the ids in the fragment. |
| /// fragment k is selected. If there is a pair (k, Partial(v)) then the | ||
| /// fragment k has the selected rows in v. | ||
| inner: BTreeMap<u32, RowIdSelection>, | ||
| inner: BTreeMap<u32, RowAddrSelection>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for this one we should use the term row ID? Because this is more used to accommodate stable row ID, and for row address technically directly use RoaringTreemap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems no matter RowIdSelection nor RowAddrSelection can cover the two semantics of row ID and row address. There is a full plan in my mind(sorry, did not write it down before):
It would be better to split the responsibility of the struct for distinguishing row ID(stable) and row address.
-
for row address:
RowIdTreeMap->RowAddrMap(BTreeMap<u32, RowAddrSelection>)RowIdSelection→RowAddrSelection(Full | Partial(RoaringBitmap))SearchResult(scalar.rs) →RowAddrSearchResult(Exact(RowAddrMap) | AtMost(RowAddrMap))RowIdMask→RowAddrMask
-
for (stable) row id:
- add rowid::
RowIdSet(RoaringTreemap,64-bit set) - add rowid::
RowIdMask - add
RowIdResolver(row_id -> row_addr)
- add rowid::
-
for both:
- add
RowSetOpsforRowIdSetandRowAddrMap
- add
WDYT? cc @wjones127
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's okay. Could you put this design in an issue?
Remember that you're collaborating with other developers and they need to be able to follow what you are doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. Have filed a ticket: #5326
As a step to distinguish the row ID and the row address, we should rename
RowIdSelectiontoRowAddrSelection. To show its effect more clearly.