-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add support for FixedSizeList type in arrow_cast, hashing
#8344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
stall until the next arrow-rs (50.0) |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why this PR is waiting for the next arrow release (arrow 50.0.0?)
Because it just updated the support cast from List/LargeList to FixedList @alamb |
|
I see -- so while this code doesn't directly depend on the arrow release, it won't be very helpful until that support released. Makes sense to me. Thank you @Weijun-H |
arrow_cast
37638c8 to
b8b12a1
Compare
|
❤️ |
arrow_castarrow_cast, hashing
b5ccbe2 to
e6c9b34
Compare
e6c9b34 to
78e7d2f
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Weijun-H -- as always, your contributions are most appreciated
| ---- | ||
| NULL | ||
|
|
||
| #TODO: arrow-rs doesn't support it yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we be casting [1] (not 1)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that the List supports casting from UTF8 to List with a single size. Therefore, I think FixedSizeList should also support it.
select arrow_cast('1', 'LargeList(Int64)');
----
[1]
| Ok(()) | ||
| } | ||
|
|
||
| fn hash_fixed_list_array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see any test coverage for this new code -- e.g. either unit tests for hashing or a higher level test like GROUP BY <FixedListArray>
Can you either ensure this code is tested somehow, or else perhaps move the hash support to a different PR so we can merge the arrow_cast support ?
| #[test] | ||
| // Tests actual values of hashes, which are different if forcing collisions | ||
| #[cfg(not(feature = "force_hash_collisions"))] | ||
| fn create_hashes_for_fixed_size_list_arrays() { | ||
| let data = vec![ | ||
| Some(vec![Some(0), Some(1), Some(2)]), | ||
| None, | ||
| Some(vec![Some(3), None, Some(5)]), | ||
| Some(vec![Some(3), None, Some(5)]), | ||
| None, | ||
| Some(vec![Some(0), Some(1), Some(2)]), | ||
| ]; | ||
| let list_array = | ||
| Arc::new(FixedSizeListArray::from_iter_primitive::<Int32Type, _, _>( | ||
| data, 3, | ||
| )) as ArrayRef; | ||
| let random_state = RandomState::with_seeds(0, 0, 0, 0); | ||
| let mut hashes = vec![0; list_array.len()]; | ||
| create_hashes(&[list_array], &random_state, &mut hashes).unwrap(); | ||
| assert_eq!(hashes[0], hashes[5]); | ||
| assert_eq!(hashes[1], hashes[4]); | ||
| assert_eq!(hashes[2], hashes[3]); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a unit test for the hash function.
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thank you @Weijun-H
Which issue does this PR close?
Closes #8343
Rationale for this change
FixedSizeListandList/LargeListWhat changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?