@Weijun-H Weijun-H commented Nov 28, 2023

Which issue does this PR close?

Closes #8343

Rationale for this change

  • add hash function for 'FixedSizeList'
  • support cast between FixedSizeList and List / LargeList
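
To illustrate the second point, here is a minimal std-only Rust sketch of what casting between the two list shapes involves. It is a simplified model, not the arrow-rs or DataFusion implementation: the `FixedSizeList` struct and the helpers `list_to_fixed_size` and `fixed_size_to_list` are hypothetical names, and rejecting rows of the wrong length with an error is an assumption made for the example.

```rust
/// A variable-length list column: each row is null or a list of nullable values.
type ListRows = Vec<Option<Vec<Option<i64>>>>;

/// Hypothetical stand-in for a fixed-size list column (not an arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
struct FixedSizeList {
    size: usize,
    rows: ListRows,
}

/// List -> FixedSizeList: every non-null row must have exactly `size` elements.
/// Returning an error on a length mismatch is an assumption of this sketch.
fn list_to_fixed_size(rows: ListRows, size: usize) -> Result<FixedSizeList, String> {
    for (i, row) in rows.iter().enumerate() {
        if let Some(values) = row {
            if values.len() != size {
                return Err(format!(
                    "row {} has {} elements, expected {}",
                    i,
                    values.len(),
                    size
                ));
            }
        }
    }
    Ok(FixedSizeList { size, rows })
}

/// FixedSizeList -> List: always succeeds, because a variable-length list
/// can hold rows of any length.
fn fixed_size_to_list(list: FixedSizeList) -> ListRows {
    list.rows
}
```

In this model the List to FixedSizeList direction can fail while the reverse cannot, which is why only the first direction returns a `Result`.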

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the sql (SQL Planner) and sqllogictest (SQL Logic Tests (.slt)) labels Nov 28, 2023
@Weijun-H
Member Author

Stalled until the next arrow-rs release (50.0).

Contributor

@alamb alamb left a comment

I wonder why this PR is waiting for the next arrow release (arrow 50.0.0?)

@Weijun-H
Member Author

Weijun-H commented Dec 2, 2023

> I wonder why this PR is waiting for the next arrow release (arrow 50.0.0?)

Because arrow-rs only just added support for casting from List/LargeList to FixedSizeList, @alamb:
https://github.com/apache/arrow-rs/blob/df69ef57d055453c399fa925ad315d19211d7ab2/arrow-cast/src/cast.rs#L808-L815

@alamb
Contributor

alamb commented Dec 3, 2023

I see -- so while this code doesn't directly depend on the arrow release, it won't be very helpful until that support is released. Makes sense to me. Thank you @Weijun-H

@alamb alamb changed the title from "Add support for parsing FixedSizeList type" to "Add support for parsing FixedSizeList type in arrow_cast" Dec 3, 2023
@Weijun-H Weijun-H force-pushed the cast-fixedsizelist-list branch from 37638c8 to b8b12a1 Compare January 15, 2024 11:27
@alamb
Contributor

alamb commented Jan 15, 2024

❤️

@Weijun-H Weijun-H marked this pull request as ready for review January 15, 2024 13:24
@alamb alamb changed the title from "Add support for parsing FixedSizeList type in arrow_cast" to "Add support for FixedSizeList type in arrow_cast, hashing" Jan 16, 2024
@Weijun-H Weijun-H force-pushed the cast-fixedsizelist-list branch from b5ccbe2 to e6c9b34 Compare January 17, 2024 04:43
Contributor

@alamb alamb left a comment

Thank you @Weijun-H -- as always, your contributions are most appreciated

----
NULL

#TODO: arrow-rs doesn't support it yet
Contributor

Shouldn't we be casting [1] (not 1)?

Member Author

I noticed that List supports casting from Utf8 to a single-element List, so I think FixedSizeList should support it as well.

select arrow_cast('1', 'LargeList(Int64)');
----
[1]

Ok(())
}

fn hash_fixed_list_array(
Contributor

I didn't see any test coverage for this new code -- e.g. either unit tests for hashing or a higher level test like GROUP BY <FixedListArray>

Can you either ensure this code is tested somehow, or else perhaps move the hash support to a different PR so we can merge the arrow_cast support?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jan 19, 2024
Comment on lines +586 to +609
#[test]
// Tests actual values of hashes, which are different if forcing collisions
#[cfg(not(feature = "force_hash_collisions"))]
fn create_hashes_for_fixed_size_list_arrays() {
    let data = vec![
        Some(vec![Some(0), Some(1), Some(2)]),
        None,
        Some(vec![Some(3), None, Some(5)]),
        Some(vec![Some(3), None, Some(5)]),
        None,
        Some(vec![Some(0), Some(1), Some(2)]),
    ];
    let list_array =
        Arc::new(FixedSizeListArray::from_iter_primitive::<Int32Type, _, _>(
            data, 3,
        )) as ArrayRef;
    let random_state = RandomState::with_seeds(0, 0, 0, 0);
    let mut hashes = vec![0; list_array.len()];
    create_hashes(&[list_array], &random_state, &mut hashes).unwrap();
    assert_eq!(hashes[0], hashes[5]);
    assert_eq!(hashes[1], hashes[4]);
    assert_eq!(hashes[2], hashes[3]);
}

Member Author

Added a unit test for the hash function.
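
The property that test checks (rows with equal contents, including null rows and null elements, receive equal hashes) can be sketched with the standard library alone. This is a simplified model under stated assumptions, not DataFusion's `create_hashes`: the helper name `hash_fixed_size_rows` is hypothetical, and it uses `std`'s `DefaultHasher` rather than the seeded `RandomState` from the PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: compute one hash per row of a fixed-size list column.
/// Each row is null (`None`) or a list of nullable `i32` values.
fn hash_fixed_size_rows(rows: &[Option<Vec<Option<i32>>>]) -> Vec<u64> {
    rows.iter()
        .map(|row| {
            // A fresh hasher per row, so a row's hash depends only on its contents.
            let mut hasher = DefaultHasher::new();
            // `Option` and `Vec` implement `Hash`, so null rows and null
            // elements contribute to the hash like any other value.
            row.hash(&mut hasher);
            hasher.finish()
        })
        .collect()
}
```

Calling it on the same six rows used in the unit test yields pairwise-equal hashes for rows 0 and 5, 1 and 4, and 2 and 3, which is what a GROUP BY on such a column relies on.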

@Weijun-H Weijun-H requested a review from alamb January 19, 2024 02:49
Contributor

@alamb alamb left a comment

Looks good to me -- thank you @Weijun-H

@alamb alamb merged commit ae0f401 into apache:main Jan 19, 2024
@Weijun-H Weijun-H deleted the cast-fixedsizelist-list branch January 29, 2024 03:00
Labels

logical-expr (Logical plan and expressions), sql (SQL Planner), sqllogictest (SQL Logic Tests (.slt))

Development

Successfully merging this pull request may close these issues.

Support FixedSizeList for arrow_cast

2 participants