-
Notifications
You must be signed in to change notification settings - Fork 3.9k
ARROW-12426: [Rust] Fix concatentation of arrow dictionaries #10073
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -384,4 +384,74 @@ mod tests { | |
|
|
||
| Ok(()) | ||
| } | ||
|
|
||
| fn collect_string_dictionary( | ||
| dictionary: &DictionaryArray<Int32Type>, | ||
| ) -> Vec<Option<String>> { | ||
| let values = dictionary.values(); | ||
| let values = values.as_any().downcast_ref::<StringArray>().unwrap(); | ||
|
|
||
| dictionary | ||
| .keys() | ||
| .iter() | ||
| .map(|key| key.map(|key| values.value(key as _).to_string())) | ||
| .collect() | ||
| } | ||
|
|
||
| fn concat_dictionary( | ||
| input_1: DictionaryArray<Int32Type>, | ||
| input_2: DictionaryArray<Int32Type>, | ||
| ) -> Vec<Option<String>> { | ||
| let concat = concat(&[&input_1 as _, &input_2 as _]).unwrap(); | ||
| let concat = concat | ||
| .as_any() | ||
| .downcast_ref::<DictionaryArray<Int32Type>>() | ||
| .unwrap(); | ||
|
|
||
| collect_string_dictionary(concat) | ||
| } | ||
|
|
||
| #[test] | ||
| fn test_string_dictionary_array() -> Result<()> { | ||
| let input_1: DictionaryArray<Int32Type> = | ||
| vec!["hello", "A", "B", "hello", "hello", "C"] | ||
| .into_iter() | ||
| .collect(); | ||
| let input_2: DictionaryArray<Int32Type> = | ||
| vec!["hello", "E", "E", "hello", "F", "E"] | ||
| .into_iter() | ||
| .collect(); | ||
|
|
||
| let expected: Vec<_> = vec![ | ||
| "hello", "A", "B", "hello", "hello", "C", "hello", "E", "E", "hello", "F", | ||
| "E", | ||
| ] | ||
| .into_iter() | ||
| .map(|x| Some(x.to_string())) | ||
| .collect(); | ||
|
|
||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I recommend testing the actual values from the dictionary array (aka test the content is as expected) as well here explicitly -- something like (tested and it passes for me locally)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm confused, is this not what I do a couple of lines down? Edit: I've taken a guess at what you meant and hardcoded the expected result instead of computing it
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, I meant a hard coded expected result. The rationale being that then there is less change a bug in the arrow implementation can cause the expected results to change. By comparing the output of |
||
| let concat = concat_dictionary(input_1, input_2); | ||
| assert_eq!(concat, expected); | ||
| Ok(()) | ||
| } | ||
|
|
||
| #[test] | ||
| fn test_string_dictionary_array_nulls() -> Result<()> { | ||
| let input_1: DictionaryArray<Int32Type> = | ||
| vec![Some("foo"), Some("bar"), None, Some("fiz")] | ||
| .into_iter() | ||
| .collect(); | ||
| let input_2: DictionaryArray<Int32Type> = vec![None].into_iter().collect(); | ||
| let expected = vec![ | ||
| Some("foo".to_string()), | ||
| Some("bar".to_string()), | ||
| None, | ||
| Some("fiz".to_string()), | ||
| None, | ||
| ]; | ||
|
|
||
| let concat = concat_dictionary(input_1, input_2); | ||
| assert_eq!(concat, expected); | ||
| Ok(()) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suggest adding at least one Null value to the dictionary to test the nulls case |
||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how best to handle this, the other option would be to handle dictionary concatenation in the concat kernel which is fallible - not sure which is better as I'm not all that familiar with the codebase yet