Add comparison support for Union arrays #8838

friendlymatthew · 2025-11-13T20:07:03Z

Which issue does this PR close?

Closes #8837 (Add comparison support for Union arrays in the cmp kernel)
Related to #8828 (Support Union data types for row format)

Rationale for this change

This PR implements comparison for Union arrays. The implementation follows a simple ordering strategy: unions are first compared by their type identifier, and the values within those types are compared only when the type identifiers match.

This approach handles both sparse and dense union modes correctly by using offsets when present (dense unions) or direct indices (sparse unions) to locate the appropriate child array values.
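To make the ordering concrete, here is a minimal sketch of the strategy described above. It does not use arrow's `UnionArray`; `ToyUnion`, `value`, and `compare_slots` are hypothetical names, and every child is assumed to hold `i64` values so that the example stays self-contained.

```rust
use std::cmp::Ordering;

/// Toy stand-in for a union array whose children all hold `i64` values.
/// This is NOT arrow's `UnionArray`; every name here is hypothetical.
struct ToyUnion {
    /// One type id per logical slot, selecting which child holds the value.
    type_ids: Vec<i8>,
    /// `Some` for dense unions (slot -> position in the child),
    /// `None` for sparse unions (the slot index is used directly).
    offsets: Option<Vec<i32>>,
    /// Child values, indexed by type id.
    children: Vec<Vec<i64>>,
}

impl ToyUnion {
    /// Returns the type id and the child value for logical slot `i`.
    /// Panics if `i` or the resolved child index is out of bounds.
    fn value(&self, i: usize) -> (i8, i64) {
        let type_id = self.type_ids[i];
        let child_idx = match &self.offsets {
            Some(offsets) => offsets[i] as usize, // dense: follow the offset
            None => i,                            // sparse: same index as the slot
        };
        (type_id, self.children[type_id as usize][child_idx])
    }
}

/// Compares slot `i` of `left` with slot `j` of `right`:
/// type ids first, values only when the type ids match.
fn compare_slots(left: &ToyUnion, i: usize, right: &ToyUnion, j: usize) -> Ordering {
    let (l_id, l_val) = left.value(i);
    let (r_id, r_val) = right.value(j);
    l_id.cmp(&r_id).then_with(|| l_val.cmp(&r_val))
}
```

With this ordering, a slot with type id 1 sorts after any slot with type id 0, regardless of the values stored in the children.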

friendlymatthew · 2025-11-13T20:07:55Z

cc @paddyhoran @alamb

arrow-ord/src/ord.rs

alamb

Thanks @friendlymatthew -- this looks good to me. The only thing I think it needs is some tests of the error cases, namely:

Compare an out of bounds index (expect panic)
Try to compare incompatible union types

I also left some other small suggestions, but I don't think they are needed.

alamb · 2025-11-19T18:41:48Z

arrow-ord/src/ord.rs

+    let left = left.as_union();
+    let right = right.as_union();
+
+    let (left_fields, left_mode) = match left.data_type() {


This is weird to have to re-check the DataTypes.

What would you think about adding UnionArray::fields() and UnionArray::mode() methods to make the code easier to work with?

This should be super quick to review: #8884

Somewhat related but it feels a bit weird that the following works without any notice to the user:

#[test]
fn test_union_fields() {
    let ids = vec![0, 1, 2];
    let field = Field::new("a", DataType::Binary, true);

    // different length of ids and fields (we zip so we truncate the longer vec)
    let _out = UnionFields::new(ids.clone(), vec![field.clone()]);

    // duplicate fields associated with different type ids!
    let _out = UnionFields::new(ids, vec![field.clone(), field]);
}

I feel like we could benefit from a bit more validation? We could leave UnionFields::new but also have a UnionFields::try_new that checks the above 🤔

Yes, I think that sounds like a good idea to me

We can even deprecate UnionFields::new to help people migrate over

Here it is: #8891

Here is another minor convenience improvement: #8895
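For readers following along, the kind of validation discussed in this thread could look roughly like the sketch below. It is not the code from #8891: `try_new_fields` and `FieldsError` are hypothetical names, fields are reduced to plain `String` names, and the duplicate type id check is an extra assumption beyond the two cases in the test above.

```rust
use std::collections::HashSet;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum FieldsError {
    LengthMismatch { type_ids: usize, fields: usize },
    DuplicateTypeId(i8),
    DuplicateField(String),
}

/// Validates the inputs, then pairs each type id with its field name.
/// Unlike a plain `zip`, a length mismatch is reported instead of truncated.
fn try_new_fields(
    type_ids: Vec<i8>,
    fields: Vec<String>,
) -> Result<Vec<(i8, String)>, FieldsError> {
    if type_ids.len() != fields.len() {
        return Err(FieldsError::LengthMismatch {
            type_ids: type_ids.len(),
            fields: fields.len(),
        });
    }

    {
        // Scoped so the borrows of `fields` end before it is moved below.
        let mut seen_ids = HashSet::new();
        let mut seen_fields = HashSet::new();
        for (id, field) in type_ids.iter().zip(&fields) {
            if !seen_ids.insert(*id) {
                return Err(FieldsError::DuplicateTypeId(*id));
            }
            if !seen_fields.insert(field.as_str()) {
                return Err(FieldsError::DuplicateField(field.clone()));
            }
        }
    }

    Ok(type_ids.into_iter().zip(fields).collect())
}
```

Both inputs from the test above would be rejected here: the first with `LengthMismatch`, the second with `DuplicateField`.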

alamb · 2025-11-19T18:43:07Z

arrow-ord/src/ord.rs

+
+    if left_fields != right_fields || left_mode != right_mode {
+        return Err(ArrowError::InvalidArgumentError(
+            "Cannot compare UnionArrays with different fields or modes".to_string(),


I recommend adding more details to this message to help when people hit it. Specifically, I recommend:

A separate message for different modes (and include the modes in the error message)

Adding the fields ({fields:?} style) to the message
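A rough sketch of what the two suggestions above might look like, assuming a simplified model: `Mode` and `check_comparable` are hypothetical, fields are reduced to `(type id, name)` pairs, and the error is a plain `String` rather than `ArrowError`. The exact wording used in the PR may differ.

```rust
/// Hypothetical stand-in for the union mode.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Sparse,
    Dense,
}

/// Returns `Ok(())` when both sides can be compared, otherwise a message
/// that names the specific mismatch and includes the offending values.
fn check_comparable(
    left_fields: &[(i8, &str)],
    left_mode: Mode,
    right_fields: &[(i8, &str)],
    right_mode: Mode,
) -> Result<(), String> {
    // Separate message for a mode mismatch, including both modes.
    if left_mode != right_mode {
        return Err(format!(
            "Cannot compare UnionArrays with different modes: left={left_mode:?}, right={right_mode:?}"
        ));
    }
    // Separate message for a field mismatch, including both field lists.
    if left_fields != right_fields {
        return Err(format!(
            "Cannot compare UnionArrays with different fields: left={left_fields:?}, right={right_fields:?}"
        ));
    }
    Ok(())
}
```

Splitting the check this way lets the person who hits the error see which of the two conditions failed without having to inspect both arrays.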

alamb · 2025-11-19T18:44:55Z

arrow-ord/src/ord.rs

+
+    let c_opts = child_opts(opts);
+
+    let mut field_comparators = HashMap::with_capacity(left_fields.len());


Rather than a hash map, you could potentially just use a 128-element Vec<> indexed by the type ids. Since a type id is an i8, you know there can be at most 128 values, and indexing might be faster than a hash table lookup.

Hm, so this was my first thought/approach as well, but I decided to use a hashmap because it avoids superfluous memory usage for sparse sets.

Plus, I don't think this is a very hot path, so any perf differences wouldn't be super meaningful.
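For illustration, the Vec-based alternative suggested above could be sketched as follows. `TypeIdTable` is a hypothetical helper, not something in the PR, and it assumes type ids are non-negative.

```rust
/// Dense lookup table with one slot per non-negative `i8` type id (0..=127).
/// Hypothetical helper for this sketch; the PR itself keeps a `HashMap`.
struct TypeIdTable<T> {
    slots: Vec<Option<T>>,
}

impl<T> TypeIdTable<T> {
    /// Allocates all 128 slots up front, even if only a few are used.
    fn new() -> Self {
        Self {
            slots: (0..128).map(|_| None).collect(),
        }
    }

    /// Stores `value` for `type_id`. Panics if `type_id` is negative.
    fn insert(&mut self, type_id: i8, value: T) {
        assert!(type_id >= 0, "type id must be non-negative");
        self.slots[type_id as usize] = Some(value);
    }

    /// Looks up the value for `type_id` by direct indexing, without hashing.
    fn get(&self, type_id: i8) -> Option<&T> {
        if type_id < 0 {
            return None;
        }
        self.slots[type_id as usize].as_ref()
    }
}
```

The trade-off matches the discussion: the table always holds 128 slots regardless of how many type ids are in use, while a `HashMap` only stores the ids that exist but pays for hashing on each lookup.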

alamb

Thank you @friendlymatthew

# Which issue does this PR close?

This PR adds another method on the `UnionArray` API that returns a list of `FieldRef`s associated with the union type.

See: #8838 (comment)

alamb

Thanks @friendlymatthew

github-actions bot added the arrow label (Changes to the arrow crate) Nov 13, 2025

martin-g reviewed Nov 14, 2025


arrow-ord/src/ord.rs (outdated, resolved)

alamb mentioned this pull request Nov 14, 2025

Andrew Lamb Weekly-ish Open Source plan - 2025-11-17 apache/datafusion#18711

Closed

46 tasks

friendlymatthew mentioned this pull request Nov 14, 2025

Make UnionArrays hashable apache/datafusion#18717

Closed

compare_union

87c792b

friendlymatthew force-pushed the friendlymatthew/compare-union branch from 0ceac84 to 87c792b on November 15, 2025 01:53

This was referenced Nov 17, 2025

Hash UnionArrays apache/datafusion#18718

Merged

Add UnionArray tests exercising hashing, group-by, distinct, and aggregates apache/datafusion#18791

Open

Extend comparison support for Union arrays against an opaque array #8881

Open

alamb reviewed Nov 19, 2025


friendlymatthew mentioned this pull request Nov 19, 2025

Add UnionArray::fields #8884

Merged

Test edge cases

ad9027b

friendlymatthew force-pushed the friendlymatthew/compare-union branch from 779ea23 to ad9027b on November 19, 2025 19:41

Add specialized error messages

c91aec7

alamb approved these changes Nov 19, 2025


friendlymatthew mentioned this pull request Nov 20, 2025

Add union to opaque comparisons #8896

Open

friendlymatthew added a commit to pydantic/arrow-rs that referenced this pull request Nov 20, 2025

compare union apache#8838

d92d584

alamb pushed a commit that referenced this pull request Nov 24, 2025

Add UnionArray::fields (#8884)

5f3577a


alamb approved these changes Nov 24, 2025


alamb merged commit a8a63c2 into apache:main Nov 24, 2025
17 checks passed



3 participants