Skip to content

Conversation

@pepijnve
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

case_when_with_expr has a code path to handle null values early on in the evaluation. It determines the presence of nulls using value_array.null_count() > 0. For nested arrays this may not be correct. logical_null_count() should be used instead.

What changes are included in this PR?

Check logical_null_count instead of null_count.

Are these changes tested?

Additional tests added to ensure the nested array case is tested. The test already passed before this change, but was a bit less efficient since the null values were tested for equality against each possible when value.

Are there any user-facing changes?

No

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Nov 21, 2025
@pepijnve pepijnve force-pushed the case_with_expr_logical_nulls branch from e37d026 to 5e41c7f Compare November 21, 2025 17:41
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pepijnve, I ran those tests on main now and their passed 🤔
I would expect them fail without this PR change?

@pepijnve
Copy link
Contributor Author

Thanks @pepijnve, I ran those tests on main now and their passed 🤔 I would expect them fail without this PR change?

No, that's expected. The code was computing the correct result in spite of this.

What was happens on main is that at https://github.com/apache/datafusion/pull/18872/files#diff-ac23ff0fe78acd71875341026dd5907736e3e3f49e2c398a69e6b33cb6394ae8L869 the null count is 0 so the null handling code path is not taken. The result is that the rows which have an effective null value are tested for equality against each when value. That's pointless since it will be falsy each time.

With this change the null handling code path is taken and the null values get filtered out before reaching the when branch handling.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pepijnve -- I went over the tests carefully and it looks good to me

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pepijnve for the explanation, makes a lot of sense

@comphead comphead added this pull request to the merge queue Nov 21, 2025
Merged via the queue into apache:main with commit 12cb4ca Nov 21, 2025
32 checks passed
logan-keede pushed a commit to logan-keede/datafusion that referenced this pull request Nov 23, 2025
## Which issue does this PR close?

- None, followup to apache#18152

## Rationale for this change

`case_when_with_expr` has a code path to handle `null` values early on
in the evaluation. It determines the presence of nulls using
`value_array.null_count() > 0`. For nested arrays this may not be
correct. `logical_null_count()` should be used instead.

## What changes are included in this PR?

Check `logical_null_count` instead of `null_count`.

## Are these changes tested?

Additional tests added to ensure the nested array case is tested. The
test already passed before this change, but was a bit less efficient
since the null values were tested for equality against each possible
when value.

## Are there any user-facing changes?

No
@pepijnve pepijnve deleted the case_with_expr_logical_nulls branch November 28, 2025 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants