Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Nov 7, 2025

Which issue does this PR close?

Rationale for this change

We made a change that causes a regression in planning one of the tpcds benchmarks (see #17801)

However, we only found the problem when running planning benchmarks. The normal CI test pass. We have a tpchds_planning suite to test for exactly this scenario, so that test should have failed as well

Debugging more it turns out that the test somewhat non obviously calls optimize twice on the LogicalPlan which in this case masks the error.

What changes are included in this PR?

  1. avoid calling optimize twice in the test

Note that due to #17801 the test now fails like this:

cargo test --test tpcds_planning
---- tpcds_physical_q75 stdout ----
Error: Internal("Physical input schema should be the same as the one converted from logical input schema. Differences: \n\t- field nullability at index 5 [sales_cnt]: (physical) true vs (logical) false\n\t- field nullability at index 6 [sales_amt]: (physical) true vs (logical) false")

Are these changes tested?

They are only tests

Are there any user-facing changes?

No, this is an internal only change

@github-actions github-actions bot added the core Core DataFusion crate label Nov 7, 2025
@alamb alamb changed the title Avoid double optimizing in tpchds_planning tests Avoid double optimizing in tpchds_planning tests to avoid masking errors Nov 7, 2025
alamb added a commit that referenced this pull request Nov 10, 2025
…ion (#18567)

Note this targets the branch-51 release branch

## Which issue does this PR close?

- part of #17558
- resolves #17801 in the 51
release branch

## Rationale for this change

- We merged some clever rewrites for `coalesce` and `nvl2` to use `CASE`
which are faster and more correct (👏 @chenkovsky @kosiew )
- However, these rewrites cause subtle schema mismatches in some cases
planning (b/c the CASE simplification nullability logic can't determine
the correct nullability in some cases - see
#17801)
- @pepijnve has some heroic efforts to fix the schema mismatch in
#17813 (comment),
but it is non trivial and I am worried about merging it so close to the
51 release and introducing new edge cases

## What changes are included in this PR?

1. Revert #17357 /
e5dcc8c
3. Revert #17991 /
ea83c26
2. Revert #18191 /
22c4214
2. Cherry-pick 6202254, a test that
reproduces the schema mismatch issue (from
#18536)
3. Cherry-pick 735cacf, a fix for the
benchmarks that regressed due to the revert (from
#17833)
4. Update datafusion-testing (see separate PR here) for extended tests
(see apache/datafusion-testing#15)

## Are these changes tested?

Yes I added a new test

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
github-merge-queue bot pushed a commit that referenced this pull request Nov 20, 2025
## Which issue does this PR close?

- Closes #17801
- Obviates (contains) and thus Closes #17833
- Obviates (contains) and thus Closes #18536

## Rationale for this change

#17357 introduced a change that replaces `coalesce` function calls with
`case` expressions. In the current implementation these two differ in
the way they report their nullability. `coalesce` is more precise than
`case` all will report itself as not nullable in situations where the
equivalent `case` does report being nullable.

The rest of the codebase currently does not expect the nullability
property of an expression to change as a side effect of expression
simplification. This PR is a first attempt to align the nullability of
`coalesce` and `case`.

## What changes are included in this PR?

Tweaks to the `nullable` logic for the logical and physical `case`
expression code to report `case` as being not nullable in more
situations.

- For logical `case`, a best effort const evaluation of 'when'
expressions is done to determine 'then' reachability. The code errs on
the conservative side wrt nullability.
- For physical `case`, const evaluation of 'when' expressions using a
placeholder record batch is attempted to determine 'then' reachability.
Again if const evaluation is not possible, the code errs on the
conservative side.
- The optimizer schema check has been relaxed slightly to allow
nullability to be removed by optimizer passes without having to disable
the schema check entirely
- The panic'ing benchmark has been reenabled

## Are these changes tested?

Additional unit tests have been added to test the new logic.

## Are there any user-facing changes?

No

---------

Co-authored-by: Andrew Lamb <[email protected]>
@alamb alamb closed this in #17813 Nov 20, 2025
logan-keede pushed a commit to logan-keede/datafusion that referenced this pull request Nov 23, 2025
…e#17813)

## Which issue does this PR close?

- Closes apache#17801
- Obviates (contains) and thus Closes apache#17833
- Obviates (contains) and thus Closes apache#18536

## Rationale for this change

apache#17357 introduced a change that replaces `coalesce` function calls with
`case` expressions. In the current implementation these two differ in
the way they report their nullability. `coalesce` is more precise than
`case` all will report itself as not nullable in situations where the
equivalent `case` does report being nullable.

The rest of the codebase currently does not expect the nullability
property of an expression to change as a side effect of expression
simplification. This PR is a first attempt to align the nullability of
`coalesce` and `case`.

## What changes are included in this PR?

Tweaks to the `nullable` logic for the logical and physical `case`
expression code to report `case` as being not nullable in more
situations.

- For logical `case`, a best effort const evaluation of 'when'
expressions is done to determine 'then' reachability. The code errs on
the conservative side wrt nullability.
- For physical `case`, const evaluation of 'when' expressions using a
placeholder record batch is attempted to determine 'then' reachability.
Again if const evaluation is not possible, the code errs on the
conservative side.
- The optimizer schema check has been relaxed slightly to allow
nullability to be removed by optimizer passes without having to disable
the schema check entirely
- The panic'ing benchmark has been reenabled

## Are these changes tested?

Additional unit tests have been added to test the new logic.

## Are there any user-facing changes?

No

---------

Co-authored-by: Andrew Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant