Avoid duplicating complex expression in comparisons#34172

Draft
ranma42 wants to merge 3 commits into dotnet:main from ranma42:avoid-equal-duplication-34165
Conversation

@ranma42

@ranma42 ranma42 commented Jul 5, 2024

Contributor

When comparing a nullable expression to a non-nullable one, a NULL result always
represents a difference.

This makes it possible to avoid duplicating the nullable expression by mapping
the NULL result to a FALSE (when comparing for equality).

Fixes #34165.
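
A minimal sketch of the idea above, using Python's built-in `sqlite3` module only as a convenient SQL engine. The helper name `eval_sql` and the sample values are illustrative and not part of the PR. It compares a nullable operand `:a` against a non-null `:b` and prints the plain comparison, the duplicating translation, and the CASE translation side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def eval_sql(expr, a, b):
    """Evaluate a SQL expression with :a and :b bound; returns 0, 1 or None (NULL)."""
    return conn.execute(f"SELECT {expr}", {"a": a, "b": b}).fetchone()[0]

# Plain comparison: NULL = 'Foo' evaluates to NULL, not FALSE.
plain = "(:a = :b)"
# Duplicating translation: the nullable operand :a appears twice.
duplicated = "(:a = :b AND :a IS NOT NULL)"
# CASE translation: :a appears once; a NULL comparison falls through to ELSE.
case_form = "CASE WHEN :a = :b THEN 1 ELSE 0 END"

for a in ("Foo", "Bar", None):
    print(a, eval_sql(plain, a, "Foo"),
          eval_sql(duplicated, a, "Foo"),
          eval_sql(case_form, a, "Foo"))
```

The last two columns agree on every row, which is what allows the nullable expression to be emitted only once.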

@ranma42

ranma42 commented Jul 5, 2024

Contributor Author

This change can already take care of most of the worst offenders found in #34048 🥳

@ranma42

ranma42 commented Jul 5, 2024

Contributor Author

I'll add some tests that check this transformation specifically
EDIT: done 👍

@ranma42 ranma42 force-pushed the avoid-equal-duplication-34165 branch 2 times, most recently from 8a4e1bf to 144b7e0 Compare July 13, 2024 06:55
body = _sqlExpressionFactory.OrElse(
_sqlExpressionFactory.AndAlso(body, _sqlExpressionFactory.AndAlso(leftIsNotNull, rightIsNotNull)),
_sqlExpressionFactory.AndAlso(leftIsNull, rightIsNull));
if (leftNullable && rightNullable

Member

@ranma42 can you please add a comment here explaining the logic, i.e. that duplication is bad except for columns, and that columns may make use of indexes, which arbitrary expressions (usually) won't?

Contributor Author

I added 4c3a542 (#34172) which is aimed at addressing this

"""
SELECT [j].[Id]
FROM [JsonEntitiesBasic] AS [j]
WHERE JSON_VALUE([j].[OwnedCollectionRoot], '$[0].Name') <> N'Foo' OR JSON_VALUE([j].[OwnedCollectionRoot], '$[0].Name') IS NULL

Member

This is a place where I'm a bit hesitant about this change... The SQL Server docs specifically document using an indexed computed column over JSON_VALUE as a way to speed up queries filtering inside a JSON document; unless I'm mistaken, these queries would likely stop using such an index if we switch to the CASE translation (maybe in this specific test it doesn't matter because of the inequality, but you get what I'm saying).

In a perfect world, we'd vary our translation based on knowledge that an indexed computed column exists for this expression, but we're pretty far away from doing that at the moment.

Thoughts?

@ranma42 ranma42 Jul 27, 2024

Contributor Author

Yes, I believe it is very likely that the CASE translation will not take advantage of indexes, but I would expect the same to be true for the original version as well, as it is performing a <> comparison (maybe it would use the index to include all of the NULL values 🤔, but then it would still have to scan all of the non-null values and filter each of them).

For equality in predicates the translation should already be
WHERE JSON_VALUE([j].[OwnedCollectionRoot], '$[0].Name') = N'Foo'
which should effectively take advantage of indexes.

Member

OK.

So I'm trying to understand whether there are cases - and which ones - in which this PR causes a perf regression because the switch to CASE doesn't use an index. If there are such cases (and after all, we do avoid the CASE translation for columns because of this), we should think carefully - I'm not sure whether the optimization to remove double evaluation for some cases outweighs the (potentially severe) regression triggered by not using an index. A conservative approach would wait until we could know more reliably whether an index would be used on an expression (e.g. because we're aware of expression indexes/indexed computed columns).

I know I'm being very cautious here, I'm thinking about the perf regressions brought about by the switch from IN+constants to OPENJSON in 8.0 - that change improved general perf for many queries, but also caused severe regressions for others.

Contributor Author

Yes, there are cases in which the translation could cause a regression; the main one I can think of (which is the one currently avoided by the column handling) is the following (and similar ones):

.Where(e => !e.BoolA != e.NullableBoolB)

This is

SELECT "e"."Id"
FROM "Entities1" AS "e"
WHERE "e"."BoolA" = "e"."NullableBoolB" OR "e"."NullableBoolB" IS NULL

Sqlite (and likely other SQL providers) would take advantage of an index on NullableBoolB (assuming BoolA and NullableBoolB are actually columns from different tables).

When using the CASE, this becomes

SELECT "e"."Id"
FROM "Entities1" AS "e"
WHERE CASE
    WHEN "e"."BoolA" <> "e"."NullableBoolB" THEN 0
    ELSE 1
END

and the index cannot be used anymore.

I pushed ranma42@ecdd12e to show what happens when the CASE transformation is used whenever it is valid.

With #34166 this could possibly affect a few more tests, but if I am not mistaken, this boolean comparison (negated-different-from) is the only case in which a "good" WHERE would regress (at least according to optimization rules similar to those of sqlite).
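
A rough way to observe the difference described above, sketched with Python's built-in `sqlite3` module. The table layout, the index name `IX_B`, and the `plan`/`ids` helpers are made up for illustration, and the non-nullable `BoolA` side is stood in for by the constant `1` so that a single-table query can exercise the index on `NullableBoolB`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Entities1 (Id INTEGER PRIMARY KEY, NullableBoolB INTEGER, Payload TEXT);
    CREATE INDEX IX_B ON Entities1 (NullableBoolB);
    INSERT INTO Entities1 (NullableBoolB, Payload) VALUES
        (1, 'a'), (0, 'b'), (NULL, 'c'), (0, 'd');
""")

# Hypothetical simplification: the constant 1 plays the role of "e"."BoolA".
or_form = ("SELECT Id, Payload FROM Entities1 "
           "WHERE 1 = NullableBoolB OR NullableBoolB IS NULL")
case_form = ("SELECT Id, Payload FROM Entities1 "
             "WHERE CASE WHEN 1 <> NullableBoolB THEN 0 ELSE 1 END")

def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a list of strings."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

def ids(sql):
    """Return the sorted Ids selected by the query."""
    return sorted(row[0] for row in conn.execute(sql))

# Both predicates select the same rows; only the access path may differ.
print(ids(or_form), plan(or_form))
print(ids(case_form), plan(case_form))
```

I would expect the first plan to mention a SEARCH using `IX_B` and the second to be a plain SCAN, but the exact plan text depends on the SQLite version, so the snippet prints the plans instead of hard-coding them.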

@ranma42 ranma42 Jul 27, 2024

Contributor Author

ah, obviously you could also do the same on json values:

.Where(e => !e.MyJsonColumn.BoolA != e.MyOtherJsonColumn.NullableBoolB)

Contributor Author

Maybe instead of checking for a simple column, the right check would be whether the emitted operator is = vs !=? (i.e. whether the WHERE predicate has some chance of being optimized)

@ranma42 ranma42 force-pushed the avoid-equal-duplication-34165 branch 2 times, most recently from 4a9993e to aeca728 Compare July 29, 2024 20:40
@ranma42

ranma42 commented Jul 29, 2024

Contributor Author

I pushed a new version of the branch to solve the merge conflicts.
While I was at it, I also changed the logic that decides when the CASE transformation is applied; it now only activates if it is valid (not on nullable vs nullable) and it causes no de-optimization (i.e. it is only allowed on predicates if the comparison is an inequality).
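
To illustrate why the CASE form is not valid for nullable vs nullable, here is a small sketch using Python's built-in `sqlite3` module (the `eval_sql` helper and sample values are illustrative only). The `generic` string mirrors the shape of the OrElse/AndAlso expansion quoted from the diff earlier in this conversation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def eval_sql(expr, a, b):
    """Evaluate a SQL expression with :a and :b bound; returns 0, 1 or None (NULL)."""
    return conn.execute(f"SELECT {expr}", {"a": a, "b": b}).fetchone()[0]

# Generic null-semantics expansion of a == b when both sides may be NULL.
generic = ("((:a = :b AND :a IS NOT NULL AND :b IS NOT NULL)"
           " OR (:a IS NULL AND :b IS NULL))")
# CASE form: a NULL comparison result is mapped to 0.
case_form = "CASE WHEN :a = :b THEN 1 ELSE 0 END"

for a, b in [(1, 1), (1, 0), (None, 1), (None, None)]:
    print(a, b, eval_sql(generic, a, b), eval_sql(case_form, a, b))
```

The two forms agree whenever at most one operand is NULL, and disagree only on the `(None, None)` row, where the generic expansion treats the operands as equal and the CASE form does not.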

Comment on lines +2847 to +2865
WHEN [c].[Region] = N'ASK' AND [c].[Region] IS NOT NULL THEN CAST(1 AS bit)
WHEN [c].[Region] = N'ASK' THEN CAST(1 AS bit)

Contributor Author

this is a nice side-effect, but we might want to ensure that this kind of optimization happens regardless of this PR (and possibly not only on comparisons 🤔 )
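
A quick check of why the extra `IS NOT NULL` in the `WHEN` condition is redundant, sketched with Python's built-in `sqlite3` module. The `branch` helper and the `ELSE 0` arm are assumptions for the sake of a runnable example, since the quoted diff lines only show the `WHEN` branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def branch(condition, region):
    """Return 1 if the WHEN branch is taken for this region, else 0."""
    sql = f"SELECT CASE WHEN {condition} THEN 1 ELSE 0 END"
    return conn.execute(sql, {"r": region}).fetchone()[0]

# Condition shapes taken from the two quoted WHEN lines, with :r for [c].[Region].
with_null_check = ":r = 'ASK' AND :r IS NOT NULL"
without_null_check = ":r = 'ASK'"

for region in ("ASK", "WA", None):
    print(region, branch(with_null_check, region), branch(without_null_check, region))
```

A `WHEN` branch is only taken when its condition is TRUE, so a NULL comparison result already behaves like FALSE there and the explicit null check adds nothing.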

@ranma42 ranma42 force-pushed the avoid-equal-duplication-34165 branch from aeca728 to 0f67128 Compare December 23, 2024 17:40
@ranma42 ranma42 requested a review from a team as a code owner December 23, 2024 17:40
FROM [Order] AS [o0]
WHERE [o0].[CustomerId] = [o].[CustomerId]) AS [CustomerMinHourlyRate], MIN([o].[HourlyRate]) AS [HourlyRate], COUNT(*) AS [Count]
FROM [Order] AS [o]
WHERE [o].[Number] <> N'A1' OR [o].[Number] IS NULL

Member

Doesn't this regress performance, at least for the case where Number is NULL? Is it worth making an exception for non-complex expressions, and not do the CASE translation?

Member

Right, similar to this conversation above: #34172 (review)

@roji

roji commented Dec 24, 2024

Member

/cc @maumar

@roji

roji commented Dec 28, 2024

Member

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ranma42

ranma42 commented Feb 9, 2025

Contributor Author

rebased to resolve conflicts

@AndriySvyryd

Member

@ranma42 Sorry for the delay. I am taking over for @roji. Please rebase on latest main

@AndriySvyryd AndriySvyryd marked this pull request as draft June 9, 2026 17:00

Copilot AI left a comment


Pull request overview

This PR updates EF Core’s relational null-semantics rewriting for (in)equality comparisons to avoid duplicating nullable operands (especially expensive/complex SQL expressions) by translating certain comparisons into CASE WHEN ... THEN ... ELSE ... END shapes, and updates SQL baselines accordingly across multiple provider functional test suites.

Changes:

  • Adjust SqlNullabilityProcessor.RewriteNullSemantics to prefer a CASE-based rewrite for nullable-vs-non-nullable comparisons in scenarios where the previous rewrite duplicated the nullable operand.
  • Add new relational null-semantics test coverage for simple vs. complex nullable expressions.
  • Update many SQL Server and SQLite functional-test baselines to match the new CASE-based SQL translation.

Reviewed changes

Copilot reviewed 45 out of 45 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/EFCore.Relational/Query/SqlNullabilityProcessor.cs | Implements the CASE-based null-semantics rewrite to reduce duplication in nullable-vs-non-nullable comparisons. |
| test/EFCore.Relational.Specification.Tests/Query/NullSemanticsQueryTestBase.cs | Adds new test scenarios targeting comparisons involving simple/complex nullable expressions. |
| test/EFCore.Sqlite.FunctionalTests/Query/NullSemanticsQuerySqliteTest.cs | Updates SQLite SQL assertions to reflect the new CASE translation for inequality with nullable operands. |
| test/EFCore.Sqlite.FunctionalTests/Query/ComplexTypeQuerySqliteTest.cs | Updates SQLite SQL assertions for complex-type filter predicates using the new CASE translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/TPTRelationshipsQuerySqlServerTest.cs | Updates SQL Server baselines to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/TPCRelationshipsQuerySqlServerTest.cs | Updates SQL Server baselines to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/TemporalOwnedQuerySqlServerTest.cs | Updates temporal owned-query baselines to use CASE for nullable inequality patterns (including join predicates). |
| test/EFCore.SqlServer.FunctionalTests/Query/TemporalComplexNavigationsCollectionsSharedTypeQuerySqlServerTest.cs | Updates SQL Server baselines for temporal complex navigation queries to the new CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/TemporalComplexNavigationsCollectionsQuerySqlServerTest.cs | Updates SQL Server baselines for temporal complex navigation queries to the new CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/OwnedQuerySqlServerTest.cs | Updates SQL Server baselines for owned-query scenarios to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindWhereQuerySqlServerTest.cs | Updates Northwind WHERE predicate baseline(s) to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindStringIncludeQuerySqlServerTest.cs | Updates OUTER APPLY include predicate baselines to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindSplitIncludeQuerySqlServerTest.cs | Updates split-include OUTER APPLY predicate baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindSplitIncludeNoTrackingQuerySqlServerTest.cs | Updates split-include/no-tracking OUTER APPLY predicate baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindSelectQuerySqlServerTest.cs | Updates subquery duplication patterns in baselines to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindNavigationsQuerySqlServerTest.cs | Updates navigation predicate baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindMiscellaneousQuerySqlServerTest.cs | Updates miscellaneous baselines (including DATEPART cases) to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindIncludeQuerySqlServerTest.cs | Updates include OUTER APPLY predicate baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindIncludeNoTrackingQuerySqlServerTest.cs | Updates include/no-tracking OUTER APPLY predicate baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindGroupByQuerySqlServerTest.cs | Updates GROUP BY/HAVING baselines to use CASE for nullable inequality semantics. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindEFPropertyIncludeQuerySqlServerTest.cs | Updates EF.Property include OUTER APPLY predicate baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindAsNoTrackingQuerySqlServerTest.cs | Updates as-no-tracking WHERE baseline(s) to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindAggregateOperatorsQuerySqlServerTest.cs | Updates aggregate operator baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/InheritanceRelationshipsQuerySqlServerTest.cs | Updates inheritance relationship query baselines to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/FunkyDataQuerySqlServerTest.cs | Updates complex boolean comparison baseline(s) to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/FunkyDataQueryAzureSynapseTest.cs | Mirrors FunkyDataQuery updates for Azure Synapse baselines. |
| test/EFCore.SqlServer.FunctionalTests/Query/Ef6GroupBySqlServerTest.cs | Updates EF6 group-by baseline(s) to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/ComplexTypeQuerySqlServerTest.cs | Updates complex-type filter baselines to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/ComplexNavigationsQuerySqlServerTest.cs | Updates complex navigation predicate baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/ComplexNavigationsQuerySqlServer160Test.cs | Updates SQL Server 16.0-specific baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/ComplexNavigationsCollectionsSharedTypeQuerySqlServerTest.cs | Updates shared-type collection navigation baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/ComplexNavigationsCollectionsQuerySqlServerTest.cs | Updates collection navigation baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/AdHocQueryFiltersQuerySqlServerTest.cs | Updates ad-hoc query filter baselines to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/AdHocNavigationsQuerySqlServerTest.cs | Updates ad-hoc navigation query baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/AdHocMiscellaneousQuerySqlServerTest.cs | Updates ad-hoc miscellaneous baselines to the CASE pattern. |

Comment on lines +1615 to +1626
var originallyNotEqual = sqlBinaryExpression.OperatorType == ExpressionType.NotEqual;
var bodyNotEqual = body is SqlBinaryExpression { OperatorType: ExpressionType.NotEqual };

// When both operands are nullable, the CASE transformation is invalid.
// We also use the generic transformation when it simplifies to one of:
// - a == b && (a != null)
// - a == b && (b != null)
// - a == b || (a == null)
// - a == b || (b == null)
// as these expressions can use indexes on a and/or on b.
if (leftNullable && rightNullable || originallyNotEqual == bodyNotEqual)
{
Comment on lines +1634 to +1639
// When only one of the operands is nullable, we avoid duplicating
// complex expressions by performing the following transformation:
// a == b -> CASE WHEN a == b THEN TRUE ELSE FALSE END
body = _sqlExpressionFactory.Case(
[new(body, _sqlExpressionFactory.Constant(true, body.Type, body.TypeMapping))],
_sqlExpressionFactory.Constant(false, body.Type, body.TypeMapping));
ranma42 added 3 commits June 13, 2026 14:10
When comparing a nullable expression to a non-nullable one, a `NULL` result always
represents a difference.

This makes it possible to avoid duplicating the nullable expression by mapping
the `NULL` result to a `FALSE` (when comparing for equality).

Fixes dotnet#34165.
@ranma42 ranma42 force-pushed the avoid-equal-duplication-34165 branch from 7e1da4e to 64e1300 Compare June 13, 2026 19:40
Development

Successfully merging this pull request may close these issues.

Investigate alternative translations for (in)equality comparison

4 participants