Avoid duplicating complex expression in comparisons #34172
This change can already take care of most of the worst offenders found in #34048 🥳

I'll add some tests that check this transformation specifically.
8a4e1bf to 144b7e0
```csharp
body = _sqlExpressionFactory.OrElse(
    _sqlExpressionFactory.AndAlso(body, _sqlExpressionFactory.AndAlso(leftIsNotNull, rightIsNotNull)),
    _sqlExpressionFactory.AndAlso(leftIsNull, rightIsNull));
if (leftNullable && rightNullable
```
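To see why this expansion is needed when both operands are nullable (and why a plain `CASE` mapping would be wrong there), here is a small sketch that models SQL's three-valued logic with C#'s `bool?`, whose lifted `&` and `|` operators follow the same truth tables as SQL `AND`/`OR`. The helper names (`SqlEq`, `GenericRewrite`, `CaseRewrite`) are illustrative only, not EF Core APIs, and the model says nothing about query plans or indexes.

```csharp
using System;

// Truth table for `a == b` with two nullable operands; null stands in for SQL NULL.
int?[] values = { null, 1, 2 };
foreach (var a in values)
{
    foreach (var b in values)
    {
        Console.WriteLine(
            $"a={a?.ToString() ?? "NULL"}, b={b?.ToString() ?? "NULL"}: " +
            $"C# a == b -> {a == b}, generic -> {GenericRewrite(a, b)}, CASE -> {CaseRewrite(a, b)}");
    }
}

// SQL `a = b`: NULL if either side is NULL (C#'s lifted == instead says null == null is true).
static bool? SqlEq(int? a, int? b) =>
    a is null || b is null ? (bool?)null : a.Value == b.Value;

// (a = b AND a IS NOT NULL AND b IS NOT NULL) OR (a IS NULL AND b IS NULL)
// bool? & and | follow SQL's three-valued AND/OR truth tables.
static bool? GenericRewrite(int? a, int? b) =>
    (SqlEq(a, b) & (a is not null) & (b is not null)) | ((a is null) & (b is null));

// CASE WHEN a = b THEN TRUE ELSE FALSE END: wrong here, since NULL = NULL falls into ELSE.
static bool CaseRewrite(int? a, int? b) => SqlEq(a, b) == true;
```

The generic expansion agrees with C# `==` on every row, while the `CASE` shape returns `FALSE` for the `NULL`/`NULL` row, which is why the `CASE` rewrite is restricted to comparisons where only one side is nullable.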
@ranma42 can you please add a comment here explaining the logic, i.e. that duplication is bad except for columns, plus that columns can make use of indexes, which arbitrary expressions (usually) can't?
I added 4c3a542 (#34172), which aims to address this.
```sql
SELECT [j].[Id]
FROM [JsonEntitiesBasic] AS [j]
WHERE JSON_VALUE([j].[OwnedCollectionRoot], '$[0].Name') <> N'Foo' OR JSON_VALUE([j].[OwnedCollectionRoot], '$[0].Name') IS NULL
```
This is a place where I'm a bit hesitant about this change... The SQL Server docs specifically document using an indexed computed column over JSON_VALUE as a way to speed up queries filtering inside a JSON document; unless I'm mistaken, these queries would likely stop using such an index if we switch to the CASE translation (maybe in this specific test it doesn't matter because of the inequality, but you get what I'm saying).
In a perfect world, we'd vary our translation based on knowledge that an indexed computed column exists for this expression, but we're pretty far away from doing that at the moment.
Thoughts?
Yes, I believe it is very likely that the CASE translation will not take advantage of indexes, but I would expect the same to be true for the original version as well, as it is performing a <> comparison (maybe it would use the index to include all of the NULL values 🤔, but then it would still have to scan all of the non-null values and filter each of them).
For equality in predicates, the translation should already be

```sql
WHERE JSON_VALUE([j].[OwnedCollectionRoot], '$[0].Name') = N'Foo'
```

which should be able to take advantage of indexes.
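The difference between equality and inequality comes from how `WHERE` treats `NULL`: a row is kept only when the predicate is `TRUE`. This minimal C# sketch assumes `null` stands in for SQL `NULL` and uses hypothetical helpers (`SqlEq`, `SqlNotEq`, `KeptByWhere`). It shows that a bare `=` already matches C# `==` for the row where `JSON_VALUE` returns `NULL`, while `<>` needs the extra `IS NULL` check (or an equivalent `CASE`) to match C# `!=`.

```csharp
using System;

// Possible JSON_VALUE(...) results: a match, a non-match, and a NULL.
string?[] names = { "Foo", "Bar", null };
foreach (var name in names)
{
    Console.WriteLine(
        $"{name ?? "NULL"}: " +
        $"'= Foo' kept {KeptByWhere(SqlEq(name, "Foo"))} (C# == {name == "Foo"}), " +
        $"'<> Foo' kept {KeptByWhere(SqlNotEq(name, "Foo"))}, " +
        $"'<> Foo OR IS NULL' kept {KeptByWhere(SqlNotEq(name, "Foo") | (name is null))} (C# != {name != "Foo"})");
}

// SQL comparisons yield NULL when an operand is NULL.
static bool? SqlEq(string? a, string b) => a is null ? (bool?)null : a == b;
static bool? SqlNotEq(string? a, string b) => a is null ? (bool?)null : a != b;

// WHERE keeps a row only when the predicate is TRUE; FALSE and NULL both drop it.
static bool KeptByWhere(bool? predicate) => predicate == true;
```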
OK.
So I'm trying to understand whether there are cases - and which ones - in which this PR causes a perf regression because the switch to CASE doesn't use an index. If there are such cases (and after all, we do avoid the CASE translation for columns because of this), we should think carefully - I'm not sure whether the optimization to remove double evaluation for some cases outweighs the (potentially severe) regression triggered by not using an index. A conservative approach would wait until we could know more reliably whether an index would be used on an expression (e.g. because we're aware of expression indexes/indexed computed columns).
I know I'm being very cautious here, I'm thinking about the perf regressions brought about by the switch from IN+constants to OPENJSON in 8.0 - that change improved general perf for many queries, but also caused severe regressions for others.
Yes, there are cases in which the translation could cause a regression; the main one I can think of (which is the one currently avoided by the column handling) is the following (and similar ones):
```csharp
.Where(e => !e.BoolA != e.NullableBoolB)
```

This is translated to

```sql
SELECT "e"."Id"
FROM "Entities1" AS "e"
WHERE "e"."BoolA" = "e"."NullableBoolB" OR "e"."NullableBoolB" IS NULL
```

SQLite (and likely other SQL providers) would take advantage of an index on NullableBoolB (assuming BoolA and NullableBoolB are actually columns from different tables).
When using the CASE, this becomes

```sql
SELECT "e"."Id"
FROM "Entities1" AS "e"
WHERE CASE
    WHEN "e"."BoolA" <> "e"."NullableBoolB" THEN 0
    ELSE 1
END
```

and the index cannot be used anymore.
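Both predicates return the same rows; the regression is purely about index usage. The following sketch checks the semantic equivalence against C# `!BoolA != NullableBoolB`, using `bool?` as a stand-in for SQL's three-valued logic (the `CASE` result `0`/`1` is modeled as `false`/`true`). The helper names are illustrative, and the sketch cannot show sargability, which depends on the database's optimizer.

```csharp
using System;

// C# `!e.BoolA != e.NullableBoolB`, with BoolA non-nullable and NullableBoolB nullable.
foreach (var a in new[] { false, true })
{
    foreach (var b in new bool?[] { null, false, true })
    {
        Console.WriteLine(
            $"BoolA={a}, NullableBoolB={b?.ToString() ?? "NULL"}: " +
            $"C# {!a != b}, OR-IS-NULL form {GenericForm(a, b)}, CASE form {CaseForm(a, b)}");
    }
}

// SQL comparisons: NULL when the nullable operand is NULL.
static bool? SqlEq(bool a, bool? b) => b is null ? (bool?)null : a == b.Value;
static bool? SqlNotEq(bool a, bool? b) => b is null ? (bool?)null : a != b.Value;

// "BoolA" = "NullableBoolB" OR "NullableBoolB" IS NULL
static bool? GenericForm(bool a, bool? b) => SqlEq(a, b) | (b is null);

// CASE WHEN "BoolA" <> "NullableBoolB" THEN 0 ELSE 1 END, with 0/1 modeled as false/true.
static bool CaseForm(bool a, bool? b) => SqlNotEq(a, b) == true ? false : true;
```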
I pushed ranma42@ecdd12e to show what happens when the CASE transformation is used whenever it is valid.
With #34166 this could possibly affect a few more tests, but if I am not mistaken, this boolean comparison (negated-different-from) is the only case in which a "good" WHERE would regress (at least according to optimization rules similar to SQLite's).
Ah, obviously you could also do the same on JSON values:

```csharp
.Where(e => !e.MyJsonColumn.BoolA != e.MyOtherJsonColumn.NullableBoolB)
```

Maybe instead of checking for a simple column, the right check would be whether the emitted operator is `=` vs `!=`? (i.e. whether the WHERE predicate has some chance of being optimized)
4a9993e to aeca728
I pushed a new version of the branch to resolve the merge conflicts.
```diff
- WHEN [c].[Region] = N'ASK' AND [c].[Region] IS NOT NULL THEN CAST(1 AS bit)
+ WHEN [c].[Region] = N'ASK' THEN CAST(1 AS bit)
```
This is a nice side-effect, but we might want to ensure that this kind of optimization happens regardless of this PR (and possibly not only in comparisons 🤔).
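The dropped `IS NOT NULL` is redundant because a `CASE WHEN` branch is taken only when its condition is `TRUE`; a `NULL` condition falls through exactly like `FALSE`. Here is a minimal sketch of that argument, assuming an `ELSE 0` branch (the snippet does not show the `ELSE`) and using hypothetical helpers (`SqlEq`, `CaseWhen`):

```csharp
using System;

foreach (var region in new string?[] { "ASK", "WA", null })
{
    int withCheck = CaseWhen(SqlEq(region, "ASK") & (region is not null));
    int withoutCheck = CaseWhen(SqlEq(region, "ASK"));
    Console.WriteLine($"{region ?? "NULL"}: with IS NOT NULL -> {withCheck}, without -> {withoutCheck}");
}

// [c].[Region] = N'ASK': NULL when Region is NULL.
static bool? SqlEq(string? a, string b) => a is null ? (bool?)null : a == b;

// CASE WHEN condition THEN 1 ELSE 0 END: the THEN branch needs the condition to be TRUE,
// so a NULL condition behaves exactly like FALSE. (ELSE 0 is an assumption.)
static int CaseWhen(bool? condition) => condition == true ? 1 : 0;
```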
aeca728 to 0f67128
```sql
    FROM [Order] AS [o0]
    WHERE [o0].[CustomerId] = [o].[CustomerId]) AS [CustomerMinHourlyRate], MIN([o].[HourlyRate]) AS [HourlyRate], COUNT(*) AS [Count]
FROM [Order] AS [o]
WHERE [o].[Number] <> N'A1' OR [o].[Number] IS NULL
```
Doesn't this regress performance, at least for the case where Number is NULL? Is it worth making an exception for non-complex expressions and not doing the CASE translation for them?
Right, similar to this conversation above: #34172 (review)
/cc @maumar

/azp run

Azure Pipelines successfully started running 1 pipeline(s).
0f67128 to 7e1da4e
Rebased to resolve conflicts.
249ae47 to 6b86657
Pull request overview
This PR updates EF Core’s relational null-semantics rewriting for (in)equality comparisons to avoid duplicating nullable operands (especially expensive/complex SQL expressions) by translating certain comparisons into CASE WHEN ... THEN ... ELSE ... END shapes, and updates SQL baselines accordingly across multiple provider functional test suites.
Changes:

- Adjust `SqlNullabilityProcessor.RewriteNullSemantics` to prefer a `CASE`-based rewrite for nullable-vs-non-nullable comparisons in scenarios where the previous rewrite duplicated the nullable operand.
- Add new relational null-semantics test coverage for simple vs. complex nullable expressions.
- Update many SQL Server and SQLite functional-test baselines to match the new `CASE`-based SQL translation.
Reviewed changes
Copilot reviewed 45 out of 45 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/EFCore.Relational/Query/SqlNullabilityProcessor.cs | Implements the CASE-based null-semantics rewrite to reduce duplication in nullable-vs-non-nullable comparisons. |
| test/EFCore.Relational.Specification.Tests/Query/NullSemanticsQueryTestBase.cs | Adds new test scenarios targeting comparisons involving simple/complex nullable expressions. |
| test/EFCore.Sqlite.FunctionalTests/Query/NullSemanticsQuerySqliteTest.cs | Updates SQLite SQL assertions to reflect the new CASE translation for inequality with nullable operands. |
| test/EFCore.Sqlite.FunctionalTests/Query/ComplexTypeQuerySqliteTest.cs | Updates SQLite SQL assertions for complex-type filter predicates using the new CASE translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/TPTRelationshipsQuerySqlServerTest.cs | Updates SQL Server baselines to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/TPCRelationshipsQuerySqlServerTest.cs | Updates SQL Server baselines to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/TemporalOwnedQuerySqlServerTest.cs | Updates temporal owned-query baselines to use CASE for nullable inequality patterns (including join predicates). |
| test/EFCore.SqlServer.FunctionalTests/Query/TemporalComplexNavigationsCollectionsSharedTypeQuerySqlServerTest.cs | Updates SQL Server baselines for temporal complex navigation queries to the new CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/TemporalComplexNavigationsCollectionsQuerySqlServerTest.cs | Updates SQL Server baselines for temporal complex navigation queries to the new CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/OwnedQuerySqlServerTest.cs | Updates SQL Server baselines for owned-query scenarios to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindWhereQuerySqlServerTest.cs | Updates Northwind WHERE predicate baseline(s) to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindStringIncludeQuerySqlServerTest.cs | Updates OUTER APPLY include predicate baselines to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindSplitIncludeQuerySqlServerTest.cs | Updates split-include OUTER APPLY predicate baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindSplitIncludeNoTrackingQuerySqlServerTest.cs | Updates split-include/no-tracking OUTER APPLY predicate baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindSelectQuerySqlServerTest.cs | Updates subquery duplication patterns in baselines to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindNavigationsQuerySqlServerTest.cs | Updates navigation predicate baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindMiscellaneousQuerySqlServerTest.cs | Updates miscellaneous baselines (including DATEPART cases) to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindIncludeQuerySqlServerTest.cs | Updates include OUTER APPLY predicate baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindIncludeNoTrackingQuerySqlServerTest.cs | Updates include/no-tracking OUTER APPLY predicate baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindGroupByQuerySqlServerTest.cs | Updates GROUP BY/HAVING baselines to use CASE for nullable inequality semantics. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindEFPropertyIncludeQuerySqlServerTest.cs | Updates EF.Property include OUTER APPLY predicate baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindAsNoTrackingQuerySqlServerTest.cs | Updates as-no-tracking WHERE baseline(s) to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/NorthwindAggregateOperatorsQuerySqlServerTest.cs | Updates aggregate operator baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/InheritanceRelationshipsQuerySqlServerTest.cs | Updates inheritance relationship query baselines to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/FunkyDataQuerySqlServerTest.cs | Updates complex boolean comparison baseline(s) to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/FunkyDataQueryAzureSynapseTest.cs | Mirrors FunkyDataQuery updates for Azure Synapse baselines. |
| test/EFCore.SqlServer.FunctionalTests/Query/Ef6GroupBySqlServerTest.cs | Updates EF6 group-by baseline(s) to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/ComplexTypeQuerySqlServerTest.cs | Updates complex-type filter baselines to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/ComplexNavigationsQuerySqlServerTest.cs | Updates complex navigation predicate baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/ComplexNavigationsQuerySqlServer160Test.cs | Updates SQL Server 16.0-specific baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/ComplexNavigationsCollectionsSharedTypeQuerySqlServerTest.cs | Updates shared-type collection navigation baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/ComplexNavigationsCollectionsQuerySqlServerTest.cs | Updates collection navigation baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/AdHocQueryFiltersQuerySqlServerTest.cs | Updates ad-hoc query filter baselines to CASE-based inequality translation. |
| test/EFCore.SqlServer.FunctionalTests/Query/AdHocNavigationsQuerySqlServerTest.cs | Updates ad-hoc navigation query baselines to the CASE pattern. |
| test/EFCore.SqlServer.FunctionalTests/Query/AdHocMiscellaneousQuerySqlServerTest.cs | Updates ad-hoc miscellaneous baselines to the CASE pattern. |
```csharp
var originallyNotEqual = sqlBinaryExpression.OperatorType == ExpressionType.NotEqual;
var bodyNotEqual = body is SqlBinaryExpression { OperatorType: ExpressionType.NotEqual };

// When both operands are nullable, the CASE transformation is invalid.
// We also use the generic transformation when it simplifies to one of:
// - a == b && (a != null)
// - a == b && (b != null)
// - a == b || (a == null)
// - a == b || (b == null)
// as these expressions can use indexes on a and/or on b.
if (leftNullable && rightNullable || originallyNotEqual == bodyNotEqual)
{
```

```csharp
// When only one of the operands is nullable, we avoid duplicating
// complex expressions by performing the following transformation:
// a == b -> CASE WHEN a == b THEN TRUE ELSE FALSE END
body = _sqlExpressionFactory.Case(
    [new(body, _sqlExpressionFactory.Constant(true, body.Type, body.TypeMapping))],
    _sqlExpressionFactory.Constant(false, body.Type, body.TypeMapping));
```
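The condition in the first excerpt can be read as a small decision table. The sketch below is a hypothetical stand-alone copy of it (`UsesGenericRewrite` is not an EF Core API), applied to the cases discussed in this thread. It assumes that `body` is the comparison left after negated operands are unwrapped, so `!a != b` yields a `body` of `a != b`, while a plain `a != b` yields `a == b` with the outer `NOT` applied afterwards. That reading matches the SQL baselines quoted above, but it is an assumption rather than something the excerpt shows.

```csharp
using System;

// a != b, e.g. JSON_VALUE(...) <> N'Foo': under the assumption, body is a == b -> CASE.
Console.WriteLine($"a != b (one side nullable):  generic? {UsesGenericRewrite(true, false, originallyNotEqual: true, bodyNotEqual: false)}");
// !a != b, the BoolA/NullableBoolB example: body is a != b -> generic (a == b || b == null).
Console.WriteLine($"!a != b (one side nullable): generic? {UsesGenericRewrite(false, true, originallyNotEqual: true, bodyNotEqual: true)}");
// a == b (one side nullable): body is a == b -> generic (a == b && a != null).
Console.WriteLine($"a == b (one side nullable):  generic? {UsesGenericRewrite(true, false, originallyNotEqual: false, bodyNotEqual: false)}");
// Both operands nullable: the CASE shape is invalid, so always generic.
Console.WriteLine($"a != b (both nullable):      generic? {UsesGenericRewrite(true, true, originallyNotEqual: true, bodyNotEqual: false)}");

// Hypothetical stand-alone copy of the condition in the excerpt; not an EF Core API.
static bool UsesGenericRewrite(bool leftNullable, bool rightNullable, bool originallyNotEqual, bool bodyNotEqual) =>
    (leftNullable && rightNullable) || originallyNotEqual == bodyNotEqual;
```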
When comparing a nullable expression to a non-nullable one, a `NULL` result always represents a difference. This makes it possible to avoid duplicating the nullable expression by mapping the `NULL` result to `FALSE` (when comparing for equality). Fixes dotnet#34165.
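The claim in the commit message can be checked exhaustively on a small domain. This sketch uses illustrative helper names, `null` to model SQL `NULL`, `a` as the nullable operand and `b` as the non-nullable one. It compares the previous duplicating shape `a = b AND a IS NOT NULL` and the new `CASE WHEN a = b THEN TRUE ELSE FALSE END` against C# `==`. Because the `CASE` result is never `NULL`, negating it also matches C# `!=`.

```csharp
using System;

foreach (var a in new int?[] { null, 1, 2 })
{
    int b = 1;
    Console.WriteLine(
        $"a={a?.ToString() ?? "NULL"}, b={b}: C# == {a == b}, duplicating {DuplicatingEq(a, b)}, " +
        $"CASE {CaseEq(a, b)}; NOT CASE {!CaseEq(a, b)} vs C# != {a != b}");
}

// SQL `a = b` with `a` nullable and `b` non-nullable: NULL only when `a` is NULL.
static bool? SqlEq(int? a, int b) => a is null ? (bool?)null : a.Value == b;

// Previous shape: a = b AND a IS NOT NULL (the nullable operand `a` appears twice).
static bool? DuplicatingEq(int? a, int b) => SqlEq(a, b) & (a is not null);

// New shape: CASE WHEN a = b THEN TRUE ELSE FALSE END (`a` appears once; NULL maps to FALSE).
static bool CaseEq(int? a, int b) => SqlEq(a, b) == true;
```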
7e1da4e to 64e1300