Fix to enable index scan with LIKE operator for non-const nodes#4589
RuchaSK1 wants to merge 27 commits into babelfish-for-postgresql:BABEL_6_X_DEV from
Conversation
Signed-off-by: Rucha Kulkarni <[email protected]>
```c
 */
check_node = rightop;
while (IsA(check_node, RelabelType))
```
Why do we need this? Is this needed for AI collations?
Yes, needed for AI collations. When rightop is RelabelType(remove_accents_internal(...)), without the peek we can't detect remove_accents_internal inside. evaluate_expr would reduce it to a plain Const, losing AI semantics.
```c
{
    rightop = eval_const_expressions(NULL, rightop);

    if (!IsA(rightop, Const) && !IsA(rightop, Param) &&
```
Why do we need these conditions?
evaluate_expr actually executes the expression via PG executor. Without these checks, it crashes or produces wrong results for expressions that can't be evaluated at plan time.
How do we make sure we are not leaving any other condition out? I mean handling this way can lead to cases which we are not handling explicitly. What happens then?
```c
RelabelType *relabel = (RelabelType *) leftop;
leftop = copyObject((Node*) relabel->arg);
prefix->consttype = rtypeId = ltypeId = exprType(leftop);
ltypeId = exprType(leftop);
```
What is the necessity to make this change? Is there any test case which fails?
Yes. Without this, BABEL_SCHEMATA-vu-verify crashes:

```sql
use db_schemata_10;
SELECT * FROM information_schema.schemata ORDER BY SCHEMA_NAME;
```

```
TRAP: failed Assert("ltypeId == rtypeId")
```
rtypeId is set once at the top from original rightop. After evaluate_expr changes rightop to a Const, rtypeId becomes stale. Previously this code was never reached because rightop was never a Const for such queries. Now it is, and types mismatch.
```c
if (OidIsValid(like_entry.like_oid) && list_length(op->args) >= 2)
{
    Node *left_arg = linitial(op->args);
    Node *right_arg = lsecond(op->args);
    if (!IsA(left_arg, CollateExpr) && !IsA(right_arg, CollateExpr) && !contain_var_clause(right_arg))
```
Please explain this if condition
These conditions check whether it is safe to consider the column's collation for the optimization:

- `!IsA(left_arg, CollateExpr)`: if the left side carries an explicit COLLATE (e.g. `Col COLLATE Latin1_General_CI_AI LIKE 'abc'`), `op->inputcollid` already reflects it. Without this check we would overwrite it with the column's default collation, ignoring the explicit COLLATE.
- `!IsA(right_arg, CollateExpr)`: if the right side carries an explicit COLLATE, `evaluate_expr` crashes on the CollateExpr node.
- `!contain_var_clause(right_arg)`: if the pattern references columns (e.g. `Col LIKE OtherCol`), it cannot be reduced to a Const, so the optimization would not apply.
It seems like we are putting some hack in our code; maybe in some cases that is unavoidable, but for most of these conditions I see that if they are not done, this crashes. I am not comfortable with such an explanation. Is there any way to have a generic solution?
Yes, agreed. I've implemented a better approach now which removes the need for all the manual guards that arose solely from the use of evaluate_expr. Replaced it with PG's built-in eval_const_expressions + estimate_expression_value; these handle safety checks internally, and any unhandled expression simply skips the optimization with correct results.

The conditions in this block are unrelated to index optimization. They are for collation recovery, needed because views lose column collation during storage due to ::text casts. They ensure we only recover when the collation wasn't explicitly set and don't unnecessarily override it. I think test coverage for views with LIKE patterns was limited earlier, which is why we didn't catch this issue.
```c
Pattern_Prefix_Status pstatus;
int collidx_of_cs_as;
CollateExpr *prefix_collate;
Node *check_node;
```
On a very high level, I would prefer to call simplify_function directly, because the current implementation relies heavily on handling everything on our end. We can create a wrapper over simplify_function (as it is static) and call that directly. Check: pattern_fixed_prefix_wrapper
Tested the simplify_function approach. It only handles FuncExpr nodes, while rightop for basic queries like `LIKE N'exact match'` is `RelabelType(CoerceToDomain(FuncExpr(...)))`. simplify_function cannot process RelabelType or CoerceToDomain, so all test cases fall back to a Seq Scan.

We use eval_const_expressions + evaluate_expr because:
- eval_const_expressions handles immutable functions, OpExpr (e.g., string concatenation `N'exact' + N' match'`), and nested type casts
- evaluate_expr handles stable functions (e.g., CONVERT, TRY_CONVERT) that eval_const_expressions leaves unreduced
```
@@ -0,0 +1,541 @@
CREATE TABLE BABEL_5966_TestLikeIndex (
```
This will pick the db-level collation. For AI collations to pick the index, we need to create the index from the PG endpoint using the following:

```sql
create index <fully-qualified-table-name> (sys.remove_accents_internal(col_name));
```

Check: test_like_index-vu-prepare

I would suggest not having this table, as in GitHub we have two workflows: one for the server default collation (CI_AS) and one for the db-level collation (CI_AI). If you create the index from the TSQL endpoint only, CI_AS will pick the index and CI_AI will NOT, as evident from the expected output.
Understood. Will remove BABEL_5966_TestLikeIndex and use only the explicit collation tables (CI_AS, CS_AS, CI_AI, CS_AI) which behave consistently across both workflows.
```sql
);
GO

CREATE INDEX BABEL_5966_idx_ci_as ON BABEL_5966_TestLikeCollation_CI_AS(Col);
```
Look at my comment on index creation for AI collations, these will never pick index scan
Agreed. Will remove AI indexes (idx_ci_ai, idx_cs_ai). Plain TSQL indexes can't match remove_accents_internal(col) used in AI queries.
```c
 * col LIKE PATTERN -> col = PATTERN
 * Case 1: if the pattern is a constant string
 * col LIKE PATTERN -> col = PATTERN AND col LIKE PATTERN
 * col NOT LIKE PATTERN -> col <> PATTERN OR col NOT LIKE PATTERN
```
With this OR clause, the planner may lose the ability to choose an index scan. Hence most of the cases where we had a correctness issue will be fixed by this, but perf might get degraded. But I think that should be okay.
```c
root.type = T_PlannerInfo;
root.glob = &glob;

return estimate_expression_value(&root, expr);
```
The comment over fold_like_pattern_to_const clearly says the following:

```c
 * This function attempts to estimate the value of an expression for
 * planning purposes. It is in essence a more aggressive version of
 * eval_const_expressions(): we will perform constant reductions that are
 * not necessarily 100% safe, but are reasonable for estimation purposes.
```

In essence, the runtime value might differ from the planned value, so I think we should avoid using it blindly. Instead we should only try to const-fold CONVERT/TRY_CONVERT selectively.

By the way, for what reason are the CONVERT/TRY_CONVERT functions marked STABLE in Babelfish?
Description
Previously, the LIKE operator index scan optimization only worked when the pattern was already a Const node. Expressions that could be simplified to a constant, such as stable functions or type casts, were left unreduced, forcing a sequential scan.
Additionally, view definitions after dump/restore lost the original column collation due to ::text casts, causing incorrect LIKE/ILIKE operator selection and skipping the optimization entirely.
Changes
Constant folding of RHS expressions: apply eval_const_expressions() and evaluate_expr() to fold reducible pattern expressions down to a Const before prefix extraction. like_escape and remove_accents_internal are excluded to preserve ESCAPE clause and AI collation correctness.
Collation recovery after dump/restore: unwrap RelabelType coercions on the left-hand side to recover the original column collation when ::text casts have stripped it.
NOT LIKE fix: changed the exact-match NOT LIKE optimization from incorrect AND to correct OR semantics: col <> pattern OR col NOT LIKE pattern.
Type reconciliation: ensure the generated equality/range expressions always use the column's actual type rather than the LIKE-coerced TEXT type, so the planner can match the correct index operator class.
Issues Resolved
Task: BABEL-5966
Authored-by: Rucha Kulkarni [email protected]
Signed-off-by: Rucha Kulkarni [email protected]
Test Scenarios Covered
Use case based - Yes
Boundary conditions - Yes
Arbitrary inputs -
Negative test cases -
Minor version upgrade tests -
Major version upgrade tests -
Performance tests -
Tooling impact -
Client tests -
Check List
By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.