Skip to content

Added ANTLR parse tree persistent caching for procedures#4547

Draft
manisha-deshpande wants to merge 10 commits intobabelfish-for-postgresql:BABEL_5_X_DEVfrom
amazon-aurora:jira-babel-6037
Draft

Added ANTLR parse tree persistent caching for procedures#4547
manisha-deshpande wants to merge 10 commits intobabelfish-for-postgresql:BABEL_5_X_DEVfrom
amazon-aurora:jira-babel-6037

Conversation

@manisha-deshpande
Copy link
Contributor

Description

[TBD]

Issues Resolved

BABEL-6037

Test Scenarios Covered

[TBD]

  • Use case based -

  • Boundary conditions -

  • Arbitrary inputs -

  • Negative test cases -

  • Minor version upgrade tests -

  • Major version upgrade tests -

  • Performance tests -

  • Tooling impact -

  • Client tests -

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

create_date SYS.DATETIME NOT NULL,
modify_date SYS.DATETIME NOT NULL,
definition sys.NTEXT DEFAULT NULL,
antlr_parse_tree JSONB DEFAULT NULL, -- JSONB serialized ANTLR parse tree for caching
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

babelfish_function_ext regression test fails.
Column can be queried from postgres side, not babelfish side due to unsupported datatype.
Should the column type be TEXT instead? Would that affect storage space?

--- /home/runner/work/babelfish_extensions/babelfish_extensions/test/JDBC/./expected/babelfish_function_ext-vu-cleanup.out	2026-02-06 18:16:42.761144533 +0000
+++ /home/runner/work/babelfish_extensions/babelfish_extensions/test/JDBC/./output/babelfish_function_ext-vu-cleanup.out	2026-02-06 18:42:00.808212640 +0000
@@ -52,7 +52,7 @@
 -- babelfish_function_ext entry should have been removed after dropping all these functions/procedure
 SELECT * FROM sys.babelfish_function_ext WHERE funcname LIKE 'babel_2877_vu_prepare%';
 GO
-~~START~~
-varchar#!#varchar#!#nvarchar#!#text#!#text#!#bigint#!#bigint#!#datetime#!#datetime#!#ntext
-~~END~~
+~~ERROR (Code: 33557097)~~
+
+~~ERROR (Message: data type jsonb is not supported yet)~~

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can ignore this error for now. We can later add a TDS sender function for JSONB (which just sends it as JSON)

Should the column type be TEXT instead? Would that affect storage space?

Yes, JSONB will allow fast lookups compared to JSON/TEXT which will required deserialization of its own. (Which will become a problem for bigger procedures).

*
* This header provides the interface for serializing and deserializing
* ANTLR PLtsql parse trees to/from JSONB format. The serialized data is
* stored in the cross-session cache (babelfish_func_ext) to enable faster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a catalog rather than a cache - even though the catalog will be cached


/* Read the value for this key */
tok = JsonbIteratorNext(&ctx.it, &v, false);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why overwrite tok ?

Copy link
Contributor

@robverschoor robverschoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addded some comments

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <[email protected]>
…ble and type

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <[email protected]>
…E, TRY Statements

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <[email protected]>
Task: BABEL-6037
Signed-off-by: Manisha Deshpande <[email protected]>
…d retrieval

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <[email protected]>
Add cross-session ANTLR parse tree caching for T-SQL stored procedures.
Serialized parse trees and datums are stored in babelfish_function_ext
using nodeToString/stringToNode. On procedure execution, cached results
are restored to skip ANTLR re-parsing. Cache reads validate the stored
bbf_version and modify_date before deserializing, skipping stale entries
from different Babelfish versions or procedures modified with the GUC
disabled.

Changes:
- Add antlr_parse_tree_text, antlr_parse_tree_datums,
  antlr_parse_tree_modify_date, and antlr_parse_tree_bbf_version columns
to sys.babelfish_function_ext
- Store serialized parse tree and version in
  pltsql_store_func_default_positions
- Restore and validate cached parse tree in new function
  pltsql_restore_func_parse_result invoked prior to ANTLR parse

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <[email protected]>
Comment on lines +235 to +237
DROP TABLE IF EXISTS sys.babelfish_function_ext;

CREATE TABLE sys.babelfish_function_ext (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be ALTER TABLE ADD COLUMN

Comment on lines +248 to +251
antlr_parse_tree_text TEXT DEFAULT NULL, -- Native PG nodeToString() serialized parse tree
antlr_parse_tree_datums TEXT DEFAULT NULL, -- Native PG nodeToString() serialized datums array
antlr_parse_tree_modify_date SYS.DATETIME DEFAULT NULL,
antlr_parse_tree_bbf_version TEXT DEFAULT NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two colums should be sufficent, no ?

  1. pg_node_tree - the tree
  2. bbf_version_for_tree

Also why will storing it in binary have performnace improvments ? Wouldn't it have the reverse effect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've let it remain as two separate columns for now as the split by delimiter function call on large concatenated strings (for large procedures) might be expensive (unnecessary overhead)


bool pltsql_enable_create_alter_view_from_pg = false;
bool pltsql_enable_alter_owner_from_pg = false;
bool pltsql_enable_procedure_parse_cache = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets use the general term routine applies to both procedures and functions.

Add bbf_version validation, exec-time cache repopulation, and
rename/alter/dependency invalidation logic and tests

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <[email protected]>
Task: BABEL-6037
Signed-off-by: Manisha Deshpande <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants