
Added ANTLR parse tree persistent caching for procedures#4547

Open
manisha-deshpande wants to merge 41 commits into babelfish-for-postgresql:BABEL_5_X_DEV from amazon-aurora:jira-babel-6037

@manisha-deshpande manisha-deshpande commented Feb 6, 2026

Description

BABEL-6037: Cross-session ANTLR parse tree caching for T-SQL routines and triggers.

Routines (stored procedures, functions, triggers) with thousands of lines take excessive time on first execution in each new session due to redundant ANTLR parsing. The PLtsql function hash table that speeds up subsequent executions is session-scoped, so every new session re-parses from scratch.

This PR introduces persistent ANTLR parse tree caching. Serialized parse trees are stored in new columns in existing catalog table sys.babelfish_function_ext replicating PostgreSQL's nodeToString()/stringToNode() framework (refer: postgresql_modified_for_babelfish/src/backend/nodes/\* and babelfish_extensions/contrib/babelfishpg_tsql/src/pltsql_serialize/\*). On first execution in a new session, cached results are deserialized to skip ANTLR re-parsing, with babelfish version validation to prevent serving stale data.

Issues Resolved

BABEL-6037

Changes

Serialization Infrastructure (contrib/babelfishpg_tsql/src/pltsql_serialize/)

  • gen_pltsql_node_support.pl: Code generator (modeled after PG's gen_node_support.pl) produces pltsql_nodetags.h, pltsql_outfuncs_gen.c, pltsql_readfuncs_gen.c, pltsql_equalfuncs_gen.c and corresponding switch files from annotated PLtsql node definition headers (pltsql.h, pltsql-2.h) with pg_node_attr() annotations for serialization/deserialization control. Makefile invokes the generator with src/pltsql.h src/pltsql-2.h as input files.
  • Extension-owned T_PLtsql_* NodeTag values offset from PLTSQL_NODETAG_START (10000) to avoid collision with PG engine NodeTag enum, with ABI stability check (80 serializable node types + 2 nodetag-only types = 82 total, $last_nodetag_no = 10081)
  • pltsql_nodeio.c: Extension-side serialization/deserialization dispatch — pltsql_nodeToString()/pltsql_stringToNode() (duplicated from PG nodeToString/StringToNode functions) handle PLtsql nodes via generated switch files, and delegate base/PG types to PG's public outNode()/nodeRead()/parseNodeString() APIs. Uses pg_strtok_init() (engine-side addition) for tokenizer initialization
  • pltsql_serialize_macros.h: Replicates engine-internal WRITE_*/READ_*/COMPARE_* macros from outfuncs.c, readfuncs.c, equalfuncs.c. Redirects WRITE_NODE_FIELD → pltsql_outNode() and READ_NODE_FIELD → pltsql_nodeRead() so generated code routes through extension dispatch
  • pltsql_outfuncs_stubs.c / pltsql_readfuncs_stubs.c: Custom read/write handlers for nodes requiring special serialization logic not handled by auto-generated switch files (flexible array members, runtime-only fields, string/int arrays, unions):
    • PLtsql_expr — skips runtime-only fields (plan, func, expr_simple_*)
    • PLtsql_nsitem — handles FLEXIBLE_ARRAY_MEMBER for name[]
    • PLtsql_row — handles string/int arrays (fieldnames, varnos)
    • PLtsql_recfield — skips runtime cache fields (rectupledescid, finfo)
    • PLtsql_stmt_dbcc — handles union (PLtsql_dbcc_stmt_data) keyed by dbcc_stmt_type
  • pltsql_equalfuncs.c: Parse tree comparison infrastructure for the validate_antlr_parse_cache debug GUC, with custom equality stubs for PLtsql_row, PLtsql_recfield, PLtsql_nsitem, PLtsql_stmt_dbcc
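
The tag-range dispatch described in this list can be pictured with a small standalone sketch. It is not the PR's code: the `Demo*` names, the two sample tags, and the handler enum are hypothetical, and only the start offset (10000) comes from the description above. The idea shown is that a node whose tag is at or above the extension's start offset is routed to the extension's generated serializer, while everything else is delegated to the engine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: mirrors the PLTSQL_NODETAG_START value quoted in the description. */
#define DEMO_PLTSQL_NODETAG_START 10000

/* Hypothetical tags for illustration; the real enum is generated by the Perl script. */
typedef enum DemoNodeTag
{
    T_DemoEngineList = 1,                         /* stands in for an engine-owned tag */
    T_DemoPLtsqlExpr = DEMO_PLTSQL_NODETAG_START, /* first extension-owned tag */
    T_DemoPLtsqlRow                               /* next extension-owned tag */
} DemoNodeTag;

/* Every node begins with its tag, so the tag can be read through any node pointer. */
typedef struct DemoNode
{
    DemoNodeTag type;
} DemoNode;

/* Which serializer would handle a node. */
typedef enum DemoHandler
{
    DEMO_HANDLER_ENGINE,
    DEMO_HANDLER_EXTENSION
} DemoHandler;

/* Extension-owned tags live at or above the start offset. */
static bool
demo_is_pltsql_node(const DemoNode *node)
{
    assert(node != NULL);
    return (int) node->type >= DEMO_PLTSQL_NODETAG_START;
}

/* Route extension nodes to the extension, everything else to the engine. */
static DemoHandler
demo_out_dispatch(const DemoNode *node)
{
    return demo_is_pltsql_node(node) ? DEMO_HANDLER_EXTENSION
                                     : DEMO_HANDLER_ENGINE;
}
```

A single range comparison keeps the check independent of how many node types the generator emits, which is presumably why the hardcoded first/last tag bounds were replaced by a start constant later in this PR.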

Catalog Changes (sys.babelfish_function_ext)

Four new columns:

  • antlr_parse_tree_text TEXT — serialized parse tree (nodeToString(pltsql_parse_result) output)
  • antlr_parse_tree_datums TEXT — serialized datum array (nodeToString(pltsql_Datums) output)
  • antlr_parse_tree_bbf_version TEXT — Babelfish version at serialization time
  • antlr_parse_cache_enabled BIT — per-function cache control (NULL=follow session GUC (default), true=force on, false=force off/kill switch). Updated by sys.enable_antlr_parse_cache(OID, BOOLEAN)

Upgrade SQL in babelfishpg_tsql--5.5.0--5.6.0.sql adds these columns.
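
The three-value semantics of antlr_parse_cache_enabled can be sketched as a small resolution function. This is an illustrative sketch only: `DemoCacheFlag` and `demo_cache_enabled` are hypothetical names, and the real check in hooks.c and pl_comp.c may combine further conditions (for example the TDS connection check).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical in-memory form of the nullable per-function column. */
typedef enum DemoCacheFlag
{
    DEMO_FLAG_NULL,  /* column is NULL: follow the session GUC (default) */
    DEMO_FLAG_TRUE,  /* force cache on */
    DEMO_FLAG_FALSE  /* force cache off (kill switch) */
} DemoCacheFlag;

/* Resolve the effective setting: an explicit per-function value wins. */
static bool
demo_cache_enabled(DemoCacheFlag per_function, bool session_guc)
{
    switch (per_function)
    {
        case DEMO_FLAG_TRUE:
            return true;
        case DEMO_FLAG_FALSE:
            return false;
        case DEMO_FLAG_NULL:
            break;
    }

    /* Only the NULL state falls through to the session setting. */
    assert(per_function == DEMO_FLAG_NULL);
    return session_guc;
}
```

Under this sketch, a per-function false disables caching even when the session GUC is on, and a per-function true enables it even when the session GUC is off, matching the behavior listed for the column.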

Cache Lifecycle (hooks.c, pl_comp.c)

  • CREATE/ALTER: Compiles routine, serializes parse tree to babelfish_function_ext (pltsql_store_func_default_positions() → pltsql_fill_antlr_parse_cache_columns()). Existing flow subsequently populates the in-session PLtsql hash table for same session executions.
  • 1st EXEC in new session: Session hash table miss → pltsql_restore_antlr_parse_cache_result() looks up function/column entry in bbf_func_ext catalog and deserializes retrieved cache result, validates bbf_version, populates hash table
  • 2nd+ EXEC in same session: Hash table hit (no deserialization)
  • EXEC after cache miss: pltsql_update_func_antlr_parse_cache() re-serializes fresh ANTLR result to catalog (also handles triggers, which are cached at first execution rather than CREATE time)
  • ALTER with GUC disabled: Sets parse tree columns to NULL
  • Kill switch (sys.enable_antlr_parse_cache(oid, false)): Sets antlr_parse_cache_enabled=false, NULLs cache columns, prevents future cache reads/writes for that function regardless of session GUC
  • Error handling: Serialization/deserialization errors propagate via PG_RE_THROW() with elog(LOG) for diagnostics
  • TDS connection check: All cache code paths (pltsql_fill_antlr_parse_cache_columns, pltsql_restore_antlr_parse_cache_result, pltsql_update_func_antlr_parse_cache) check IS_TDS_CONN() — cache is only active for TDS connections, not psql/libpq
  • m/MVU (minor/Major Version Upgrade): Version mismatch rejects cache, ANTLR re-parses, cache re-populated on next exec
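
The decision taken on a session hash table miss (use the catalog entry, or fall back to ANTLR) can be sketched as follows. This is a simplified model under stated assumptions: `DemoCacheRow` stands in for the relevant babelfish_function_ext columns, the version is compared as a plain string, and the function names are hypothetical rather than taken from the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of the cache columns for one routine. */
typedef struct DemoCacheRow
{
    const char *parse_tree_text; /* NULL when nothing is cached */
    const char *bbf_version;     /* version recorded at serialization time */
} DemoCacheRow;

typedef enum DemoCacheOutcome
{
    DEMO_CACHE_HIT,        /* deserialize and skip ANTLR */
    DEMO_CACHE_MISS_EMPTY, /* no cached tree: parse with ANTLR, then re-serialize */
    DEMO_CACHE_MISS_STALE  /* version mismatch: reject cache, parse with ANTLR */
} DemoCacheOutcome;

/* Classify a catalog row against the running Babelfish version. */
static DemoCacheOutcome
demo_check_cache_row(const DemoCacheRow *row, const char *running_version)
{
    assert(running_version != NULL);

    if (row == NULL || row->parse_tree_text == NULL)
        return DEMO_CACHE_MISS_EMPTY;

    if (row->bbf_version == NULL ||
        strcmp(row->bbf_version, running_version) != 0)
        return DEMO_CACHE_MISS_STALE;

    return DEMO_CACHE_HIT;
}
```

Both miss outcomes lead to the same recovery path in the lifecycle above (ANTLR re-parses and the catalog is re-populated); they are kept distinct here only to make the version check visible.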

GUC Configuration (guc.c)

  • babelfishpg_tsql.enable_antlr_parse_cache (session-level, default true): global toggle for cache reads and writes. Cluster-wide configuration through sp_babelfish_configure 'enable_antlr_parse_cache', 'on'/'off' is not currently available.
  • sys.enable_antlr_parse_cache(routine_id OID, use_antlr_parse_cache BOOLEAN) — per-function cache control; accepts function OID (via sys.object_id('schema.func')). true=force on, false=force off (kill switch, NULLs cache columns), NULL=reset to default (follow session GUC). Requires function ownership or sysadmin role.
  • sys.antlr_parse_cache_stats() — returns session-level cache statistics: cache_hits (confirmed cache reuse), cache_misses (cache enabled but empty/stale), cache_writes (successful serialization), cache_evictions (hash table eviction, kill switch, invalidation), cache_errors (serialization/deserialization failures). Counters reset per backend session.
  • babelfishpg_tsql.validate_antlr_parse_cache (session-level, default false) — debug GUC that compares cached deserialized trees against freshly ANTLR-compiled trees at CREATE/ALTER and EXEC time, logs PASS/FAIL with function name in postgres logfile.
1. Session GUC
SELECT set_config('babelfishpg_tsql.enable_antlr_parse_cache', 'on', false);
GO

2. Per-function cache control across sessions (OID-based)
-- Enable for specific function
SELECT sys.enable_antlr_parse_cache(sys.object_id('dbo.my_proc'), true);
GO
-- Disable (kill switch — NULLs cache columns, overrides session GUC)
SELECT sys.enable_antlr_parse_cache(sys.object_id('dbo.my_proc'), false);
GO
-- Reset to default (follow session GUC)
SELECT sys.enable_antlr_parse_cache(sys.object_id('dbo.my_proc'), NULL);
GO

3. Cache statistics
SELECT * FROM sys.antlr_parse_cache_stats();
GO
cache_hits  |  cache_misses  |  cache_writes  |  cache_evictions  |  cache_errors

4. Validation GUC
SELECT set_config('babelfishpg_tsql.validate_antlr_parse_cache', 'on', false);
GO

Node Allocation (pltsql.h, pltsql-2.h, pl_gram.c, tsqlIface.cpp)

  • Added NodeTag type as the first field to all PLtsql statement and datum structs in pltsql.h and pltsql-2.h. This is required by PG's makeNode()/nodeTag() infrastructure which the serialization framework depends on. The existing cmd_type/dtype fields could not be reused because not all structs have them.
  • Added pg_node_attr() annotations directly to pltsql.h and pltsql-2.h for serialization control:
    • read_as() / write_as(): Type conversion for serialization (e.g., read_as(PLtsql_expr*) for PLtsql_expr*[] arrays)
    • array_size(): Array length specification for pointer fields (e.g., array_size(nfields) for PLtsql_row.fieldnames)
    • equal_ignore: Skip field in equality comparison (e.g., lineno fields, runtime-only fields)
    • read_write_ignore: Skip field in serialization/deserialization (e.g., runtime caches, function pointers)
    • copy_as(): Custom copy behavior for special fields
  • Consolidated serialization annotations into main headers (eliminated separate pltsql_serializable_1.h and pltsql_serializable_2.h files)
  • Named previously anonymous typedef structs in pltsql-2.h: PLtsql_stmt_print, PLtsql_stmt_kill, PLtsql_stmt_init, tsql_exec_param
  • Changed int32_t → int32 in PLtsql_stmt_goto.target_pc and PLtsql_stmt_save_ctx.target_pc for PostgreSQL convention consistency (both types supported in @scalar_types)
  • InlineCodeBlockArgs and PLtsql_function added to @nodetag_only list in Perl script — receive NodeTags but no serialization functions (not needed for cache)
  • All PLtsql statement/datum struct palloc0() calls replaced with makeNode() in pl_gram.c and tsqlIface.cpp to set proper NodeTag values
  • tsqlIface.cpp: Fixed type confusion in exitExecute_body_batch — system procedures (sp_executesql etc.) return PLtsql_stmt_exec_sp, not PLtsql_stmt_exec; added cmd_type check before casting
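
Why the allocation change matters can be seen in a standalone sketch of the pattern. It only imitates the shape of PostgreSQL's makeNode()/nodeTag() using calloc; the `Demo*` structs, tag values, and macro names are hypothetical and are not the definitions in pltsql.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tags; real values come from the generated pltsql_nodetags.h. */
typedef enum DemoNodeTag
{
    T_DemoInvalid = 0,
    T_DemoStmtPrint = 10000,
    T_DemoStmtGoto
} DemoNodeTag;

/* Minimal common header: the tag must be the first field of every node. */
typedef struct DemoNode
{
    DemoNodeTag type;
} DemoNode;

typedef struct DemoStmtPrint
{
    DemoNodeTag type; /* first field, as required for tag-based dispatch */
    int         lineno;
} DemoStmtPrint;

typedef struct DemoStmtGoto
{
    DemoNodeTag type;
    int         lineno;
    int         target_pc;
} DemoStmtGoto;

/* Zero-initialize like palloc0(), then stamp the tag like makeNode(). */
static void *
demo_new_node(size_t size, DemoNodeTag tag)
{
    DemoNode *node;

    assert(size >= sizeof(DemoNode));
    node = calloc(1, size);
    if (node != NULL)
        node->type = tag;
    return node;
}

/* Derive the tag name from the struct name, as makeNode() does. */
#define demo_make_node(_type_) \
    ((_type_ *) demo_new_node(sizeof(_type_), T_##_type_))

/* Read the tag through any node pointer. */
#define demo_node_tag(ptr) (((const DemoNode *) (ptr))->type)
```

A struct allocated with a plain zeroing allocator would carry tag 0 (T_DemoInvalid in this sketch), so a serializer that switches on the tag could not tell which write function to call. Stamping the tag at allocation time is what makes the generated switch files usable.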

PLtsql_function Struct Changes (pltsql.h)

  • bbf_ext_xmin / bbf_ext_tid: Track the babelfish_function_ext tuple identity at compile time for cross-session cache invalidation (detects concurrent ALTER)
  • from_cache: Flag indicating the function was loaded from persistent cache (skips re-serialization at EXEC time to avoid redundant catalog writes)

Documentation

  • serialization-annotations-guide.md: Comprehensive reference for all pg_node_attr() annotations used in pltsql.h and pltsql-2.h, explaining the purpose and usage of each attribute type (read_as, write_as, array_size, equal_ignore, read_write_ignore, copy_as) with examples from the codebase. Includes rationale for design decisions (e.g., int32 vs int32_t type choice, lineno field handling, nodetag-only structs).

Performance Results

  • Customer procedure (~1300 lines): 2031ms → 15ms first-execution time (99% reduction)
  • Stress test (1000 connections): Average first-execution time ~4314ms → ~175ms

Test Scenarios Covered

  • Use case based - (BABEL-6037-vu-prepare.sql, BABEL-6037-vu-verify.mix)

    • Test 1: Simple procedure without arguments — basic serialization
    • Test 2: Procedure with arguments — parameter handling and variable serialization
    • Test 3: Procedure with various statement types (WHILE, GOTO, TRY-CATCH, CASE, BREAK, CONTINUE, PRINT, RETURN)
    • Test 4: Procedure containing EXEC sp_executesql — serialization of system procedure call nodes
    • Test 5: Complex procedure — nested blocks, IF/ELSE, UNPIVOT, temp tables, RETURN value
    • Test 11: Procedure with single OUT parameter (out_param_varno re-derivation)
    • Test 12: Procedure with multiple OUT parameters (PLtsql_row datum preservation)
    • Test 13: Multi-statement table-valued function (MSTVF out_param_varno)
    • Test 13b: Inline table-valued function (ITVF itvf_query serialization)
    • Test 22: Parse cache validation GUC — cached vs ANTLR tree comparison at CREATE and EXEC
    • Test 29: DBCC CHECKIDENT procedure — custom_read_write serialization of union-based node
    • Test 30: Trigger function — cached at first execution (not CREATE time)
  • Boundary conditions -

    • Cache lifecycle:
      • Test 6: Same-session CREATE → EXEC → ALTER → DROP lifecycle (samesession_proc)
      • Test 7: Cross-session procedure — EXEC, ALTER, DROP in new session (oldsession_proc)
      • Test 8: Rename procedure with GUC off, then EXEC with GUC on — cache invalidation
      • Test 9: ALTER with GUC off (cache NULLed), then EXEC with GUC on (cache re-populated)
      • Test 10: Altered dependency — drop/recreate table used by cached procedure
      • Test 23: Cache populated at EXEC time (created with GUC off, executed with GUC on)
    • Per-function cache control:
      • Test 14: Per-function cache enable/disable with OID (sys.object_id('dbo.func'))
      • Test 15: Mid-session per-func toggle (true→false→true) with hash table invalidation
      • Test 16: Default behavior (antlr_parse_cache_enabled = NULL, follows session GUC)
      • Test 17: NULL reset — explicit NULL via API, verify follows GUC
      • Test 17b: Per-func true — force cache ON even when session GUC OFF
      • Test 17c: Kill switch — per-func false blocks caching when session GUC ON
      • Test 18: Invalid OID handling, NULL OID, ownership check (non-owner denied)
      • Test 19: ALTER preserves per-function cache flag
      • Test 20: DROP removes per-function cache flag (babelfish_function_ext row gone)
    • Cache statistics:
      • sys.antlr_parse_cache_stats() calls throughout verify to track hits, misses, writes, evictions, errors progression across all test scenarios
    • Configuration:
      • Test 28: Version mismatch — cached tree with higher bbf_version, re-parses and re-caches
  • Arbitrary inputs -

    • Test 24: Overloaded procs in different schemas — independent caching and per-function GUC with full signature
    • Test 25: Nested EXEC — outer cached proc calls inner cached proc, both cached independently
  • Negative test cases -

    • Test 18: Invalid OID handling (non-existent OID, NULL OID)
    • Test 18b: Ownership check — non-owner denied
    • Test 24: Updating GUC with NULL OID value fails
    • Test 26: Altered/renamed dependency — ALTER TABLE, sp_rename column, sp_rename table
    • Test 27: Corrupt cache deserialization — error propagated via PG_RE_THROW, not silently swallowed
  • Minor version upgrade tests -

    • Expected output updates for babelfish_function_ext schema change (BABEL-2877 upgrade cleanup files)
  • Major version upgrade tests -

  • Performance tests -

  • Tooling impact -

  • Client tests -

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Comment thread contrib/babelfishpg_tsql/sql/ownership.sql Outdated
Comment thread contrib/babelfishpg_tsql/src/pltsql_serialize_jsonb.h Outdated
Comment thread contrib/babelfishpg_tsql/src/pltsql_deserialize.c Outdated
Contributor

@robverschoor robverschoor left a comment

Added some comments

Comment thread contrib/babelfishpg_tsql/sql/upgrades/babelfishpg_tsql--5.5.0--5.6.0.sql Outdated
Comment thread contrib/babelfishpg_tsql/sql/upgrades/babelfishpg_tsql--5.5.0--5.6.0.sql Outdated
Comment thread contrib/babelfishpg_tsql/src/guc.c Outdated
@manisha-deshpande manisha-deshpande force-pushed the jira-babel-6037 branch 2 times, most recently from ae3ac48 to dcf6cb9 Compare March 29, 2026 22:04
{
int tag = (int) nodeTag(obj);

return (tag >= (int) T_PLtsql_type &&
Contributor Author

This will be replaced by a defined constant, 1000, marking where PLtsql tag numbers start (refer to $pltsql_nodetag_start in contrib/babelfishpg_tsql/src/pltsql_serialize/gen_pltsql_node_support.pl).
PG node tags (nodetags.h) currently end at ~400 (https://github.com/babelfish-for-postgresql/postgresql_modified_for_babelfish/blob/492fb8f567a64a0b271af73562451c31a342bba2/src/backend/nodes/gen_node_support.pl#L110)

@manisha-deshpande
Contributor Author

manisha-deshpande commented Apr 2, 2026

The Tests / JDBC Tests / run-babelfish-jdbc-tests (pull_request) check fails for these test files:

  • BABEL-3092 - Failed
  • BABEL-662 - Failed
  • Test-scope-identity - Failed
  • babelfish_function_ext-vu-cleanup - Failed

Seems to be some issue specifically with sp_executesql. Investigating...

Edit: Fixed (refer to the change in tsqlIface.cpp)

Issue details:
The sp_executesql failure was caused by a type confusion in exitExecute_body_batch (tsqlIface.cpp). System procedures like sp_executesql return PLtsql_stmt_exec_sp from makeSpStatement(), but the exit handler unconditionally cast to PLtsql_stmt_exec and accessed its expr field. Adding NodeTag as the first field of all PLtsql statement structs (for parse cache serialization) shifted field offsets, causing the cast to read NULL where it previously read harmless garbage. Fixed by checking cmd_type == PLTSQL_STMT_EXEC before casting.

Test failure:

1> sp_executesql N'SELECT 1'
2> go
Msg 33557097, Level 16, State 1, Server BABELFISH, Line 1
can't mutate an internal query. NULL expression

After fix:

1> sp_executesql N'SELECT 1'
2> go
           
-----------
          1

(1 rows affected)
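
The fix described in this comment follows a general pattern: check the discriminator before downcasting. A minimal sketch, assuming simplified layouts, is shown here. The `Demo*` structs, their fields, and `demo_exec_expr` are hypothetical and do not reproduce PLtsql_stmt_exec or PLtsql_stmt_exec_sp.

```c
#include <assert.h>
#include <stddef.h>

typedef enum DemoStmtType
{
    DEMO_STMT_EXEC,
    DEMO_STMT_EXEC_SP
} DemoStmtType;

/* Common leading fields shared by every statement struct in this sketch. */
typedef struct DemoStmt
{
    int          node_tag;
    DemoStmtType cmd_type;
    int          lineno;
} DemoStmt;

typedef struct DemoStmtExec
{
    int          node_tag;
    DemoStmtType cmd_type;
    int          lineno;
    const char  *expr;      /* only meaningful for DEMO_STMT_EXEC */
} DemoStmtExec;

/* Different layout after the common header: no expr at the same offset. */
typedef struct DemoStmtExecSp
{
    int          node_tag;
    DemoStmtType cmd_type;
    int          lineno;
    int          sp_type_code;
    const char  *query;
} DemoStmtExecSp;

/*
 * Return the expression of an EXEC statement, or NULL when the statement is
 * some other kind. The cmd_type check is what prevents reading a field of
 * the wrong struct.
 */
static const char *
demo_exec_expr(const DemoStmt *stmt)
{
    assert(stmt != NULL);

    if (stmt->cmd_type != DEMO_STMT_EXEC)
        return NULL;

    return ((const DemoStmtExec *) stmt)->expr;
}
```

Without the check, the cast reads whatever bytes sit at the offset of expr in the other struct, so the result depends on field layout. That matches the account above, where adding a leading field changed what the unchecked cast happened to read.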

Comment thread contrib/babelfishpg_tsql/src/pltsql_serialize/pltsql_outfuncs_stubs.c Outdated
Comment thread contrib/babelfishpg_tsql/src/pltsql_serialize/pltsql_compare.c Outdated
SET allow_system_table_mods = on;
ALTER TABLE sys.babelfish_function_ext ADD COLUMN IF NOT EXISTS antlr_parse_tree_text TEXT DEFAULT NULL;
ALTER TABLE sys.babelfish_function_ext ADD COLUMN IF NOT EXISTS antlr_parse_tree_datums TEXT DEFAULT NULL;
ALTER TABLE sys.babelfish_function_ext ADD COLUMN IF NOT EXISTS antlr_parse_tree_modify_date SYS.DATETIME DEFAULT NULL;
Contributor

Are we still using the modify date column? I recollect NULLing out the other column during ALTER FUNCTION/PROC instead.

Contributor Author

Agreed. Investigated all code paths that modify babelfish_function_ext:

  • ALTER PROC (GUC on): pltsql_store_func_default_positions → pltsql_fill_cache_columns re-populates all cache columns with new tree + new antlr_parse_tree_modify_date. Both timestamps match.
  • ALTER PROC (GUC off): pltsql_fill_cache_columns(NULL, ...) → NULLs all cache columns including antlr_parse_tree_text. Caught by the antlr_parse_tree_text IS NULL check before the modify_date check is reached.
  • SP RENAME proc: Rename path in catalog.c NULLs all cache columns.
  • EXEC-time re-population (pltsql_update_func_cache_entry): Sets antlr_parse_tree_modify_date to current timestamp, which is always >= modify_date.
  • Per-function disable (enable_routine_parse_cache(func, false)): NULLs cache text/datums/modify_date/bbf_version.

In all current code paths, cache columns are either atomically re-populated (all set together) or atomically NULLed (all set to NULL together). So the modify_date check does seem redundant in this case.
A different check, bbf_version, detects version changes and will be retained.

Contributor

So are we removing the modify date column?

Comment thread contrib/babelfishpg_tsql/src/pltsql_node/gen_pltsql_node_support.pl
Comment thread contrib/babelfishpg_tsql/src/pltsql_serialize/pltsql_equalfuncs.c Outdated
Comment thread contrib/babelfishpg_tsql/src/hooks.c
Comment thread contrib/babelfishpg_tsql/src/catalog.c Outdated
Comment thread contrib/babelfishpg_tsql/src/guc.c
Comment thread contrib/babelfishpg_tsql/src/hooks.c Outdated
Comment thread contrib/babelfishpg_tsql/src/hooks.c
@coveralls
Collaborator

Coverage Report for CI Build 24360332184

Coverage decreased (-0.09%) to 77.04%

Details

  • Coverage decreased (-0.09%) from the base build.
  • Patch coverage: 320 uncovered changes across 11 files (584 of 904 lines covered, 64.6%).
  • No coverage regressions found.

Uncovered Changes

| Top 10 Files by Coverage Impact | Changed | Covered | % |
|---|---|---|---|
| contrib/babelfishpg_tsql/src/pltsql_serialize/pltsql_nodeio.c | 170 | 87 | 51.18% |
| contrib/babelfishpg_tsql/src/pl_gram.y | 74 | 0 | 0.0% |
| contrib/babelfishpg_tsql/src/pltsql_serialize/pltsql_equalfuncs.c | 69 | 10 | 14.49% |
| contrib/babelfishpg_tsql/src/hooks.c | 214 | 173 | 80.84% |
| contrib/babelfishpg_tsql/src/pl_comp.c | 115 | 84 | 73.04% |
| contrib/babelfishpg_tsql/src/pltsql_serialize/pltsql_readfuncs_stubs.c | 55 | 46 | 83.64% |
| contrib/babelfishpg_tsql/src/pltsql_serialize/pltsql_outfuncs_stubs.c | 36 | 28 | 77.78% |
| contrib/babelfishpg_tsql/src/pltsql_serialize/pltsql_serialize_macros.h | 28 | 21 | 75.0% |
| contrib/babelfishpg_tsql/src/pl_exec.c | 6 | 2 | 33.33% |
| contrib/babelfishpg_tsql/src/tsqlNodes.c | 3 | 0 | 0.0% |

Coverage Regressions

No coverage regressions found.


Coverage Stats

Coverage Status
Relevant Lines: 69817
Covered Lines: 53787
Line Coverage: 77.04%
Coverage Strength: 404165.81 hits per line

💛 - Coveralls

Comment thread contrib/babelfishpg_tsql/sql/upgrades/babelfishpg_tsql--5.5.0--5.6.0.sql Outdated
Comment thread contrib/babelfishpg_tsql/src/pltsql_serialize/pltsql_equalfuncs.c Outdated
Comment on lines +37 to +38
if (a == NULL && b == NULL) return true;
if (a == NULL || b == NULL) return false;
Contributor

  1. Why is this needed inside the custom implementation? Shouldn't the recursive equal function perform this check instead?
  2. Why is this custom implementation required? Is it to skip comparing some fields? If so, can we use equal_ignore instead?

Comment thread contrib/babelfishpg_tsql/src/pltsql_serialize/pltsql_equalfuncs.c Outdated
Comment thread contrib/babelfishpg_tsql/src/pltsql_serialize/pltsql_nodetags.h Outdated
Comment thread contrib/babelfishpg_tsql/src/catalog.c Outdated
Comment thread contrib/babelfishpg_tsql/src/guc.c
Comment thread contrib/babelfishpg_tsql/src/guc.c
Comment thread contrib/babelfishpg_tsql/src/hooks.c
@manisha-deshpande manisha-deshpande marked this pull request as ready for review April 15, 2026 02:10
Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
…ble and type

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
…E, TRY Statements

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
…d retrieval

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Move PLtsql node deserialization from PG engine's parseNodeString_hook
to extension-side pltsql_stringToNode/pltsql_nodeRead/pltsql_parseNodeString
in pltsql_nodeio.c.

Changes:
- pltsql_nodeio.c: pltsql_stringToNode() sets tokenizer via pg_strtok_init(),
  pltsql_nodeRead() classifies tokens and dispatches, pltsql_parseNodeString()
  tries PLtsql switch then falls back to PG parseNodeString() via pushback.
- pltsql_serialize_macros.h: READ_NODE_FIELD redirected to pltsql_nodeRead().
- pltsql_node_stubs.c: nodeRead() → pltsql_nodeRead() in _read* stubs.
- hooks.c: stringToNode() → pltsql_stringToNode() at 3 call sites.
- pltsql_readfuncs.c: Stubbed out (code moved to pltsql_nodeio.c).
- pl_handler.c: Removed parseNodeString_hook assignment from _PG_init().

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Move PLtsql node serialization from PG engine's outNode_hook to
extension-side pltsql_nodeToString/pltsql_outNode/pltsql_outList
in pltsql_nodeio.c.

Changes:
- pltsql_nodeio.c: pltsql_nodeToString() as public entry point,
  pltsql_outNode() dispatches PLtsql nodes via generated switch and
  delegates PG nodes to outNode(), pltsql_outList() walks elements.
- pltsql_serialize_macros.h: WRITE_NODE_FIELD redirected to pltsql_outNode().
- pltsql_node_stubs.c: outNode() → pltsql_outNode() in _out* stubs.
- hooks.c: nodeToString() → pltsql_nodeToString() at 2 call sites.
- pltsql_outfuncs.c: Stubbed out (code moved to pltsql_nodeio.c).
- pl_handler.c: Removed outNode_hook assignment, updated extern declarations.
- Makefile: Replaced pltsql_outfuncs.o/pltsql_readfuncs.o with pltsql_nodeio.o.

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
…date expected test output

Changes:
- tsqlIface.cpp: Fix exitExecute_body_batch to check cmd_type before
  casting to PLtsql_stmt_exec. System procedures (sp_executesql etc.)
  return PLtsql_stmt_exec_sp which has a different layout; the blind
  cast read NULL from the wrong offset after NodeTag was added.
- gen_pltsql_node_support.pl: Emit PLTSQL_NODETAG_START define in
  pltsql_nodetags.h.
- pltsql_nodeio.c: is_pltsql_node() uses PLTSQL_NODETAG_START instead
  of hardcoded T_PLtsql_type..T_PLtsql_stmt_restore_ctx_partial.
- pltsql_serialize_macros.h: pltsql_equal_nodes_or_equal() uses
  PLTSQL_NODETAG_START instead of hardcoded 1000..1079.
- babelfish_function_ext-vu-cleanup.out: Update expected output for
  new antlr_parse_tree columns in sys.babelfish_function_ext.

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Add XactReadOnly guard to pltsql_update_func_cache_entry so EXEC-time
cache population is silently skipped on read-only nodes (e.g., Aurora
reader instances). The cache gets populated when the same function runs
on a writer node.

The other two cache-write paths do not need this guard:
- pltsql_store_func_default_positions: runs during CREATE/ALTER which
  is already blocked by PG's read-only transaction check.
- update_bbf_function_cache_enabled (sys.enable_routine_parse_cache):
  is an explicit admin action that should surface PG's read-only error
  so the user knows to run it on the writer node.

Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Changes:
- Test 24: Overloaded procedures in different schemas — verifies
  independent caching and per-function GUC targeting with full signature
- Test 25: Nested EXEC — outer cached proc calls inner cached proc,
  both cached independently
- Test 26: Altered/renamed dependency — ALTER TABLE (add column),
  sp_rename column, sp_rename table, verifies cache behavior when
  underlying table structure changes
- Convert verify to .mix for cross-session testing (hash table empty)
- Fix enable_routine_parse_cache: skip dots inside parenthesized
  signatures when splitting schema from function name
- Add NULL argument check to enable_routine_parse_cache

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Changes:
- Split pltsql_node_stubs.c into pltsql_outfuncs_stubs.c / pltsql_readfuncs_stubs.c
- Merged pltsql_compare.c into pltsql_equalfuncs.c (single equality file)
- Created pltsql_serialize.h as public API header, removed scattered extern declarations
- Removed bare-brace-without-statement blocks in hooks.c, pl_comp.c, readfuncs_stubs.c
- Changed pltsql_compare_parse_trees NULL check from DEBUG1 to PANIC
- Added clarifying comments for serializable_1.h vs _2.h, equalfuncs.c
- Replaced no_copy_equal with no_copy in serializable headers, removed @no_equal ignore hack
- Added equal_ignore on target_pc (runtime-resolved, differs between CREATE/EXEC)
- Consolidated pltsql_nodeio.c includes, removed duplicate forward declarations

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Changes:
- Replace custom_read_write on PLtsql_expr with read_write_ignore annotations;
  _out/_read/_equal for PLtsql_expr are now fully generated by Perl script
- Replace no_copy_equal with no_copy in serializable headers; equal functions
  generated through normal path instead of @no_equal ignore hack
- Add equal_ignore on target_pc (runtime-resolved, differs CREATE vs EXEC)
- Remove NULL checks from _out* stubs (handled by pltsql_outNode dispatcher)
- Move datum comparison to pltsql_compare_datum_arrays() with two-pointer walk
  that correctly identifies duplicate ANTLR datums vs genuine mismatches

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Changes:
Fix hash table duplicate insert WARNING caused by stale from_cache entries:
- pltsql_HashTableLookup: evict from_cache=true entry when session GUC is off
- pltsql_compile: goto invalidate_function for bbf_ext changes on cached functions
- do_compile: clear bbf_ext_xmin on cache miss to prevent stale values

Replace FlushErrorState() with PG_RE_THROW() in all 5 PG_CATCH blocks in
pltsql_fill_cache_columns and pltsql_restore_func_parse_result. Errors now
propagate instead of being silently swallowed, following the established
babelfishpg_tsql pattern. Added elog(LOG) before each re-throw for diagnostics.

Add corrupt_cache_test_func test case (Test 27) to verify deserialization
errors surface correctly when cache is corrupted.

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Redesign antlr_cache_enabled column and sys.enable_routine_parse_cache API
to support three-value semantics and OID-based input.

Changes:
- antlr_cache_enabled column: NULL=follow session GUC (default),
  true=force cache on, false=force cache off (kill switch)
- Change column DEFAULT from false to NULL; migrate existing false
  values to NULL in upgrade SQL
- Change sys.enable_routine_parse_cache input from TEXT to OID;
  callers use sys.object_id('schema.func') for lookup
- Add ownership check: caller must own function or be sysadmin
- Update combined check logic in hooks.c and pl_comp.c to resolve
  cache_enabled column value
- Update test cases to look up functions by OID and to cover the GUC kill
  switch, NULL reset, function-specific cache enable, and mid-session
  toggle with hash table invalidation

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
After changing guc function signature, update dependency file to fix
dump restore/version upgrade test failures.

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Changes:
- Add sys.routine_parse_cache_stats() returning cache_hits, cache_misses,
  cache_writes, cache_evictions, cache_errors per backend session
- Hits: counted on confirmed cache reuse in pltsql_compile (from_cache
  valid or bbf_ext_xmin valid) and do_compile cache restore
- Misses: counted only when caching was enabled but cache was empty/stale
- Writes: counted in pltsql_fill_cache_columns on successful serialization
- Evictions: counted at invalidate_function for from_cache entries, GUC-off
  hash table eviction, and explicit kill switch (cache columns NULLed)
- Errors: counted in PG_CATCH blocks before PG_RE_THROW
- Tests: stats calls throughout verify file to track counter progression

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Comment thread contrib/babelfishpg_tsql/src/hooks.c Outdated
* Attempt to restore a cached ANTLR parse tree from babelfish_function_ext.
*
* Called from do_compile before ANTLR parsing. If a valid cache entry exists in
* bablfish_function_ext, returns the deserialized parse tree and datums, allowing the caller
Contributor

Typo

Comment thread .gitignore Outdated
contrib/babelfishpg_tsql/src/pltsql_serialize/pltsql_outfuncs_switch.c
contrib/babelfishpg_tsql/src/pltsql_serialize/pltsql_readfuncs_gen.c
contrib/babelfishpg_tsql/src/pltsql_serialize/pltsql_readfuncs_switch.c
contrib/babelfishpg_tsql/src/pltsql_serialize/pltsql_equalfuncs_gen.c
Contributor

The directory has changed to pltsql_node, so these paths need to be updated.

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>