Enable SingleQE join with SegmentGeneralWorkers by avamingli · Pull Request #5 · avamingli/cloudberrydb

avamingli · 2023-12-08T04:08:13Z

For a parallel join, we may benefit from gathering SegmentGeneralWorkers to SingleQE.

Gather(SegmentGeneralWorkers) join SingleQE, return join locus: SingleQE. We may win if the join is parallel-aware: with SingleQE on the inner side, there is a chance to generate a parallel join under SingleQE. In that case both sides are parallel and we may benefit. See example 5_P_2_2 in cbdb_parallel.sql.

begin;
set local enable_parallel = on;
set local max_parallel_workers_per_gather = 4;
create table t1(a int, b int) with(parallel_workers=4);
create table t2(a int, b int) with(parallel_workers=4);
create table rt1(a int, b int) with(parallel_workers=4) distributed replicated;
insert into t1 select i, i from generate_series(1, 10000000) i;
insert into t2 select i, i from generate_series(1, 10000000) i;
insert into rt1 select i, i+1 from generate_series(1, 10000) i;
analyze t1;
analyze t2;
analyze rt1;
explain(costs off, locus) select * from rt1  join (select count(*) as c, sum(t1.a) as a  from t1 join t2 using(a)) t3 on t3.c = rt1.a;
                            QUERY PLAN                             
-------------------------------------------------------------------
 Parallel Hash Join
   Locus: Entry
   Hash Cond: (rt1.a = (count(*)))
   ->  Gather Motion 4:1  (slice1; segments: 4)
         Locus: Entry
         ->  Parallel Seq Scan on rt1
               Locus: SegmentGeneralWorkers
               Parallel Workers: 4
   ->  Parallel Hash
         Locus: Entry
         ->  Finalize Aggregate
               Locus: Entry
               ->  Gather Motion 12:1  (slice2; segments: 12)
                     Locus: Entry
                     ->  Partial Aggregate
                           Locus: HashedWorkers
                           Parallel Workers: 4
                           ->  Parallel Hash Join
                                 Locus: HashedWorkers
                                 Parallel Workers: 4
                                 Hash Cond: (t1.a = t2.a)
                                 ->  Parallel Seq Scan on t1
                                       Locus: HashedWorkers
                                       Parallel Workers: 4
                                 ->  Parallel Hash
                                       Locus: Hashed
                                       ->  Parallel Seq Scan on t2
                                             Locus: HashedWorkers
                                             Parallel Workers: 4
 Optimizer: Postgres query optimizer
(30 rows)
abort;
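
The sender counts on the Gather Motion nodes in this plan can be read as segments multiplied by parallel workers per segment. The sketch below reproduces that arithmetic. The 3-segment cluster size is an assumption (it is not stated in this PR, but it is consistent with `Gather Motion 12:1`), as is the reading that the replicated table is scanned on a single segment. The helper name is hypothetical.

```python
def motion_senders(segments: int, workers_per_segment: int) -> int:
    """Number of sending processes feeding one Gather Motion (illustrative only)."""
    return segments * workers_per_segment


# t1 join t2: hash-distributed tables, ASSUMED 3 segments, 4 workers each.
hashed_senders = motion_senders(segments=3, workers_per_segment=4)

# rt1: replicated table, ASSUMED to be read on one segment by 4 workers.
replicated_senders = motion_senders(segments=1, workers_per_segment=4)

print(hashed_senders, replicated_senders)
```

Under those assumptions the two values line up with `Gather Motion 12:1` and `Gather Motion 4:1` in the plan above.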

If the join is not parallel-aware, the benefit is uncertain, and a simple test showed lower performance, e.g. a parallel scan on a replicated table joined with a SingleQE side that is a non-parallel plan.

SingleQE join Gather(SegmentGeneralWorkers), return join locus: SingleQE. We may win by gathering to SingleQE whether or not the join is parallel-aware. With SingleQE on the outer side, there could be a parallel plan under it, so we may benefit even without a shared hash table. Let the planner decide.
See example 2_P_5_2 in cbdb_parallel.sql.

begin;
create table t1(a int, b int) with(parallel_workers=2);
create table rt1(a int, b int) with(parallel_workers=2) distributed replicated;
insert into t1 select i, i from generate_series(1, 100000) i;
insert into rt1 select i, i+1 from generate_series(1, 10000) i;
analyze t1;
analyze rt1;
set local enable_parallel = on;
explain(locus, costs off) select * from (select count(*) as a from t1) t1 left join rt1  on rt1.a = t1.a;
                      QUERY PLAN                      
------------------------------------------------------
 Parallel Hash Left Join
   Locus: Entry
   Hash Cond: ((count(*)) = rt1.a)
   ->  Finalize Aggregate
         Locus: Entry
         ->  Gather Motion 6:1  (slice1; segments: 6)
               Locus: Entry
               ->  Partial Aggregate
                     Locus: HashedWorkers
                     Parallel Workers: 2
                     ->  Parallel Seq Scan on t1
                           Locus: HashedWorkers
                           Parallel Workers: 2
   ->  Parallel Hash
         Locus: Entry
         ->  Gather Motion 2:1  (slice2; segments: 2)
               Locus: Entry
               ->  Parallel Seq Scan on rt1
                     Locus: SegmentGeneralWorkers
                     Parallel Workers: 2
 Optimizer: Postgres query optimizer
(21 rows)
abort;

The final locus may be elided to Entry if possible.
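
The two cases described in this PR can be summarized as a small decision rule. This is a minimal Python model of the description, not the planner code itself: the `Locus` enum and `gathered_join_locus` function are hypothetical names, and returning a locus only means the path is a candidate, since the planner still chooses by cost.

```python
from enum import Enum
from typing import Optional


class Locus(Enum):
    SINGLE_QE = "SingleQE"
    SEGMENT_GENERAL_WORKERS = "SegmentGeneralWorkers"


def gathered_join_locus(outer: Locus, inner: Locus,
                        parallel_aware: bool) -> Optional[Locus]:
    """Return SingleQE if gathering the SegmentGeneralWorkers side is worth
    considering for this join, or None if that path is not generated."""
    if outer is Locus.SEGMENT_GENERAL_WORKERS and inner is Locus.SINGLE_QE:
        # SingleQE is the inner side: only a parallel-aware join leaves a
        # chance for a parallel join under SingleQE, so require it.
        return Locus.SINGLE_QE if parallel_aware else None
    if outer is Locus.SINGLE_QE and inner is Locus.SEGMENT_GENERAL_WORKERS:
        # SingleQE is the outer side: a parallel plan may sit under it,
        # so the path is considered regardless of parallel-awareness.
        return Locus.SINGLE_QE
    # Other locus combinations are outside the scope of this sketch.
    return None
```

For example, `gathered_join_locus(Locus.SINGLE_QE, Locus.SEGMENT_GENERAL_WORKERS, False)` yields `Locus.SINGLE_QE`, while swapping the two sides with `parallel_aware=False` yields `None`.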

Authored-by: Zhang Mingli avamingli@gmail.com



For test case:

create table t0(c0 inet) distributed randomly;
create table t2(c0 inet) distributed randomly;
create table t3(c0 inet) distributed randomly;
SELECT ALL t2.c0, t3.c0, t0.c0 FROM t0, ONLY t3 FULL OUTER JOIN t2 ON ((t2.c0)=(t3.c0)) WHERE (((('0.5496844753539182')||(t3.c0)))LIKE(CAST((0.13292931)::MONEY AS VARCHAR(971))))
UNION ALL
SELECT t2.c0, t3.c0, t0.c0 FROM t0, ONLY t3 FULL OUTER JOIN t2 ON ((t2.c0)=(t3.c0)) WHERE NOT ((((('0.5496844753539182')||(t3.c0)))LIKE((CAST(0.13292931 AS MONEY))::VARCHAR(971))))
UNION ALL
SELECT ALL t2.c0, t3.c0, t0.c0 FROM t0*, ONLY t3 FULL OUTER JOIN t2 ON ((t2.c0)=(t3.c0)) WHERE ((((('0.5496844753539182')||(t3.c0)))LIKE((CAST(0.13292931 AS MONEY))::VARCHAR(971)))) ISNULL;

will cause crash because of assert failure in 'create_plan_recurse'.

#3 0x00007fe94eccf476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007fe94ecb57f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007fe94fcdd548 in ExceptionalCondition (conditionName=0x7fe95043dcd0 "best_path->parallel_workers == best_path->locus.parallel_workers", errorType=0x7fe95043db06 "FailedAssertion", fileName=0x7fe95043dbdb "createplan.c", lineNumber=623) at assert.c:48
#6 0x00007fe94f94918f in create_plan_recurse (root=0x55d7cbe96f78, best_path=0x55d7cbec0380, flags=1) at createplan.c:623
#7 0x00007fe94f94a1f8 in create_append_plan (root=0x55d7cbe96f78, best_path=0x55d7cbec0700, flags=1) at createplan.c:1380
#8 0x00007fe94f948d37 in create_plan_recurse (root=0x55d7cbe96f78, best_path=0x55d7cbec0700, flags=1) at createplan.c:481
#9 0x00007fe94f94e2d1 in create_motion_plan (root=0x55d7cbe96f78, path=0x55d7cbec0e50) at createplan.c:3316
#10 0x00007fe94f9490dc in create_plan_recurse (root=0x55d7cbe96f78, best_path=0x55d7cbec0e50, flags=1) at createplan.c:608
#11 0x00007fe94f948ba3 in create_plan (root=0x55d7cbe96f78, best_path=0x55d7cbec0e50, curSlice=0x55d7cbe96f20) at createplan.c:392

The parallel_workers should be set to zero because parallel full join is not supported yet.
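
The failed assertion compares two worker counts that must agree on every path. A toy Python model of that invariant follows. The `Path` and `Locus` classes and both helpers are hypothetical stand-ins for the C structures named in the backtrace, and the commit message does not say which of the two counters was left non-zero, so the sketch simply forces both to zero for a full join.

```python
from dataclasses import dataclass, field


@dataclass
class Locus:
    parallel_workers: int = 0


@dataclass
class Path:
    parallel_workers: int = 0
    locus: Locus = field(default_factory=Locus)


def check_path(path: Path) -> None:
    # Mirrors the asserted condition:
    # best_path->parallel_workers == best_path->locus.parallel_workers
    assert path.parallel_workers == path.locus.parallel_workers


def make_full_join_path(requested_workers: int) -> Path:
    # Parallel full join is not supported yet, so the requested worker
    # count is ignored and both counters are set to zero.
    return Path(parallel_workers=0, locus=Locus(parallel_workers=0))


full_join = make_full_join_path(requested_workers=4)
check_path(full_join)  # holds: both counters are zero
```

A path built as `Path(parallel_workers=0, locus=Locus(parallel_workers=4))` would fail `check_path`, which is the kind of mismatch the assertion is there to catch.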

## Problem

An error occurs in python lib when a plpython function is executed. After our analysis, in the user's cluster, a plpython UDF was running with the unstable network, and got a timeout error: `failed to acquire resources on one or more segments`. Then a plpython UDF was run in the same session, and the UDF failed with GC error.

Here is the core dump:

```
2023-11-24 10:15:18.945507 CST,,,p2705198,th2081832064,,,,0,,,seg-1,,,,,"LOG","00000","3rd party error log:
#0 0x7f7c68b6d55b in frame_dealloc /home/cc/repo/cpython/Objects/frameobject.c:509:5
#1 0x7f7c68b5109d in gen_send_ex /home/cc/repo/cpython/Objects/genobject.c:108:9
#2 0x7f7c68af9ddd in PyIter_Next /home/cc/repo/cpython/Objects/abstract.c:3118:14
#3 0x7f7c78caa5c0 in PLy_exec_function /home/cc/repo/gpdb6/src/pl/plpython/plpy_exec.c:134:11
#4 0x7f7c78cb5ffb in plpython_call_handler /home/cc/repo/gpdb6/src/pl/plpython/plpy_main.c:387:13
#5 0x562f5e008bb5 in ExecMakeTableFunctionResult /home/cc/repo/gpdb6/src/backend/executor/execQual.c:2395:13
#6 0x562f5e0dddec in FunctionNext_guts /home/cc/repo/gpdb6/src/backend/executor/nodeFunctionscan.c:142:5
#7 0x562f5e0da094 in FunctionNext /home/cc/repo/gpdb6/src/backend/executor/nodeFunctionscan.c:350:11
#8 0x562f5e03d4b0 in ExecScanFetch /home/cc/repo/gpdb6/src/backend/executor/execScan.c:84:9
#9 0x562f5e03cd8f in ExecScan /home/cc/repo/gpdb6/src/backend/executor/execScan.c:154:10
#10 0x562f5e0da072 in ExecFunctionScan /home/cc/repo/gpdb6/src/backend/executor/nodeFunctionscan.c:380:9
#11 0x562f5e001a1c in ExecProcNode /home/cc/repo/gpdb6/src/backend/executor/execProcnode.c:1071:13
#12 0x562f5dfe6377 in ExecutePlan /home/cc/repo/gpdb6/src/backend/executor/execMain.c:3202:10
#13 0x562f5dfe5bf4 in standard_ExecutorRun /home/cc/repo/gpdb6/src/backend/executor/execMain.c:1171:5
#14 0x562f5dfe4877 in ExecutorRun /home/cc/repo/gpdb6/src/backend/executor/execMain.c:992:4
#15 0x562f5e857e69 in PortalRunSelect /home/cc/repo/gpdb6/src/backend/tcop/pquery.c:1164:4
#16 0x562f5e856d3f in PortalRun /home/cc/repo/gpdb6/src/backend/tcop/pquery.c:1005:18
#17 0x562f5e84607a in exec_simple_query /home/cc/repo/gpdb6/src/backend/tcop/postgres.c:1848:10
```

## Reproduce

We can use a simple procedure to reproduce the above problem:

- set timeout GUC: `gpconfig -c gp_segment_connect_timeout -v 5` and `gpstop -ari`
- prepare function:

```
CREATE EXTENSION plpythonu;
CREATE OR REPLACE FUNCTION test_func() RETURNS SETOF int AS
$$
plpy.execute("select pg_backend_pid()")
for i in range(0, 5):
    yield (i)
$$ LANGUAGE plpythonu;
```

- exit from the current psql session.
- stop the postmaster of segment: `gdb -p "the pid of segment postmaster"`
- enter a psql session.
- call `SELECT test_func();` and get error

```
gpadmin=# select test_func();
ERROR: function "test_func" error fetching next item from iterator (plpy_elog.c:121)
DETAIL: Exception: failed to acquire resources on one or more segments
CONTEXT: Traceback (most recent call last):
PL/Python function "test_func"
```

- quit gdb and make postmaster runnable.
- call `SELECT test_func();` again and get panic

```
gpadmin=# SELECT test_func();
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>
```

## Analysis

- There is an SPI call in test_func(): `plpy.execute()`.
- Then coordinator will start a subtransaction by PLy_spi_subtransaction_begin();
- Meanwhile, if the segment cannot receive the instruction from the coordinator, the subtransaction beginning procedure return fails.
- BUT! The Python processor does not know whether an error happened and does not clean its environment.
- Then the next plpython UDF in the same session will fail due to the wrong Python environment.

## Solution

- Use try-catch to catch the exception caused by PLy_spi_subtransaction_begin()
- set the python error indicator by PLy_spi_exception_set()

Co-authored-by: Chen Mulong <chenmulong@gmail.com>
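
The shape of that fix can be illustrated with a small simulation. This is a sketch only: the real change is in the C code of PL/Python, and every name below (`FakeInterpreter`, `SegmentUnavailable`, `begin_subtransaction`, `spi_execute`) is a hypothetical stand-in. The point it shows is that a failure while starting the subtransaction is caught and recorded on the interpreter side instead of escaping without the interpreter knowing about it.

```python
class SegmentUnavailable(Exception):
    """Stand-in for the backend error raised when a segment is unreachable."""


class FakeInterpreter:
    """Toy interpreter state: holds only an error indicator."""

    def __init__(self):
        self.error_indicator = None


def begin_subtransaction(segments_reachable: bool) -> None:
    # Stand-in for PLy_spi_subtransaction_begin(): it may fail.
    if not segments_reachable:
        raise SegmentUnavailable(
            "failed to acquire resources on one or more segments")


def spi_execute(interp: FakeInterpreter, query: str, segments_reachable: bool):
    try:
        begin_subtransaction(segments_reachable)
    except SegmentUnavailable as exc:
        # Stand-in for PLy_spi_exception_set(): tell the interpreter that
        # an error happened so it can unwind and clean up normally.
        interp.error_indicator = exc
        return None
    return f"executed: {query}"


interp = FakeInterpreter()
result = spi_execute(interp, "select pg_backend_pid()", segments_reachable=False)
```

After the failed call, `result` is `None` and `interp.error_indicator` holds the exception, so a later call in the same session starts from a known state.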

avamingli force-pushed the enable_single_join_segment_general_workers branch 2 times, most recently from fad20d0 to d772ab6 Compare December 11, 2023 03:49

avamingli force-pushed the enable_single_join_segment_general_workers branch from d772ab6 to f8081e1 Compare December 11, 2023 03:50

avamingli closed this Dec 19, 2023

avamingli wants to merge 1 commit into main from enable_single_join_segment_general_workers
