Skip to content

Do not call gporca for simple queries#900

Closed
leborchuk wants to merge 1 commit intoapache:mainfrom
leborchuk:DoNotCallgporca
Closed

Do not call gporca for simple queries#900
leborchuk wants to merge 1 commit intoapache:mainfrom
leborchuk:DoNotCallgporca

Conversation

@leborchuk
Copy link
Contributor

@leborchuk leborchuk commented Feb 2, 2025

I've made a simple test

  1. create demo-cluster
  2. execute simple insert values query 1000 times with gporca disabled
  3. repeat 2. with gporca enabled
  4. compare the results - test with gporca enabled more than 12x slower

Here the results:

postgres=# set optimizer=off;
SET
Time: 1.657 ms
postgres=# do $$
begin
for i in 1..1000 loop
insert into test values(i);
end loop;
end;
$$;
DO
Time: 801.485 ms
postgres=# set optimizer=on;
SET
Time: 1.540 ms
postgres=# do $$
begin
for i in 1..1000 loop
insert into test values(i);
end loop;
end;
$$;
DO
Time: 10751.109 ms (00:10.751)

Honestly, the expected result. Integration with gporca includes a large number of copies and transformations.

In this PR, I propose disabling gporca for simple queries such as insert values. Of course, users could do the same manually, but I have not heard anyone actually doing so. Therefore, it would be great if the database switches to the postgres optimizer if a query is too simple to use gporca. We know that gporca certainly won't produce a better execution plan.

I formalized it in the enabled_for_optimizer function. We use postgres optimizer if we do not use any of: aggregation, with clause, recurse clause, window functions. And the number of relations in a query less or equal optimizer_relations_threshold. Otherwise, use gporca.

P.S. This was inspired by conclusions from the Integrating the Orca Optimizer into MySQL article. One conclusion was that it is not advisable to use gporca for simple queries. Let's implement this )

@leborchuk
Copy link
Contributor Author

Fix tests when gporca is disabled, need to run test workflow once again

@leborchuk
Copy link
Contributor Author

Sorry, I see here failed only ic-resgroup-v2/resgroup/resgroup_cpu_max_percent test with

diff -I HINT: -I CONTEXT: -I GP_IGNORE: -U3 /__w/cloudberry/cloudberry/src/test/isolation2/expected/resgroup/resgroup_cpu_max_percent.out /__w/cloudberry/cloudberry/src/test/isolation2/results/resgroup/resgroup_cpu_max_percent.out
--- /__w/cloudberry/cloudberry/src/test/isolation2/expected/resgroup/resgroup_cpu_max_percent.out	2025-02-04 06:22:21.746467393 -0800
+++ /__w/cloudberry/cloudberry/src/test/isolation2/results/resgroup/resgroup_cpu_max_percent.out	2025-02-04 06:22:21.754467504 -0800
@@ -224,7 +224,7 @@
 SELECT verify_cpu_usage('rg1_cpu_test', 90, 10);
  verify_cpu_usage 
 ------------------
- t                
+ f                
 (1 row)

But do not have any ideas why the resource group RG1_CPU_TEST CPU usage differs from 90 by more than 10%. Maybe it has nothing to do with my fixes at all (test passed the first time).

@my-ship-it my-ship-it requested a review from jiaqizho February 8, 2025 05:24
@avamingli
Copy link
Contributor

avamingli commented Feb 8, 2025

Hi, thanks for your pointing it out, but I have significant concerns about this approach and implications.

Defining what constitutes a "simple" query is inherently subjective and context-dependent.
The current implementation attempts to formalize this, but it feels overly simplistic and overlooks critical cases like subqueries.The choice of optimizer is fundamentally a tuning decision that should remain in the hands of the user or application, not the kernel.

Additionally, this PR lacks sufficient data-driven justification.
Without detailed performance metrics—such as where ORCA is slower for "simple" queries, the magnitude of the performance gap due to the bug below:

gpadmin=# explain(analyze) insert into test values(1);
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Insert on test  (cost=0.00..0.01 rows=1 width=4) (actual time=0.298..0.300 rows=0 loops=1)
   ->  Result  (cost=0.00..0.00 rows=1 width=8) (actual time=0.017..0.019 rows=1 loops=1)
         ->  Result  (cost=0.00..0.00 rows=1 width=4) (actual time=0.016..0.017 rows=1 loops=1)
               ->  Result  (cost=0.00..0.00 rows=1 width=1) (actual time=0.007..0.008 rows=1 loops=1)
 Planning Time: 29.739 ms
   (slice0)    Executor memory: 111K bytes (seg1).
 Memory used:  128000kB
 Optimizer: Pivotal Optimizer (GPORCA)
 Execution Time: 2.259 ms
(9 rows)

Time: 35.444 ms

While PG planner:

gpadmin=# explain(analyze) insert into test values(1);
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Insert on test  (cost=0.00..0.03 rows=0 width=0) (actual time=0.082..0.083 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
 Planning Time: 0.473 ms
   (slice0)    Executor memory: 110K bytes (seg1).
 Memory used:  128000kB
 Optimizer: Postgres query optimizer
 Execution Time: 1.308 ms
(7 rows)

Time: 4.604 ms

A simple INSERT, ORCA takes more time to plan, but finally introduces additional 2 RESULT nodes, getting half the results with twice the effort.

Instead of disabling ORCA, we should focus on addressing these inefficiencies directly.
This would not only resolve the immediate issue but also improve ORCA’s performance for all users, not just those running "simple" queries.
Rather than sidestepping the problem, we should tackle the root cause head-on.
By identifying and fixing the specific inefficiencies in ORCA, you could deliver a more robust and user-friendly solution that benefits everyone in the long term.

@leborchuk
Copy link
Contributor Author

Hi, thanks for your pointing it out, but I have significant concerns about this approach and implications.

Defining what constitutes a "simple" query is inherently subjective and context-dependent. The current implementation attempts to formalize this, but it feels overly simplistic and overlooks critical cases like subqueries.The choice of optimizer is fundamentally a tuning decision that should remain in the hands of the user or application, not the kernel.

Additionally, this PR lacks sufficient data-driven justification. Without detailed performance metrics—such as where ORCA is slower for "simple" queries, the magnitude of the performance gap due to the bug below:

gpadmin=# explain(analyze) insert into test values(1);
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Insert on test  (cost=0.00..0.01 rows=1 width=4) (actual time=0.298..0.300 rows=0 loops=1)
   ->  Result  (cost=0.00..0.00 rows=1 width=8) (actual time=0.017..0.019 rows=1 loops=1)
         ->  Result  (cost=0.00..0.00 rows=1 width=4) (actual time=0.016..0.017 rows=1 loops=1)
               ->  Result  (cost=0.00..0.00 rows=1 width=1) (actual time=0.007..0.008 rows=1 loops=1)
 Planning Time: 29.739 ms
   (slice0)    Executor memory: 111K bytes (seg1).
 Memory used:  128000kB
 Optimizer: Pivotal Optimizer (GPORCA)
 Execution Time: 2.259 ms
(9 rows)

Time: 35.444 ms

While PG planner:

gpadmin=# explain(analyze) insert into test values(1);
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Insert on test  (cost=0.00..0.03 rows=0 width=0) (actual time=0.082..0.083 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
 Planning Time: 0.473 ms
   (slice0)    Executor memory: 110K bytes (seg1).
 Memory used:  128000kB
 Optimizer: Postgres query optimizer
 Execution Time: 1.308 ms
(7 rows)

Time: 4.604 ms

A simple INSERT, ORCA takes more time to plan, but finally introduces additional 2 RESULT nodes, getting half the results with twice the effort.

Instead of disabling ORCA, we should focus on addressing these inefficiencies directly. This would not only resolve the immediate issue but also improve ORCA’s performance for all users, not just those running "simple" queries. Rather than sidestepping the problem, we should tackle the root cause head-on. By identifying and fixing the specific inefficiencies in ORCA, you could deliver a more robust and user-friendly solution that benefits everyone in the long term.

Thank you for thoroughly reviewing my idea!

My goal here is to do some "magic" for database users, not for us as database developers.

For that simple query, you can see that we wasted time mainly on planning.: Planning Time: 29.739 ms for gporca and Planning Time: 0.473 ms for postgres query optimizer. As you correctly said for queries like that tuning decision should be done by users or applications.

But some of them do not want to do that. What they want is some kind of advice - what they should do in order to improve their performance. They may even not know how to see the query execution plan. We constantly teach them, write documents and give speeches. However, not all of them are advanced enough to choose an optimizer.

So, it would be great if we could help them in some way. Create simple and understandable rules to automatically switch between optimizers.

As for other considerations - I totally agree. Instead of disabling ORCA, we should focus on improving it. I personally have a bunch of ideas what I want try to improve in ORCA. But it needs time and while we are working on it, I want an instrument to switch between optimizers. I don't want to enable this by default, but I want the option to enable it for specific users.

Copy link
Contributor

@jiaqizho jiaqizho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some comments...

@avamingli
Copy link
Contributor

My goal here is to do some "magic" for database users, not for us as database developers.

As you mentioned, this does not benefit the kernel code in any way. On the contrary, it masks the problem and increases maintenance complexity.
I stand by my point that this PR is not a good idea.

Your example shows that ORCA is significantly slower when inserting data, but your conclusion generalizes to so-called simple queries.
However, there are no examples or test data provided. For instance, what is the query plan for a statement like select * from t? This test seems simple, yet it is missing.
Additionally, as I mentioned, what proportion of these so-called simple queries are actually insert statements? If the majority of the issues are caused by insert, we have even less reason to disable simple queries like select.

I don’t mean to be rude, but while I find this requirement hardly worth discussing in terms of code, the code itself is quite sloppy.
For example, the issue I raised about subqueries is not addressed in your code.

@jiaqizho
Copy link
Contributor

My goal here is to do some "magic" for database users, not for us as database developers.

As you mentioned, this does not benefit the kernel code in any way. On the contrary, it masks the problem and increases maintenance complexity. I stand by my point that this PR is not a good idea.

Your example shows that ORCA is significantly slower when inserting data, but your conclusion generalizes to so-called simple queries. However, there are no examples or test data provided. For instance, what is the query plan for a statement like select * from t? This test seems simple, yet it is missing. Additionally, as I mentioned, what proportion of these so-called simple queries are actually insert statements? If the majority of the issues are caused by insert, we have even less reason to disable simple queries like select.

I don’t mean to be rude, but while I find this requirement hardly worth discussing in terms of code, the code itself is quite sloppy. For example, the issue I raised about subqueries is not addressed in your code.

I understand and agree with your point, but I also agree that we can provide some simply and work options (as the current PR does).

As for maintenance concerns, my idea is that if the current parse->rtable can no longer be used as a IF condition (many case may lead to this), then we should think of other ways to solve the desired effect of the current PR.

@avamingli
Copy link
Contributor

avamingli commented Feb 17, 2025

My goal here is to do some "magic" for database users, not for us as database developers.

As you mentioned, this does not benefit the kernel code in any way. On the contrary, it masks the problem and increases maintenance complexity. I stand by my point that this PR is not a good idea.

Your example shows that ORCA is significantly slower when inserting data, but your conclusion generalizes to so-called simple queries. However, there are no examples or test data provided. For instance, what is the query plan for a statement like select * from t? This test seems simple, yet it is missing. Additionally, as I mentioned, what proportion of these so-called simple queries are actually insert statements? If the majority of the issues are caused by insert, we have even less reason to disable simple queries like select.

I don’t mean to be rude, but while I find this requirement hardly worth discussing in terms of code, the code itself is quite sloppy. For example, the issue I raised about subqueries is not addressed in your code.

Besides all the concerns which are not resolved: subquery, proportion of these so-called simple queries are actually insert statements.

I believe there are fundamental issues with the current implementation regarding the optimizer_relations_threshold.

1.Partitioned Tables:

Partitioned Tables: When using a partitioned table (let's call it par) with optimizer_relations_threshold set to 1, this means that ORCA is not chosen as the optimizer.
However, when there might be two or more range tables involved in a query, the optimizer_relations_threshold can prevent ORCA from being utilized effectively. This undermines one of ORCA's key features: partition pruning. Consequently, the benefits of ORCA are lost.

CREATE TABLE partrl (a int, b int, c int)
DISTRIBUTED BY (a)
PARTITION BY range(b)
SUBPARTITION BY list(c)
(
        PARTITION p1 START (10) END (20) EVERY (5)
        (
                SUBPARTITION sp1 VALUES (1, 2)
        ),
        PARTITION p2 START (0) END (10)
        (
                SUBPARTITION sp2 VALUES (3, 4),
                SUBPARTITION sp1 VALUES (1, 2),
                DEFAULT SUBPARTITION others
        )
);

SELECT * FROM patrl; 

A single entry in rtable but in fact there would be all children tables during planner expanding those.

2.Inherited Tables:

Inherited Tables: The same logic applies to inherited tables. When selecting from a parent table, there can be multiple child tables. This scenario also violates the notion of a "simple query," which is defined as having fewer range table entries than the optimizer_relations_threshold.

3. Union/INTESECT/xxx

Union Operations: Similarly, in cases involving UNION, such as SELECT * FROM t1 UNION ALL SELECT * FROM t2, the planner expands these into multiple range tables ex: Void RTE, leading to scenarios where the number of range table entries exceeds the threshold.

4. GUC compatibility

Adding this code may create complications in the future, especially if we later resolve the issues with ORCA. We could end up with redundant code and GUCs that are no longer useful, which could lead to confusion and compatibility issues for users who have already adopted them.

5. Put the ORCA codes into ORCA

BTW, one simple rule is that ORCA GUCs should be used in ORCA codes, not pg planner side.
optimizer_xxxx means it belongs to ORCA c++ codes, even you tried to implement it, please obey the rule if you want to fallback to pg planner for some reasons.

These examples illustrate how the current design of optimizer_relations_threshold is problematic. I suspect there may be additional issues that I have not identified.
While I have outlined these concerns, I do not expect later fixes on this PR.
My point is that we should address the root problems with ORCA rather than introducing workarounds that complicate the kernel codes for planner maintainers (don't push DBA's workaround to kernel).
I'm not convinced that it's worth discussing further unless there are stronger reasons and test data presented.
Instead of pursuing this approach, our focus should be on fixing the real problems with ORCA.

@leborchuk
Copy link
Contributor Author

I opened proposal discussion to decide whether it's a good idea or not #937. Close this PR. Indeed, we must first discuss all the details in order to move forward.

@leborchuk leborchuk closed this Feb 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants