Do not call gporca for simple queries #900
Fixed the tests for the case when gporca is disabled; the test workflow needs to be run once again.
Sorry, I see that only the ic-resgroup-v2/resgroup/resgroup_cpu_max_percent test failed here. But I have no idea why the CPU usage of resource group RG1_CPU_TEST differs from 90 by more than 10%. Maybe it has nothing to do with my fixes at all (the test passed the first time).
Hi, thanks for pointing this out, but I have significant concerns about this approach and its implications. Defining what constitutes a "simple" query is inherently subjective and context-dependent. Additionally, this PR lacks sufficient data-driven justification.
gpadmin=# explain(analyze) insert into test values(1);
QUERY PLAN
------------------------------------------------------------------------------------------------------
 Insert on test (cost=0.00..0.01 rows=1 width=4) (actual time=0.298..0.300 rows=0 loops=1)
   -> Result (cost=0.00..0.00 rows=1 width=8) (actual time=0.017..0.019 rows=1 loops=1)
         -> Result (cost=0.00..0.00 rows=1 width=4) (actual time=0.016..0.017 rows=1 loops=1)
               -> Result (cost=0.00..0.00 rows=1 width=1) (actual time=0.007..0.008 rows=1 loops=1)
Planning Time: 29.739 ms
(slice0) Executor memory: 111K bytes (seg1).
Memory used: 128000kB
Optimizer: Pivotal Optimizer (GPORCA)
Execution Time: 2.259 ms
(9 rows)
Time: 35.444 ms
With the PG planner:
gpadmin=# explain(analyze) insert into test values(1);
QUERY PLAN
--------------------------------------------------------------------------------------------
 Insert on test (cost=0.00..0.03 rows=0 width=0) (actual time=0.082..0.083 rows=0 loops=1)
   -> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
Planning Time: 0.473 ms
(slice0) Executor memory: 110K bytes (seg1).
Memory used: 128000kB
Optimizer: Postgres query optimizer
Execution Time: 1.308 ms
(7 rows)
Time: 4.604 ms
For a simple INSERT, ORCA takes far longer to plan (29.739 ms vs. 0.473 ms) and still ends up introducing two additional Result nodes: twice the effort for half the result. Instead of disabling ORCA, we should focus on addressing these inefficiencies directly.
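To put the two runs above side by side, here is a minimal C sketch that only does arithmetic on the reported timings. The struct and function names are hypothetical and exist purely for illustration; the numbers are copied from the two EXPLAIN (ANALYZE) outputs, and the psql "Time:" value is used as the total, so the shares are approximate.

```c
/* Timings copied from the two EXPLAIN (ANALYZE) runs above, in ms.
 * The names below are illustrative and not part of any codebase. */
typedef struct Timing {
    double planning_ms;   /* "Planning Time" line */
    double total_ms;      /* psql "Time:" line */
} Timing;

static const Timing orca_run     = { 29.739, 35.444 };
static const Timing postgres_run = {  0.473,  4.604 };

/* Fraction of the total time that was spent in planning. */
static double
planning_share(const Timing *t)
{
    return t->planning_ms / t->total_ms;
}

/* How many times longer the slow run planned than the fast run. */
static double
planning_slowdown(const Timing *slow, const Timing *fast)
{
    return slow->planning_ms / fast->planning_ms;
}
```

With these numbers, planning_slowdown(&orca_run, &postgres_run) is roughly 63, and planning accounts for about 84% of the ORCA run versus about 10% of the Postgres run.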
Thank you for thoroughly reviewing my idea! My goal here is to do some "magic" for database users, not for us as database developers. For that simple query, you can see that we wasted time mainly on planning. But some of them do not want to do that. What they want is some kind of advice on what they should do to improve their performance. They may not even know how to view a query execution plan. We constantly teach them, write documentation and give talks. However, not all of them are advanced enough to choose an optimizer. So it would be great if we could help them in some way: create simple and understandable rules to switch between optimizers automatically.
As for the other considerations, I totally agree. Instead of disabling ORCA, we should focus on improving it. I personally have a bunch of ideas that I want to try for improving ORCA. But that takes time, and while we are working on it, I want an instrument for switching between optimizers. I don't want to enable this by default, but I want the option to enable it for specific users.
jiaqizho
left a comment
LGTM with some comments...
As you mentioned, this does not benefit the kernel code in any way. On the contrary, it masks the problem and increases maintenance complexity. Your example shows that ORCA is significantly slower when inserting data, but you generalize that conclusion to all so-called simple queries. I don't mean to be rude, but while I find this requirement hardly worth discussing in terms of code, the code itself is quite sloppy.
I understand and agree with your point, but I also agree that we can provide some As for maintenance concerns, my idea is that if the current |
Besides the concerns that are still unresolved (subqueries, and the proportion of queries that are actually these so-called simple queries), I believe there are fundamental issues with the current implementation of optimizer_relations_threshold.
1. Partitioned tables: Take a partitioned table (let's call it partrl) with optimizer_relations_threshold set to 1, which means that ORCA is not chosen as the optimizer.
CREATE TABLE partrl (a int, b int, c int)
DISTRIBUTED BY (a)
PARTITION BY range(b)
SUBPARTITION BY list(c)
(
    PARTITION p1 START (10) END (20) EVERY (5)
    (
        SUBPARTITION sp1 VALUES (1, 2)
    ),
    PARTITION p2 START (0) END (10)
    (
        SUBPARTITION sp2 VALUES (3, 4),
        SUBPARTITION sp1 VALUES (1, 2),
        DEFAULT SUBPARTITION others
    )
);
SELECT * FROM partrl;
There is a single entry in rtable, but in fact the planner expands it into all of the child tables.
2. Inherited tables: The same logic applies to inherited tables. When selecting from a parent table, there can be multiple child tables. This scenario also violates the notion of a "simple query", which is defined as having no more range table entries than optimizer_relations_threshold.
3. UNION/INTERSECT/etc. set operations: Similarly, in cases involving UNION, such as SELECT * FROM t1 UNION ALL SELECT * FROM t2, the planner expands these into multiple range table entries (e.g. a Void RTE), so the number of range table entries can exceed the threshold.
4. GUC compatibility: Adding this code may create complications in the future, especially if we later resolve the issues with ORCA. We could end up with redundant code and GUCs that are no longer useful, which could lead to confusion and compatibility issues for users who have already adopted them.
5. Put the ORCA code into ORCA: BTW, one simple rule is that ORCA GUCs should be used in ORCA code, not on the PG planner side.
These examples illustrate how the current design of optimizer_relations_threshold is problematic. I suspect there may be additional issues that I have not identified.
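The partitioned-table point can be illustrated with a small, self-contained C model. It is only a sketch: the Rel struct and expanded_count are hypothetical and do not mirror the planner's real data structures, and the tree assumes that EVERY (5) splits p1 into two range partitions. The number of entries the real planner creates may differ, for example after partition pruning.

```c
#include <stddef.h>

/* Hypothetical model of a relation with its partitions or
 * inheritance children; not the planner's actual RangeTblEntry. */
typedef struct Rel {
    const struct Rel *children;   /* array of children, or NULL */
    size_t            nchildren;
} Rel;

/* The relation itself plus every descendant it expands into. */
static size_t
expanded_count(const Rel *rel)
{
    size_t total = 1;

    for (size_t i = 0; i < rel->nchildren; i++)
        total += expanded_count(&rel->children[i]);
    return total;
}

/* Model of partrl, assuming p1 is split into [10,15) and [15,20). */
static const Rel p1_a_subs[] = { { NULL, 0 } };                           /* sp1 */
static const Rel p1_b_subs[] = { { NULL, 0 } };                           /* sp1 */
static const Rel p2_subs[]   = { { NULL, 0 }, { NULL, 0 }, { NULL, 0 } }; /* sp2, sp1, others */
static const Rel partrl_parts[] = {
    { p1_a_subs, 1 },   /* p1, first range  */
    { p1_b_subs, 1 },   /* p1, second range */
    { p2_subs,   3 },   /* p2               */
};
static const Rel partrl = { partrl_parts, 3 };
```

In this model the query text mentions one relation, while expanded_count(&partrl) counts 9 (the root, three partitions, and five leaf subpartitions), so a check against a threshold of 1 before expansion says little about how many relations the planner actually has to handle.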
I opened a proposal discussion, #937, to decide whether this is a good idea or not. Closing this PR. Indeed, we must first discuss all the details in order to move forward.
I ran a simple test.
Here are the results:
Honestly, the expected result. Integration with gporca includes a large number of copies and transformations.
In this PR, I propose disabling gporca for simple queries such as INSERT ... VALUES. Of course, users could do the same manually, but I have not heard of anyone actually doing so. Therefore, it would be great if the database switched to the Postgres optimizer when a query is too simple to benefit from gporca. For such queries we know that gporca certainly won't produce a better execution plan.
I formalized this in the enabled_for_optimizer function. We use the Postgres optimizer if the query uses none of the following: aggregation, a WITH clause, a recursive clause, window functions, and the number of relations in the query is less than or equal to optimizer_relations_threshold. Otherwise, we use gporca.
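The rule above can be sketched as a small standalone C function. This is not the PR's enabled_for_optimizer: the QuerySummary struct, its fields, and the use_gporca name are hypothetical stand-ins for whatever the real code reads from the parsed query, and the threshold is passed as a plain parameter instead of being read from the GUC.

```c
#include <stdbool.h>

/* Hypothetical summary of a parsed query; the real code would
 * inspect the PostgreSQL Query tree instead. */
typedef struct QuerySummary {
    bool has_aggregates;     /* aggregation is present */
    bool has_with_clause;    /* WITH clause is present */
    bool is_recursive;       /* recursive clause is present */
    bool has_window_funcs;   /* window functions are present */
    int  relation_count;     /* number of relations in the query */
} QuerySummary;

/* Returns true when the query should be planned by gporca and false
 * when the Postgres optimizer is used, following the rule above. */
static bool
use_gporca(const QuerySummary *q, int relations_threshold)
{
    /* Any of these features makes the query "not simple". */
    if (q->has_aggregates || q->has_with_clause ||
        q->is_recursive || q->has_window_funcs)
        return true;

    /* Simple only if relation count <= threshold. */
    return q->relation_count > relations_threshold;
}
```

For example, a plain single-table INSERT ... VALUES (no special features, one relation) with a threshold of 1 stays on the Postgres optimizer, while the same query with an aggregate goes to gporca regardless of the threshold.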
P.S. This was inspired by the conclusions of the article "Integrating the Orca Optimizer into MySQL". One conclusion was that it is not advisable to use gporca for simple queries. Let's implement this :)