Skip to content

Conversation

@Sevenannn
Copy link
Contributor

Which issue does this PR close?

N/A

Rationale for this change

For query

        select
            c_custkey,
            count(o_orderkey)
        from
            customer left outer join orders on
                        c_custkey = o_custkey
                    and o_comment not like '%special%requests%'
        group by
            c_custkey

The logical plan is

+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | BytesProcessedNode                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|               |   Federated                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
|               |  Projection: customer.c_custkey, count(orders.o_orderkey)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|               |   Aggregate: groupBy=[[customer.c_custkey]], aggr=[[count(orders.o_orderkey)]]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|               |     Left Join:  Filter: customer.c_custkey = orders.o_custkey                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|               |       TableScan: customer                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
|               |       Filter: orders.o_comment NOT LIKE Utf8("%special%requests%")                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|               |         TableScan: orders, partial_filters=[orders.o_comment NOT LIKE Utf8("%special%requests%")]              

The rewritten query will be:
SELECT customer.c_custkey, count(orders.o_orderkey) FROM customer LEFT JOIN orders ON ((customer.c_custkey = orders.o_custkey) AND (orders.o_comment NOT LIKE '%special%requests%' AND orders.o_comment NOT LIKE '%special%requests%')) GROUP BY customer.c_custkey

Under the current approach, the filter orders.o_comment NOT LIKE Utf8("%special%requests%") will occur twice in final query, although this has no effect on query result correctness, it brings performance overhead by including duplicated conditions.

What changes are included in this PR?

  • Use Vec to store filter and preserve ordering
  • Check if filter exist in Vec when adding filter

Are these changes tested?

Yes

Are there any user-facing changes?

No

* Eliminate duplicated filter within (filter(TableScan)) plan

* Updates

* fix

* add test

* fix
@github-actions github-actions bot added the sql SQL Planner label Nov 14, 2024
@Sevenannn Sevenannn changed the title Qianqian/filter fix Fix Duplicated filters within (filter(TableScan)) plan Nov 14, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Sevenannn -- this is a great idea. Thank you @jayzhan211 for the review

@alamb alamb changed the title Fix Duplicated filters within (filter(TableScan)) plan Fix Duplicated filters within (filter(TableScan)) plan for unparser Nov 15, 2024
@alamb alamb marked this pull request as draft November 18, 2024 21:04
@alamb
Copy link
Contributor

alamb commented Nov 18, 2024

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

@Sevenannn Sevenannn marked this pull request as ready for review November 21, 2024 02:11
@Sevenannn Sevenannn marked this pull request as draft November 21, 2024 02:14
@Sevenannn Sevenannn marked this pull request as ready for review November 26, 2024 18:32
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

@jayzhan211 jayzhan211 merged commit 6a4bf0f into apache:main Nov 27, 2024
25 checks passed
@jayzhan211
Copy link
Contributor

Thanks @Sevenannn @alamb

@Sevenannn
Copy link
Contributor Author

Thanks @jayzhan211 @alamb

@alamb
Copy link
Contributor

alamb commented Dec 6, 2024

Thanks @jayzhan211 @alamb

Thank you for all the bug fixes @Sevenannn -- it turns out that @wiedld just found you had fixed a bug we ran into in InfluxDB as well (#12979) 🙏

@phillipleblanc phillipleblanc deleted the qianqian/filter-fix branch April 8, 2025 07:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants