Sevenannn commented Oct 26, 2024

Which issue does this PR close?

Rationale for this change

For the query

        select
            c_custkey,
            count(o_orderkey)
        from
            customer left outer join orders on
                        c_custkey = o_custkey
                    and o_comment not like '%special%requests%'
        group by
            c_custkey

The logical plan is:

+---------------+---------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                              |
+---------------+---------------------------------------------------------------------------------------------------+
| logical_plan  | BytesProcessedNode                                                                                |
|               |   Federated                                                                                       |
|               |  Projection: customer.c_custkey, count(orders.o_orderkey)                                         |
|               |   Aggregate: groupBy=[[customer.c_custkey]], aggr=[[count(orders.o_orderkey)]]                    |
|               |     Left Join:  Filter: customer.c_custkey = orders.o_custkey                                     |
|               |       TableScan: customer                                                                         |
|               |       Filter: orders.o_comment NOT LIKE Utf8("%special%requests%")                                |
|               |         TableScan: orders, partial_filters=[orders.o_comment NOT LIKE Utf8("%special%requests%")] |
+---------------+---------------------------------------------------------------------------------------------------+

The rewritten query will be:
SELECT customer.c_custkey, count(orders.o_orderkey) FROM customer LEFT JOIN orders ON ((customer.c_custkey = orders.o_custkey) AND (orders.o_comment NOT LIKE '%special%requests%' AND orders.o_comment NOT LIKE '%special%requests%')) GROUP BY customer.c_custkey

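Under the current approach, the filter orders.o_comment NOT LIKE Utf8("%special%requests%") appears twice in the final query: the same predicate is present both in the Filter node and in the TableScan's partial_filters, and both copies are collected when the plan is rewritten into SQL. This does not affect the correctness of the query result, but the duplicated condition adds avoidable performance overhead.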
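A minimal sketch of how the duplicate arises, in plain Rust. It assumes, purely for illustration, that predicates are plain strings and that the rewrite appends the Filter node's predicate and the TableScan's partial_filters to a single Vec; collect_filters_vec and conjunction are hypothetical names, not DataFusion APIs.

```rust
/// Hypothetical stand-in for the filter accumulation step. Predicates are
/// plain strings here, not DataFusion expressions.
fn collect_filters_vec(filter_node: &[&str], partial_filters: &[&str]) -> Vec<String> {
    let mut filters: Vec<String> = Vec::new();
    // Predicates from the Filter node sitting above the TableScan.
    for f in filter_node {
        filters.push(f.to_string());
    }
    // Predicates already attached to the TableScan as partial_filters.
    // A Vec keeps every entry, so a predicate present in both places is stored twice.
    for f in partial_filters {
        filters.push(f.to_string());
    }
    filters
}

/// Combine the collected predicates into one conjunction, as in the ON clause.
fn conjunction(filters: &[String]) -> String {
    filters.join(" AND ")
}

fn main() {
    let pred = "orders.o_comment NOT LIKE '%special%requests%'";
    let filters = collect_filters_vec(&[pred], &[pred]);
    // Prints the predicate twice, joined by " AND ".
    println!("{}", conjunction(&filters));
}
```

Running main prints the same NOT LIKE predicate twice joined by AND, mirroring the duplicated condition in the rewritten query above.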

What changes are included in this PR?

  • Use a HashSet instead of a Vec to accumulate filters in try_transform_to_simple_table_scan_with_filters, so that duplicated filters are not added
  • Add tests to verify the changes
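The first change can be sketched the same way: swapping the Vec for a HashSet means a predicate that appears both in the Filter node and in partial_filters is stored only once. This is again a hypothetical, string-based model (collect_filters_set is not a DataFusion function); with real expressions, the element type must implement Hash and Eq for a HashSet to work.

```rust
use std::collections::HashSet;

/// Hypothetical sketch: accumulate predicates in a HashSet so that a predicate
/// seen in both the Filter node and the TableScan's partial_filters is kept once.
fn collect_filters_set(filter_node: &[&str], partial_filters: &[&str]) -> HashSet<String> {
    let mut filters: HashSet<String> = HashSet::new();
    for f in filter_node.iter().chain(partial_filters.iter()) {
        // insert() returns false and leaves the set unchanged for a duplicate.
        filters.insert(f.to_string());
    }
    filters
}

fn main() {
    let pred = "orders.o_comment NOT LIKE '%special%requests%'";
    let filters = collect_filters_set(&[pred], &[pred]);
    assert_eq!(filters.len(), 1);
    // HashSet iteration order is unspecified, so with several distinct
    // predicates the order of the joined output is not deterministic.
    let sql: Vec<&str> = filters.iter().map(|s| s.as_str()).collect();
    println!("{}", sql.join(" AND "));
}
```

One trade-off of a HashSet is that its iteration order is unspecified, so when several distinct predicates are collected, the order in which they appear in the generated SQL is not stable.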

Are these changes tested?

Yes

Are there any user-facing changes?

No

Sevenannn marked this pull request as ready for review October 29, 2024 23:58
Sevenannn merged commit aa9dc60 into spiceai-42 Oct 30, 2024
Sevenannn deleted the qianqian/unparser-filter branch October 30, 2024 03:07
phillipleblanc pushed a commit that referenced this pull request Nov 12, 2024
* Eliminate duplicated filter within (filter(TableScan)) plan

* Updates

* fix

* add test

* fix
phillipleblanc pushed a commit that referenced this pull request Nov 12, 2024
Sevenannn added a commit that referenced this pull request Nov 14, 2024
sgrebnov pushed a commit that referenced this pull request Nov 29, 2024
…pache#13422)

* Eliminate duplicated filter within (filter(TableScan)) plan (#51)

* Eliminate duplicated filter within (filter(TableScan)) plan

* Updates

* fix

* add test

* fix

* Preserve the filter order when eliminating duplicated filter #56

* Use IndexSet instead of Vec
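The follow-up commit above preserves the filter order by switching to IndexSet (from the indexmap crate), which removes duplicates while remembering insertion order. The sketch below reproduces that behaviour with only the standard library, using a HashSet of already-seen predicates plus a Vec kept in first-occurrence order. It is an illustration under the same string-based assumptions; collect_filters_ordered and the extra o_totalprice predicate are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical sketch of order-preserving de-duplication: a `seen` set
/// filters out repeats while a Vec records first-occurrence order. This
/// mimics what an IndexSet provides, using only std.
fn collect_filters_ordered(filter_node: &[&str], partial_filters: &[&str]) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut ordered: Vec<String> = Vec::new();
    for &f in filter_node.iter().chain(partial_filters.iter()) {
        // insert() returns true only the first time a predicate is seen.
        if seen.insert(f) {
            ordered.push(f.to_string());
        }
    }
    ordered
}

fn main() {
    let a = "orders.o_comment NOT LIKE '%special%requests%'";
    let b = "orders.o_totalprice > 100"; // hypothetical extra predicate
    let filters = collect_filters_ordered(&[b, a], &[a]);
    // Prints the predicates in first-seen order: b, then a.
    println!("{}", filters.join(" AND "));
}
```

Unlike the plain HashSet version, the output order here depends only on the order in which predicates are first encountered, so the generated SQL is deterministic.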