-
Notifications
You must be signed in to change notification settings - Fork 72
Extend attempt_predicate_pushdown to accept additional filters as arguments #1138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@rjzamora Since you'd initially worked on some of the code here it would be great if you could review as well. |
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## main #1138 +/- ##
==========================================
+ Coverage 81.42% 81.55% +0.13%
==========================================
Files 78 78
Lines 4392 4396 +4
Branches 796 798 +2
==========================================
+ Hits 3576 3585 +9
+ Misses 639 630 -9
- Partials 177 181 +4
... and 1 file with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
rjzamora
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure these changes make sense @ayushdg - Thanks for the ping!
I left some thoughts about possible corner cases, but there is a decent chance that those corner cases are either imagined or unimportant.
| assert PluginTest1().get_plugin("some_key") == "value_2" | ||
|
|
||
|
|
||
| def test_predicate_pushdown(parquet_ddf): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems unnecessarily intimidating to me :)
Perhaps this should be broken into several different tests?
| # Expand the filter set with provided conjunctive and disjunctive filters | ||
| filters.extend([f] for f in disjunctive_filters or []) | ||
| # Add conjunctive filters to each disjunctive filter | ||
| for f in filters: | ||
| f.extend(conjunctive_filters or []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense, and I think I (mostly) like the approach. However, it does seem possible that you will eventually want to do things that are currently prohibited. Some examples:
- You want to make a disjunctive addition to your filters, and the new filter contains a conjunction that should not be added to any of the other (existing) filters. For example, you want to apply the existing filters, but you also want to allow null values for a specific user-id. (does this already work?)
- You want to add a conjunctive filter to a subset of the pre-existing disjunctive filters.
- You want to drop pre-existing filters.
Not sure if any of these cases are important or meaningful at all. Either way, I'm pretty sure you could support (1) and (3) later if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments. You're right that all three cases aren't really support with the current addition and it only helps for passing additional filters that get applied to everything.
Ideally I'd like to consolidate the conjunctive and disjunctive args into a single arg in dnf form which allows some flexibility with the additional filters but doesn't allow good interactivity with existing ones.
While we're unable to return a full dnf for a dataframe rather than having some existing filters coming via the hlg and some passed in additionally, I can't think of a nice way for both to interact cleanly without some good tracking around how we represent that information.
|
Closing in favor of #1173 |
To build on top of some of the work in #1130, this pr adds the
conjunctive_filtersanddisjunctive_filtersarguments toattempt_predicate_pushdown.This allow applying the provided arguments in addition to everything that can be inferred from the HLG, allowing us to expand the predicate pushdown coverage to cases that cannot be inferred from the HLG.
Ideally in the future we'd expand the table_scan filter logic to generate a dnf instead of a hlg to better express more kinds of complex predicates.