Misc improvements to ProjectionExprs #18719
This PR adds `PartialEq`/`Eq` implementations for `ProjectionExpr` and `ProjectionExprs`, adds a `project_batch()` method, and fixes a bug in `update_expr()` for literal expressions. It also adds comprehensive tests. Part of apache#18627.
Co-authored-by: Jeffrey Vo <[email protected]>
```rust
///
/// This function accepts a pre-computed output schema instead of calling [`ProjectionExprs::project_schema`]
/// so that repeated calls do not have schema projection overhead.
pub fn project_batch(
```
Maybe a better API than this would be `make_projector() -> Projector`, where `Projector` holds a reference to the output schema.
Will the method be used in #18627?
Yep! I'm just thinking it would be less error-prone to package it up in a struct. I'll push the change here, then we can rebase #18627 to use the better version once this is merged.
The point is that users don't have to track `output_schema` and pass it in; they can just keep track of a `Projector`.
I implemented my idea in f6afd71 and was able to use it to simplify `ProjectionExec` a bit, so it's already in use 😄
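A minimal sketch of the `make_projector() -> Projector` idea from this thread, assuming simplified std-only stand-ins for Arrow types (`Field`, `SchemaRef`, `Batch`) and a two-variant `Expr`; it is not the code from f6afd71, and the field and method layout of `Projector` here is hypothetical. The projector computes the output schema once, and `project_batch` reuses it for every batch. It also shows why empty projections need care: a batch with no columns must still carry its row count (in arrow-rs, `RecordBatchOptions::with_row_count` serves that purpose).

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-ins for Arrow's Field, SchemaRef and RecordBatch.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
}
type SchemaRef = Arc<Vec<Field>>;

#[derive(Debug, Clone)]
struct Batch {
    schema: SchemaRef,
    columns: Vec<Arc<Vec<i64>>>,
    // Stored explicitly so a batch with zero columns still knows its length.
    num_rows: usize,
}

#[derive(Debug, Clone)]
enum Expr {
    Column(usize),
    Literal(i64),
}

#[derive(Debug, Clone)]
struct ProjectionExpr {
    expr: Expr,
    alias: String,
}

struct ProjectionExprs {
    exprs: Vec<ProjectionExpr>,
}

/// Hypothetical: the expressions bundled with their pre-computed output schema.
struct Projector {
    exprs: Vec<ProjectionExpr>,
    output_schema: SchemaRef,
}

impl ProjectionExprs {
    /// Computes the output schema once; callers never pass it around.
    fn make_projector(&self) -> Projector {
        let fields: Vec<Field> =
            self.exprs.iter().map(|p| Field { name: p.alias.clone() }).collect();
        Projector { exprs: self.exprs.clone(), output_schema: Arc::new(fields) }
    }
}

impl Projector {
    fn project_batch(&self, batch: &Batch) -> Batch {
        let columns: Vec<Arc<Vec<i64>>> = self
            .exprs
            .iter()
            .map(|p| match &p.expr {
                // Column references share the input array (Arc clone, no data copy).
                Expr::Column(i) => Arc::clone(&batch.columns[*i]),
                // Literals are broadcast to the input's row count.
                Expr::Literal(v) => Arc::new(vec![*v; batch.num_rows]),
            })
            .collect();
        Batch {
            schema: Arc::clone(&self.output_schema), // reused, not recomputed
            columns,
            num_rows: batch.num_rows, // preserved even when `columns` is empty
        }
    }
}

fn sample_batch() -> Batch {
    let schema: SchemaRef =
        Arc::new(vec![Field { name: "a".to_string() }, Field { name: "b".to_string() }]);
    Batch { schema, columns: vec![Arc::new(vec![1, 2, 3]), Arc::new(vec![4, 5, 6])], num_rows: 3 }
}

fn main() {
    let exprs = ProjectionExprs {
        exprs: vec![
            ProjectionExpr { expr: Expr::Column(1), alias: "b".to_string() },
            ProjectionExpr { expr: Expr::Literal(1), alias: "one".to_string() },
        ],
    };
    // Build once, then reuse for every incoming batch.
    let projector = exprs.make_projector();
    for _ in 0..2 {
        let out = projector.project_batch(&sample_batch());
        println!("{:?}", out.columns); // prints [[4, 5, 6], [1, 1, 1]]
    }
}
```

Because the schema lives inside `Projector`, a caller cannot pass a schema that does not match the expressions, which is the error-proneness the review comment points at.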
## Summary

This PR enhances the physical-expr projection handling with several improvements needed for better projection management in datasources.

## Changes

1. **Add trait implementations**:
   - Added `PartialEq` and `Eq` for `ProjectionExpr`
   - Added `PartialEq` and `Eq` for `ProjectionExprs`
2. **Add `project_batch()` method**:
   - Efficiently projects `RecordBatch` with pre-computed schema
   - Handles empty projections correctly
   - Reduces schema projection overhead for repeated calls
3. **Fix `update_expr()` bug**:
   - **Bug**: Previously returned `None` for literal expressions (no column references)
   - **Fix**: Now returns `Some(expr)` for both `Unchanged` and `RewrittenValid` states
   - **Impact**: Critical for queries like `SELECT 1 FROM table` where no file columns are needed
4. **Change `from_indices()` signature**:
   - Changed from `&SchemaRef` to `&Schema` for consistency
5. **Add comprehensive tests**:
   - `test_merge_empty_projection_with_literal()` - Reproduces roundtrip issue
   - `test_update_expr_with_literal()` - Tests literal handling
   - `test_update_expr_with_complex_literal_expr()` - Tests mixed expressions

## Part of

This PR is part of apache#18627 - a larger effort to refactor projection handling in DataFusion.

## Testing

All tests pass:
- ✅ New projection tests
- ✅ Existing physical-expr test suite
- ✅ Doc tests

## AI use

I asked Claude to extract this change from apache#18627

---------

Co-authored-by: Jeffrey Vo <[email protected]>
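The `update_expr()` fix can be modeled with a small rewrite that tracks a state while re-indexing column references. In this hedged sketch, only the names `update_expr`, `Unchanged` and `RewrittenValid` come from the PR; the `Expr` tree, `rewrite`, `update_expr_buggy` and the third state `RewrittenInvalid` are assumptions for illustration. A literal touches no columns, so the state stays `Unchanged`, and the old check treated that the same as a failed rewrite.

```rust
/// Hypothetical, simplified expression tree (not DataFusion's `PhysicalExpr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { name: String, index: usize },
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// What the rewrite did; `RewrittenInvalid` is an assumed third state.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RewriteState {
    /// No column reference was touched (e.g. a pure literal).
    Unchanged,
    /// Every column reference was found and re-indexed.
    RewrittenValid,
    /// Some column reference could not be resolved.
    RewrittenInvalid,
}

/// Re-points each column at its position in `available` (output column names).
fn rewrite(expr: &Expr, available: &[&str], state: &mut RewriteState) -> Expr {
    match expr {
        Expr::Column { name, .. } => match available.iter().position(|n| *n == name.as_str()) {
            Some(index) => {
                if *state != RewriteState::RewrittenInvalid {
                    *state = RewriteState::RewrittenValid;
                }
                Expr::Column { name: name.clone(), index }
            }
            None => {
                *state = RewriteState::RewrittenInvalid;
                expr.clone()
            }
        },
        // Literals reference no columns, so the state is left as-is.
        Expr::Literal(_) => expr.clone(),
        Expr::Add(l, r) => Expr::Add(
            Box::new(rewrite(l, available, state)),
            Box::new(rewrite(r, available, state)),
        ),
    }
}

/// Old behavior: only `RewrittenValid` yields `Some`, so literals become `None`.
fn update_expr_buggy(expr: &Expr, available: &[&str]) -> Option<Expr> {
    let mut state = RewriteState::Unchanged;
    let new_expr = rewrite(expr, available, &mut state);
    (state == RewriteState::RewrittenValid).then_some(new_expr)
}

/// Fixed behavior: `Unchanged` and `RewrittenValid` both yield `Some(expr)`.
fn update_expr(expr: &Expr, available: &[&str]) -> Option<Expr> {
    let mut state = RewriteState::Unchanged;
    let new_expr = rewrite(expr, available, &mut state);
    match state {
        RewriteState::Unchanged | RewriteState::RewrittenValid => Some(new_expr),
        RewriteState::RewrittenInvalid => None,
    }
}

fn main() {
    // Models `SELECT 1 FROM table`: no file columns are needed.
    let one = Expr::Literal(1);
    println!("{:?}", update_expr_buggy(&one, &["a", "b"])); // prints None
    println!("{:?}", update_expr(&one, &["a", "b"])); // prints Some(Literal(1))
}
```

Returning `None` only for the unresolved case keeps the "cannot rewrite" signal meaningful while letting column-free expressions pass through unchanged.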