## Summary
This PR enhances the physical-expr projection handling with several
improvements needed for better projection management in datasources.
## Changes
1. **Add trait implementations**:
   - Added `PartialEq` and `Eq` for `ProjectionExpr`
   - Added `PartialEq` and `Eq` for `ProjectionExprs`
2. **Add `project_batch()` method**:
   - Efficiently projects `RecordBatch` with pre-computed schema
   - Handles empty projections correctly
   - Reduces schema projection overhead for repeated calls
3. **Fix `update_expr()` bug**:
   - **Bug**: Previously returned `None` for literal expressions (no column references)
   - **Fix**: Now returns `Some(expr)` for both `Unchanged` and `RewrittenValid` states
   - **Impact**: Critical for queries like `SELECT 1 FROM table` where no file columns are needed
4. **Change `from_indices()` signature**:
   - Changed from `&SchemaRef` to `&Schema` for consistency
5. **Add comprehensive tests**:
   - `test_merge_empty_projection_with_literal()` - Reproduces roundtrip issue
   - `test_update_expr_with_literal()` - Tests literal handling
   - `test_update_expr_with_complex_literal_expr()` - Tests mixed expressions
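
To illustrate the `update_expr()` fix in item 3, here is a minimal, self-contained sketch. It does not use the DataFusion types: `Expr`, `rewrite`, and `update_expr_sketch` are hypothetical stand-ins, and the third state `RewrittenInvalid` is an assumption about how a failed column rewrite is represented. The point is only that an expression with no column references stays in the `Unchanged` state and should still be returned.

```rust
// Toy expression tree; a stand-in for a physical expression, not the real type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Column(usize),
    Add(Box<Expr>, Box<Expr>),
}

// `Unchanged` and `RewrittenValid` are named in the PR description;
// `RewrittenInvalid` is assumed here for the failure case.
#[derive(Debug, PartialEq)]
enum RewriteState {
    Unchanged,
    RewrittenValid,
    RewrittenInvalid,
}

// Replace each column reference with the projected expression at that index.
fn rewrite(expr: &Expr, projected: &[Expr], state: &mut RewriteState) -> Expr {
    match expr {
        // Literals reference no columns, so the state is left untouched.
        Expr::Literal(v) => Expr::Literal(*v),
        Expr::Column(i) => match projected.get(*i) {
            Some(replacement) => {
                // Never downgrade an earlier failure back to "valid".
                if *state != RewriteState::RewrittenInvalid {
                    *state = RewriteState::RewrittenValid;
                }
                replacement.clone()
            }
            None => {
                *state = RewriteState::RewrittenInvalid;
                expr.clone()
            }
        },
        Expr::Add(l, r) => Expr::Add(
            Box::new(rewrite(l, projected, state)),
            Box::new(rewrite(r, projected, state)),
        ),
    }
}

// Returning `Some` only for `RewrittenValid` would drop pure literals;
// the fixed behavior also accepts `Unchanged`.
fn update_expr_sketch(expr: &Expr, projected: &[Expr]) -> Option<Expr> {
    let mut state = RewriteState::Unchanged;
    let new_expr = rewrite(expr, projected, &mut state);
    match state {
        RewriteState::Unchanged | RewriteState::RewrittenValid => Some(new_expr),
        RewriteState::RewrittenInvalid => None,
    }
}
```

With this sketch, a bare literal (the `SELECT 1 FROM table` case) comes back as `Some(Expr::Literal(1))` even against an empty projection, while a reference to a column that the projection does not provide still yields `None`.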
## Part of
This PR is part of #18627 - a larger effort to refactor projection
handling in DataFusion.
## Testing
All tests pass:
- ✅ New projection tests
- ✅ Existing physical-expr test suite
- ✅ Doc tests
## AI use
I asked Claude to extract this change from #18627
---------
Co-authored-by: Jeffrey Vo <[email protected]>
        let output_schema = Arc::new(self.project_schema(input_schema)?);
+        Ok(Projector {
+            projection: self.clone(),
+            output_schema,
+        })
+    }
+
     /// Project statistics according to this projection.
     /// For example, for a projection `SELECT a AS x, b + 1 AS y`, where `a` is at index 0 and `b` is at index 1,
     /// if the input statistics has column statistics for columns `a`, `b`, and `c`, the output statistics would have column statistics for columns `x` and `y`.
@@ -446,6 +473,57 @@ impl<'a> IntoIterator for &'a ProjectionExprs {
     }
 }

+/// Applies a projection to record batches.
+///
+/// A [`Projector`] uses a set of projection expressions to transform
+/// and a pre-computed output schema to project record batches accordingly.
+///
+/// The main reason to use a `Projector` is to avoid repeatedly computing
+/// the output schema for each batch, which can be costly if the projection
+/// expressions are complex.
+#[derive(Clone, Debug)]
+pub struct Projector {
+    projection: ProjectionExprs,
+    output_schema: SchemaRef,
+}
+
+impl Projector {
+    /// Project a record batch according to this projector's expressions.
+    ///
+    /// # Errors
+    /// This function returns an error if any expression evaluation fails
+    /// or if the output schema of the resulting record batch does not match
+    /// the pre-computed output schema of the projector.
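
The idea behind `Projector` (compute the output schema once, then reuse it for every batch) can be sketched without Arrow. Everything below is a toy model under stated assumptions: `Schema`, `Batch`, and `ToyProjector` are hypothetical simplified types, and the projection is a plain list of column indices rather than arbitrary expressions. The sketch also carries an explicit row count so that an empty projection keeps the number of rows, which is one plausible reading of "handles empty projections correctly".

```rust
use std::sync::Arc;

// Simplified stand-ins for a schema and a record batch (not the Arrow types).
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
struct Batch {
    schema: Arc<Schema>,
    columns: Vec<Vec<i64>>,
    // Stored explicitly so a batch with zero columns still knows its row count.
    num_rows: usize,
}

// Holds the projection together with its pre-computed output schema.
#[derive(Debug, Clone)]
struct ToyProjector {
    indices: Vec<usize>,
    output_schema: Arc<Schema>,
}

impl ToyProjector {
    // The output schema is derived once, at construction time.
    fn try_new(indices: Vec<usize>, input_schema: &Schema) -> Result<Self, String> {
        let fields = indices
            .iter()
            .map(|&i| {
                input_schema
                    .fields
                    .get(i)
                    .cloned()
                    .ok_or_else(|| format!("column index {} out of bounds", i))
            })
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Self {
            indices,
            output_schema: Arc::new(Schema { fields }),
        })
    }

    // Per-batch work only selects columns; the schema is shared via `Arc`.
    fn project_batch(&self, batch: &Batch) -> Result<Batch, String> {
        let columns = self
            .indices
            .iter()
            .map(|&i| {
                batch
                    .columns
                    .get(i)
                    .cloned()
                    .ok_or_else(|| format!("column index {} out of bounds", i))
            })
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Batch {
            schema: Arc::clone(&self.output_schema),
            columns,
            num_rows: batch.num_rows,
        })
    }
}
```

In this toy version, every batch produced by the same `ToyProjector` points at the same `Arc<Schema>`, so repeated calls pay only for column selection. The real `Projector` evaluates expressions and validates the result against its stored schema, which this sketch does not attempt.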