|
19 | 19 |
|
20 | 20 | # Working with Exprs |
21 | 21 |
|
22 | | -Coming Soon |
| 22 | +<!-- https://github.com/apache/arrow-datafusion/issues/7304 --> |
| 23 | + |
| 24 | +`Expr` is short for "expression". It is a core abstraction in DataFusion for representing a computation, and follows the standard "expression tree" abstraction found in most compilers and databases. |
| 25 | + |
| 26 | +For example, the SQL expression `a + b` would be represented as an `Expr` with a `BinaryExpr` variant. A `BinaryExpr` has a left and right `Expr` and an operator. |
| 27 | + |
| 28 | +As another example, the SQL expression `a + b * c` would be represented as an `Expr` with a `BinaryExpr` variant. The left `Expr` would be `a` and the right `Expr` would be another `BinaryExpr` with a left `Expr` of `b` and a right `Expr` of `c`. As a classic expression tree, this would look like: |
| 29 | + |
| 30 | +```text |
| 31 | + ┌────────────────────┐ |
| 32 | + │ BinaryExpr │ |
| 33 | + │ op: + │ |
| 34 | + └────────────────────┘ |
| 35 | + ▲ ▲ |
| 36 | + ┌───────┘ └────────────────┐ |
| 37 | + │ │ |
| 38 | +┌────────────────────┐ ┌────────────────────┐ |
| 39 | +│ Expr::Col │ │ BinaryExpr │ |
| 40 | +│ col: a │ │ op: * │ |
| 41 | +└────────────────────┘ └────────────────────┘ |
| 42 | + ▲ ▲ |
| 43 | + ┌────────┘ └─────────┐ |
| 44 | + │ │ |
| 45 | + ┌────────────────────┐ ┌────────────────────┐ |
| 46 | + │ Expr::Col │ │ Expr::Col │ |
| 47 | + │ col: b │ │ col: c │ |
| 48 | + └────────────────────┘ └────────────────────┘ |
| 49 | +``` |
| 50 | + |
| 51 | +As the writer of a library, you may want to use or create `Expr`s to represent computations that you want to perform. This guide will walk you through how to make your own scalar UDF as an `Expr` and how to rewrite `Expr`s to inline the simple UDF. |
| 52 | + |
| 53 | +There are also executable examples for working with `Expr`s: |
| 54 | + |
| 55 | +- [rewrite_expr.rs](../../../datafusion-examples/examples/catalog.rs) |
| 56 | +- [expr_api.rs](../../../datafusion-examples/examples/expr_api.rs) |
| 57 | + |
| 58 | +## A Scalar UDF Example |
| 59 | + |
| 60 | +We'll use a `ScalarUDF` expression as our example. This necessitates implementing an actual UDF, and for ease we'll use the same example from the [adding UDFs](./adding-udfs.md) guide. |
| 61 | + |
| 62 | +So assuming you've written that function, you can use it to create an `Expr`: |
| 63 | + |
| 64 | +```rust |
| 65 | +let add_one_udf = create_udf( |
| 66 | + "add_one", |
| 67 | + vec![DataType::Int64], |
| 68 | + Arc::new(DataType::Int64), |
| 69 | + Volatility::Immutable, |
| 70 | + make_scalar_function(add_one), // <-- the function we wrote |
| 71 | +); |
| 72 | + |
| 73 | +// make the expr `add_one(5)` |
| 74 | +let expr = add_one_udf.call(vec![lit(5)]); |
| 75 | + |
| 76 | +// make the expr `add_one(my_column)` |
| 77 | +let expr = add_one_udf.call(vec![col("my_column")]); |
| 78 | +``` |
| 79 | + |
| 80 | +If you'd like to learn more about `Expr`s, before we get into the details of creating and rewriting them, you can read the [expression user-guide](./../user-guide/expressions.md). |
| 81 | + |
| 82 | +## Rewriting Exprs |
| 83 | + |
| 84 | +Rewriting Expressions is the process of taking an `Expr` and transforming it into another `Expr`. This is useful for a number of reasons, including: |
| 85 | + |
| 86 | +- Simplifying `Expr`s to make them easier to evaluate |
| 87 | +- Optimizing `Expr`s to make them faster to evaluate |
| 88 | +- Converting `Expr`s to other forms, e.g. converting a `BinaryExpr` to a `CastExpr` |
| 89 | + |
| 90 | +In our example, we'll use rewriting to update our `add_one` UDF, to be rewritten as a `BinaryExpr` with a `Literal` of 1. We're effectively inlining the UDF. |
| 91 | + |
| 92 | +### Rewriting with `transform` |
| 93 | + |
| 94 | +To implement the inlining, we'll need to write a function that takes an `Expr` and returns a `Result<Expr>`. If the expression is _not_ to be rewritten `Transformed::No` is used to wrap the original `Expr`. If the expression _is_ to be rewritten, `Transformed::Yes` is used to wrap the new `Expr`. |
| 95 | + |
| 96 | +```rust |
| 97 | +fn rewrite_add_one(expr: Expr) -> Result<Expr> { |
| 98 | + expr.transform(&|expr| { |
| 99 | + Ok(match expr { |
| 100 | + Expr::ScalarUDF(scalar_fun) if scalar_fun.fun.name == "add_one" => { |
| 101 | + let input_arg = scalar_fun.args[0].clone(); |
| 102 | + let new_expression = input_arg + lit(1i64); |
| 103 | + |
| 104 | + Transformed::Yes(new_expression) |
| 105 | + } |
| 106 | + _ => Transformed::No(expr), |
| 107 | + }) |
| 108 | + }) |
| 109 | +} |
| 110 | +``` |
| 111 | + |
| 112 | +### Creating an `OptimizerRule` |
| 113 | + |
| 114 | +In DataFusion, an `OptimizerRule` is a trait that supports rewriting`Expr`s that appear in various parts of the `LogicalPlan`. It follows DataFusion's general mantra of trait implementations to drive behavior. |
| 115 | + |
| 116 | +We'll call our rule `AddOneInliner` and implement the `OptimizerRule` trait. The `OptimizerRule` trait has two methods: |
| 117 | + |
| 118 | +- `name` - returns the name of the rule |
| 119 | +- `try_optimize` - takes a `LogicalPlan` and returns an `Option<LogicalPlan>`. If the rule is able to optimize the plan, it returns `Some(LogicalPlan)` with the optimized plan. If the rule is not able to optimize the plan, it returns `None`. |
| 120 | + |
| 121 | +```rust |
| 122 | +struct AddOneInliner {} |
| 123 | + |
| 124 | +impl OptimizerRule for AddOneInliner { |
| 125 | + fn name(&self) -> &str { |
| 126 | + "add_one_inliner" |
| 127 | + } |
| 128 | + |
| 129 | + fn try_optimize( |
| 130 | + &self, |
| 131 | + plan: &LogicalPlan, |
| 132 | + config: &dyn OptimizerConfig, |
| 133 | + ) -> Result<Option<LogicalPlan>> { |
| 134 | + // Map over the expressions and rewrite them |
| 135 | + let new_expressions = plan |
| 136 | + .expressions() |
| 137 | + .into_iter() |
| 138 | + .map(|expr| rewrite_add_one(expr)) |
| 139 | + .collect::<Result<Vec<_>>>()?; |
| 140 | + |
| 141 | + let inputs = plan.inputs().into_iter().cloned().collect::<Vec<_>>(); |
| 142 | + |
| 143 | + let plan = plan.with_new_exprs(&new_expressions, &inputs); |
| 144 | + |
| 145 | + plan.map(Some) |
| 146 | + } |
| 147 | +} |
| 148 | +``` |
| 149 | + |
| 150 | +Note the use of `rewrite_add_one` which is mapped over `plan.expressions()` to rewrite the expressions, then `plan.with_new_exprs` is used to create a new `LogicalPlan` with the rewritten expressions. |
| 151 | + |
| 152 | +We're almost there. Let's just test our rule works properly. |
| 153 | + |
| 154 | +## Testing the Rule |
| 155 | + |
| 156 | +Testing the rule is fairly simple, we can create a SessionState with our rule and then create a DataFrame and run a query. The logical plan will be optimized by our rule. |
| 157 | + |
| 158 | +```rust |
| 159 | +use datafusion::prelude::*; |
| 160 | + |
| 161 | +let rules = Arc::new(AddOneInliner {}); |
| 162 | +let state = ctx.state().with_optimizer_rules(vec![rules]); |
| 163 | + |
| 164 | +let ctx = SessionContext::with_state(state); |
| 165 | +ctx.register_udf(add_one); |
| 166 | + |
| 167 | +let sql = "SELECT add_one(1) AS added_one"; |
| 168 | +let plan = ctx.sql(sql).await?.logical_plan(); |
| 169 | + |
| 170 | +println!("{:?}", plan); |
| 171 | +``` |
| 172 | + |
| 173 | +This results in the following output: |
| 174 | + |
| 175 | +```text |
| 176 | +Projection: Int64(1) + Int64(1) AS added_one |
| 177 | + EmptyRelation |
| 178 | +``` |
| 179 | + |
| 180 | +I.e. the `add_one` UDF has been inlined into the projection. |
| 181 | + |
| 182 | +## Conclusion |
| 183 | + |
| 184 | +In this guide, we've seen how to create `Expr`s programmatically and how to rewrite them. This is useful for simplifying and optimizing `Expr`s. We've also seen how to test our rule to ensure it works properly. |
0 commit comments