Commit b94fa97
[SPARK-28345][SQL][PYTHON] PythonUDF predicate should be able to pushdown to join
## What changes were proposed in this pull request?
A `Filter` predicate using `PythonUDF` can't be push down into join condition, currently. A predicate like that should be able to push down to join condition. For `PythonUDF`s that can't be evaluated in join condition, `PullOutPythonUDFInJoinCondition` will pull them out later.
An example like:
```scala
val pythonTestUDF = TestPythonUDF(name = "udf")
val left = Seq((1, 2), (2, 3)).toDF("a", "b")
val right = Seq((1, 2), (3, 4)).toDF("c", "d")
val df = left.crossJoin(right).where(pythonTestUDF($"a") === pythonTestUDF($"c"))
```
Query plan before the PR:
```
== Physical Plan ==
*(3) Project [a#2121, b#2122, c#2132, d#2133]
+- *(3) Filter (pythonUDF0#2142 = pythonUDF1#2143)
+- BatchEvalPython [udf(a#2121), udf(c#2132)], [pythonUDF0#2142, pythonUDF1#2143]
+- BroadcastNestedLoopJoin BuildRight, Cross
:- *(1) Project [_1#2116 AS a#2121, _2#2117 AS b#2122]
: +- LocalTableScan [_1#2116, _2#2117]
+- BroadcastExchange IdentityBroadcastMode
+- *(2) Project [_1#2127 AS c#2132, _2#2128 AS d#2133]
+- LocalTableScan [_1#2127, _2#2128]
```
Query plan after the PR:
```
== Physical Plan ==
*(3) Project [a#2121, b#2122, c#2132, d#2133]
+- *(3) BroadcastHashJoin [pythonUDF0#2142], [pythonUDF0#2143], Cross, BuildRight
:- BatchEvalPython [udf(a#2121)], [pythonUDF0#2142]
: +- *(1) Project [_1#2116 AS a#2121, _2#2117 AS b#2122]
: +- LocalTableScan [_1#2116, _2#2117]
+- BroadcastExchange HashedRelationBroadcastMode(List(input[2, string, true]))
+- BatchEvalPython [udf(c#2132)], [pythonUDF0#2143]
+- *(2) Project [_1#2127 AS c#2132, _2#2128 AS d#2133]
+- LocalTableScan [_1#2127, _2#2128]
```
After this PR, the join can use `BroadcastHashJoin`, instead of `BroadcastNestedLoopJoin`.
## How was this patch tested?
Added tests.
Closes #25106 from viirya/pythonudf-join-condition.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>1 parent 8e26d4d commit b94fa97
3 files changed
Lines changed: 59 additions & 4 deletions
File tree
- sql
- catalyst/src
- main/scala/org/apache/spark/sql/catalyst/expressions
- test/scala/org/apache/spark/sql/catalyst/optimizer
- core/src/test/scala/org/apache/spark/sql
Lines changed: 4 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
116 | 116 | | |
117 | 117 | | |
118 | 118 | | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
119 | 123 | | |
120 | 124 | | |
121 | 125 | | |
| |||
Lines changed: 31 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
| 20 | + | |
20 | 21 | | |
21 | 22 | | |
22 | 23 | | |
23 | 24 | | |
24 | 25 | | |
25 | 26 | | |
26 | 27 | | |
27 | | - | |
| 28 | + | |
28 | 29 | | |
29 | 30 | | |
30 | 31 | | |
| |||
41 | 42 | | |
42 | 43 | | |
43 | 44 | | |
44 | | - | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
45 | 49 | | |
46 | | - | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
47 | 53 | | |
48 | 54 | | |
49 | 55 | | |
| |||
1202 | 1208 | | |
1203 | 1209 | | |
1204 | 1210 | | |
| 1211 | + | |
| 1212 | + | |
| 1213 | + | |
| 1214 | + | |
| 1215 | + | |
| 1216 | + | |
| 1217 | + | |
| 1218 | + | |
| 1219 | + | |
| 1220 | + | |
| 1221 | + | |
| 1222 | + | |
| 1223 | + | |
| 1224 | + | |
| 1225 | + | |
| 1226 | + | |
| 1227 | + | |
| 1228 | + | |
| 1229 | + | |
| 1230 | + | |
| 1231 | + | |
| 1232 | + | |
1205 | 1233 | | |
Lines changed: 24 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
29 | | - | |
| 29 | + | |
| 30 | + | |
30 | 31 | | |
31 | 32 | | |
32 | 33 | | |
| |||
994 | 995 | | |
995 | 996 | | |
996 | 997 | | |
| 998 | + | |
| 999 | + | |
| 1000 | + | |
| 1001 | + | |
| 1002 | + | |
| 1003 | + | |
| 1004 | + | |
| 1005 | + | |
| 1006 | + | |
| 1007 | + | |
| 1008 | + | |
| 1009 | + | |
| 1010 | + | |
| 1011 | + | |
| 1012 | + | |
| 1013 | + | |
| 1014 | + | |
| 1015 | + | |
| 1016 | + | |
| 1017 | + | |
| 1018 | + | |
| 1019 | + | |
997 | 1020 | | |
0 commit comments