Commit d68198d
committed
[SPARK-23223][SQL] Make stacking dataset transforms more performant
## What changes were proposed in this pull request?
It is a common pattern to apply multiple transforms to a `Dataset` (using `Dataset.withColumn` for example. This is currently quite expensive because we run `CheckAnalysis` on the full plan and create an encoder for each intermediate `Dataset`.
This PR extends the usage of the `AnalysisBarrier` to include `CheckAnalysis`. By doing this we hide the already analyzed plan from `CheckAnalysis` because barrier is a `LeafNode`. The `AnalysisBarrier` is in the `FinishAnalysis` phase of the optimizer.
We also make binding the `Dataset` encoder lazy. The bound encoder is only needed when we materialize the dataset.
## How was this patch tested?
Existing test should cover this.
Author: Herman van Hovell <hvanhovell@databricks.com>
Closes #20402 from hvanhovell/SPARK-23223.
(cherry picked from commit 2d903cf)
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>1 parent 4059454 commit d68198d
6 files changed
Lines changed: 25 additions & 21 deletions
File tree
- sql
- catalyst/src
- main/scala/org/apache/spark/sql/catalyst/analysis
- test/scala/org/apache/spark/sql/catalyst/analysis
- core/src/main/scala/org/apache/spark/sql
- execution
- hive/src/main/scala/org/apache/spark/sql/hive/test
Lines changed: 14 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
98 | 98 | | |
99 | 99 | | |
100 | 100 | | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
101 | 114 | | |
102 | 115 | | |
103 | 116 | | |
| |||
178 | 191 | | |
179 | 192 | | |
180 | 193 | | |
181 | | - | |
182 | | - | |
| 194 | + | |
183 | 195 | | |
184 | 196 | | |
185 | 197 | | |
| |||
Lines changed: 1 addition & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
348 | 348 | | |
349 | 349 | | |
350 | 350 | | |
| 351 | + | |
351 | 352 | | |
352 | 353 | | |
353 | 354 | | |
| |||
Lines changed: 1 addition & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
54 | 54 | | |
55 | 55 | | |
56 | 56 | | |
57 | | - | |
58 | | - | |
| 57 | + | |
59 | 58 | | |
60 | 59 | | |
61 | 60 | | |
| |||
Lines changed: 6 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
62 | 62 | | |
63 | 63 | | |
64 | 64 | | |
65 | | - | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
66 | 70 | | |
67 | 71 | | |
68 | 72 | | |
| |||
204 | 208 | | |
205 | 209 | | |
206 | 210 | | |
207 | | - | |
| 211 | + | |
208 | 212 | | |
209 | 213 | | |
210 | 214 | | |
| |||
Lines changed: 2 additions & 14 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
44 | 44 | | |
45 | 45 | | |
46 | 46 | | |
47 | | - | |
48 | | - | |
49 | | - | |
50 | | - | |
51 | | - | |
52 | | - | |
53 | | - | |
54 | | - | |
55 | | - | |
56 | | - | |
57 | | - | |
58 | | - | |
59 | | - | |
| 47 | + | |
60 | 48 | | |
61 | 49 | | |
62 | 50 | | |
| |||
66 | 54 | | |
67 | 55 | | |
68 | 56 | | |
69 | | - | |
| 57 | + | |
70 | 58 | | |
71 | 59 | | |
72 | 60 | | |
| |||
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
575 | 575 | | |
576 | 576 | | |
577 | 577 | | |
578 | | - | |
| 578 | + | |
579 | 579 | | |
580 | 580 | | |
581 | 581 | | |
| |||
0 commit comments