Commit d981bfc
[SPARK-11086][SPARKR] Use dropFactors column-wise instead of nested loop when createDataFrame
Use `dropFactors` column-wise instead of nested loop when `createDataFrame` from a `data.frame`
At this moment SparkR createDataFrame is using nested loop to convert factors to character when called on a local data.frame. It works but is incredibly slow especially with data.table (~ 2 orders of magnitude compared to PySpark / Pandas version on a DateFrame of size 1M rows x 2 columns).
A simple improvement is to apply `dropFactor `column-wise and then reshape output list.
It should at least partially address [SPARK-8277](https://issues.apache.org/jira/browse/SPARK-8277).
Author: zero323 <[email protected]>
Closes #9099 from zero323/SPARK-11086.
(cherry picked from commit d7d9fa0)
Signed-off-by: Shivaram Venkataraman <[email protected]>1 parent 2f0f8bb commit d981bfc
2 files changed
Lines changed: 49 additions & 21 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
17 | 17 | | |
18 | 18 | | |
19 | 19 | | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
20 | 40 | | |
21 | 41 | | |
22 | 42 | | |
23 | 43 | | |
24 | 44 | | |
25 | 45 | | |
26 | | - | |
27 | | - | |
28 | | - | |
29 | | - | |
30 | | - | |
31 | | - | |
32 | | - | |
33 | | - | |
34 | | - | |
35 | | - | |
36 | | - | |
37 | | - | |
38 | | - | |
39 | | - | |
40 | | - | |
| 46 | + | |
41 | 47 | | |
42 | 48 | | |
43 | 49 | | |
| |||
90 | 96 | | |
91 | 97 | | |
92 | 98 | | |
93 | | - | |
94 | | - | |
| 99 | + | |
95 | 100 | | |
96 | | - | |
| 101 | + | |
97 | 102 | | |
98 | 103 | | |
99 | 104 | | |
100 | 105 | | |
101 | 106 | | |
102 | 107 | | |
103 | | - | |
104 | | - | |
105 | | - | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
106 | 118 | | |
107 | 119 | | |
108 | 120 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
242 | 242 | | |
243 | 243 | | |
244 | 244 | | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
245 | 253 | | |
246 | 254 | | |
247 | 255 | | |
| |||
283 | 291 | | |
284 | 292 | | |
285 | 293 | | |
| 294 | + | |
| 295 | + | |
| 296 | + | |
| 297 | + | |
| 298 | + | |
| 299 | + | |
| 300 | + | |
| 301 | + | |
286 | 302 | | |
287 | 303 | | |
288 | 304 | | |
| |||
0 commit comments