Commit d34590c
[SPARK-31441][PYSPARK][SQL][2.4] Support duplicated column names for toPandas with arrow execution
### What changes were proposed in this pull request?
This is a backport of #28210.
This PR adds support for duplicated column names in `toPandas` with Arrow execution.
### Why are the changes needed?
When we execute `toPandas()` with Arrow execution, it fails if the column names have duplicates.
```py
>>> spark.sql("select 1 v, 1 v").toPandas()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/lib/python3.7/site-packages/pyspark/sql/dataframe.py", line 2132, in toPandas
    pdf = table.to_pandas()
  File "pyarrow/array.pxi", line 441, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 1367, in pyarrow.lib.Table._to_pandas
  File "/path/to/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 653, in table_to_blockmanager
    columns = _deserialize_column_index(table, all_columns, column_indexes)
  File "/path/to/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 704, in _deserialize_column_index
    columns = _flatten_single_level_multiindex(columns)
  File "/path/to/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 937, in _flatten_single_level_multiindex
    raise ValueError('Found non-unique column index')
ValueError: Found non-unique column index
```
### Does this PR introduce any user-facing change?
Yes. Previously, `toPandas()` failed with the error above; after this PR, it returns the result:
```py
>>> spark.sql("select 1 v, 1 v").toPandas()
   v  v
0  1  1
```
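One common way to get past a converter that rejects duplicate names is to convert under temporary positional names and restore the original names afterwards. The sketch below illustrates that idea in plain Python; it is not taken from this commit's diff. `strict_convert` is a hypothetical stand-in for a conversion step that raises on non-unique names, and `convert_allowing_duplicates` and the `col_<i>` naming scheme are likewise assumptions made for illustration.

```python
def strict_convert(names, rows):
    """Hypothetical converter that, like the failing path in the traceback,
    refuses column names that are not unique."""
    if len(set(names)) != len(names):
        raise ValueError("Found non-unique column index")
    return {"columns": list(names), "data": [list(row) for row in rows]}


def convert_allowing_duplicates(names, rows):
    """Convert under unique placeholder names, then put the real names back."""
    # One placeholder per position, so uniqueness is guaranteed.
    tmp_names = ["col_{}".format(i) for i in range(len(names))]
    result = strict_convert(tmp_names, rows)
    # Restore the original (possibly duplicated) names by position.
    result["columns"] = list(names)
    return result


result = convert_allowing_duplicates(["v", "v"], [(1, 1)])
print(result["columns"], result["data"])
```

Because the names are restored by position, the column order of the input has to be preserved by the conversion step for this approach to be valid.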
### How was this patch tested?
Added and modified related tests.
Closes #28221 from ueshin/issues/SPARK-31441/2.4/to_pandas.
Authored-by: Takuya UESHIN <[email protected]>
Signed-off-by: Takuya UESHIN <[email protected]>

1 parent 49abdc4 commit d34590c
2 files changed, with 26 additions and 7 deletions.

| File | Original lines | New lines | Added | Removed |
|---|---|---|---|---|
| First file | 2127 to 2136 | 2127 to 2140 | 5 | 1 |
| Second file | 3296 to 3301 | 3296 to 3314 | 13 | 0 |
| Second file | 3307 to 3318 | 3320 to 3333 | 8 | 6 |