Commit d7499ae
[SPARK-31256][SQL] DataFrameNaFunctions.drop should work for nested columns
### What changes were proposed in this pull request?
#26700 removed the ability to drop a row whose nested column value is null.
For example, for the following `df`:
```
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructType}

val schema = new StructType()
  .add("c1", new StructType()
    .add("c1-1", StringType)
    .add("c1-2", StringType))
val data = Seq(Row(Row(null, "a2")), Row(Row("b1", "b2")), Row(null))
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
df.show
+--------+
|      c1|
+--------+
|  [, a2]|
|[b1, b2]|
|    null|
+--------+
```
In Spark 2.4.4, specifying the nested column drops the rows where it is null:
```
df.na.drop("any", Seq("c1.c1-1")).show
+--------+
|      c1|
+--------+
|[b1, b2]|
+--------+
```
In Spark 2.4.5 and Spark 3.0.0-preview2, nested columns are ignored if specified, so no rows are dropped:
```
df.na.drop("any", Seq("c1.c1-1")).show
+--------+
|      c1|
+--------+
|  [, a2]|
|[b1, b2]|
|    null|
+--------+
```
### Why are the changes needed?
This seems like a regression from Spark 2.4.4, where nested columns could be specified.
### Does this PR introduce any user-facing change?
Now, the nested column can be specified:
```
df.na.drop("any", Seq("c1.c1-1")).show
+--------+
|      c1|
+--------+
|[b1, b2]|
+--------+
```
Also, if `*` is specified as a column, it will throw an `AnalysisException` that `*` cannot be resolved, which was the behavior in 2.4.4. Currently, in master, it has no effect.
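A minimal sketch of how the `*` case could be checked, assuming a local Spark build on the classpath. The object name `NaDropStarSketch` and the helpers `buildDf` and `starOutcome` are hypothetical names used only for illustration. The outcome depends on the Spark version being run, so the sketch reports whichever behavior it observes:

```scala
import org.apache.spark.sql.{AnalysisException, DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructType}

// Hypothetical helper object, not part of this patch.
object NaDropStarSketch {
  // Builds the same nested-column DataFrame used in the examples above.
  def buildDf(spark: SparkSession): DataFrame = {
    val schema = new StructType()
      .add("c1", new StructType()
        .add("c1-1", StringType)
        .add("c1-2", StringType))
    val data = Seq(Row(Row(null, "a2")), Row(Row("b1", "b2")), Row(null))
    spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
  }

  // Passes `*` as a column name to na.drop and describes what happened:
  // either an AnalysisException is thrown or the call goes through.
  def starOutcome(df: DataFrame): String =
    try {
      val remaining = df.na.drop("any", Seq("*")).count()
      s"no exception, $remaining rows remain"
    } catch {
      case e: AnalysisException => s"AnalysisException: ${e.getMessage}"
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("na-drop-star-sketch")
      .getOrCreate()
    try println(starOutcome(buildDf(spark)))
    finally spark.stop()
  }
}
```

With this change applied, the `AnalysisException` branch is the expected one; on a build without it, the call is expected to go through with all rows remaining.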
### How was this patch tested?
Updated existing tests.
Closes #28266 from imback82/SPARK-31256.
Authored-by: Terry Kim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent bc212df commit d7499ae
2 files changed
Lines changed: 35 additions & 25 deletions
File tree
- sql/core/src
- main/scala/org/apache/spark/sql
- test/scala/org/apache/spark/sql
Lines changed: 5 additions & 7 deletions
Lines changed: 30 additions & 18 deletions