3 changes: 3 additions & 0 deletions docs/sql-migration-guide-upgrade.md
@@ -17,6 +17,9 @@ displayTitle: Spark SQL Upgrading Guide

- The `ADD JAR` command previously returned a result set with the single value 0. It now returns an empty result set.

- In Spark version 2.4 and earlier, `Dataset.groupByKey` results in a grouped Dataset whose key attribute is wrongly named "value" if the key is of an atomic type, e.g. int, string, etc. This is counterintuitive and makes the schema of aggregation queries unexpected. For example, the schema of `ds.groupByKey(...).count()` is `(value, count)`. Since Spark 3.0, we name the grouping attribute "key".
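The renaming above can be illustrated with a plain-Scala sketch (no Spark dependency; `aggregationSchema` is a hypothetical helper, not a Spark API) that models the column names produced by `ds.groupByKey(...).count()` on an atomic-typed key:

```scala
// Models the schema of ds.groupByKey(...).count() on an atomic-typed key.
// Spark 2.4 and earlier name the grouping column "value"; Spark 3.0 names it "key".
def aggregationSchema(sparkMajorVersion: Int): Seq[String] = {
  val keyName = if (sparkMajorVersion >= 3) "key" else "value"
  Seq(keyName, "count")
}
```

So `aggregationSchema(2)` yields `Seq("value", "count")` while `aggregationSchema(3)` yields `Seq("key", "count")`, matching the migration note.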


## Upgrading From Spark SQL 2.3 to 2.4

- In Spark version 2.3 and earlier, the second parameter of the `array_contains` function is implicitly promoted to the element type of the first, array-type parameter. This type promotion can be lossy and may cause `array_contains` to return a wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some changes in behavior, which are illustrated in the table below.
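The lossiness can be sketched in plain Scala (no Spark; `lossyContains` and `safeContains` are hypothetical stand-ins for the pre- and post-2.4 promotion rules): casting the needle down to the array's element type truncates it, while promoting the elements up preserves it.

```scala
// Pre-2.4 style: the second argument is cast down to the array's element type,
// so a needle like 1.23 truncates to 1 and is wrongly reported as contained.
def lossyContains(arr: Seq[Int], needle: Double): Boolean =
  arr.contains(needle.toInt)

// Safer 2.4 style: promote the array elements up to the needle's type instead.
def safeContains(arr: Seq[Int], needle: Double): Boolean =
  arr.map(_.toDouble).contains(needle)
```

Here `lossyContains(Seq(1, 2), 1.23)` returns `true` while `safeContains(Seq(1, 2), 1.23)` returns `false`.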
@@ -459,7 +459,7 @@ class KeyValueGroupedDataset[K, V] private[sql](
columns.map(_.withInputType(vExprEnc, dataAttributes).named)
val keyColumn = if (!kExprEnc.isSerializedAsStruct) {
assert(groupingAttributes.length == 1)
-      groupingAttributes.head
+      Alias(groupingAttributes.head, "key")()
} else {
Alias(CreateStruct(groupingAttributes), "key")()
}
@@ -1570,6 +1570,11 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
val agg = ds.groupByKey(x => x).agg(sum("_1").as[Long], sum($"_2" + 1).as[Long])
checkDatasetUnorderly(agg, ((1, 2), 1L, 3L), ((2, 3), 2L, 4L), ((3, 4), 3L, 5L))
}

test("key attribute of primitive type under typed aggregation should be named as key") {
val ds = Seq(1, 2, 3).toDS()
assert(ds.groupByKey(x => x).count().schema.head.name == "key")
}
}

case class TestDataUnion(x: Int, y: Int, z: Int)