docs/sql-programming-guide.md (1 addition, 0 deletions)

@@ -1843,6 +1843,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see

## Upgrading From Spark SQL 2.3 to 2.4

+- Since Spark 2.4, Spark will display the `Last Access` value in a table description as `UNKNOWN` when the value was Jan 01 1970.
- Since Spark 2.4, Spark maximizes the usage of a vectorized ORC reader for ORC files by default. To do that, `spark.sql.orc.impl` and `spark.sql.orc.filterPushdown` change their default values to `native` and `true` respectively.
- In PySpark, when Arrow optimization is enabled, previously `toPandas` just failed when Arrow optimization is unable to be used whereas `createDataFrame` from Pandas DataFrame allowed the fallback to non-optimization. Now, both `toPandas` and `createDataFrame` from Pandas DataFrame allow the fallback by default, which can be switched off by `spark.sql.execution.arrow.fallback.enabled`.
- Since Spark 2.4, writing an empty dataframe to a directory launches at least one write task, even if physically the dataframe has no partition. This introduces a small behavior change that for self-describing file formats like Parquet and Orc, Spark creates a metadata-only file in the target directory when writing a 0-partition dataframe, so that schema inference can still work if users read that directory later. The new behavior is more reasonable and more consistent regarding writing empty dataframe.
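
For context, a minimal sketch of how the change documented in the first bullet above surfaces to users (the table name and session setup here are illustrative assumptions, not part of this PR):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo: a freshly created table has never been accessed, so its
// metastore last-access time is unset. Before Spark 2.4 the "Last Access" row
// rendered as a date near Jan 01 1970; since 2.4 it renders as UNKNOWN.
val spark = SparkSession.builder()
  .appName("last-access-demo")
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._

spark.sql("CREATE TABLE IF NOT EXISTS demo_t (id INT)")
spark.sql("DESC FORMATTED demo_t")
  .filter($"col_name".startsWith("Last Access"))
  .show(truncate = false)
// Expected row in Spark 2.4+:  Last Access | UNKNOWN
```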
@@ -114,7 +114,10 @@ case class CatalogTablePartition(
map.put("Partition Parameters", s"{${parameters.map(p => p._1 + "=" + p._2).mkString(", ")}}")
}
map.put("Created Time", new Date(createTime).toString)
map.put("Last Access", new Date(lastAccessTime).toString)
val lastAccess = {
if (-1 == lastAccessTime) "UNKNOWN" else new Date(lastAccessTime).toString
}
map.put("Last Access", lastAccess)
Member:
No need for the `val lastAccess`?

map.put("Last Access",
      if (-1 == lastAccessTime) "UNKNOWN" else new Date(lastAccessTime).toString)

Member:
The current way is also fine.

    stats.foreach(s => map.put("Partition Statistics", s.simpleString))
    map
  }
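
Standalone, the new guard amounts to the following sketch (assuming, per the hunk above, that `lastAccessTime` is epoch milliseconds and `-1` is the catalog's sentinel for "never accessed"):

```scala
import java.util.Date

// Map the -1 sentinel to UNKNOWN instead of rendering it as a 1970-era date.
def formatLastAccess(lastAccessTime: Long): String =
  if (-1 == lastAccessTime) "UNKNOWN" else new Date(lastAccessTime).toString

formatLastAccess(-1L)            // "UNKNOWN"
formatLastAccess(1531785600000L) // e.g. "Tue Jul 17 00:00:00 UTC 2018", timezone-dependent
```

As the review thread above notes, inlining the expression into `map.put` would be equivalent; the intermediate `val` is purely a readability choice.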
@@ -19,6 +19,7 @@ package org.apache.spark.sql.hive.execution

import java.io.File
import java.net.URI
+import java.util.Date

import scala.language.existentials

@@ -2250,6 +2251,18 @@ class HiveDDLSuite
    }
  }

+  test("SPARK-24812: desc formatted table for last access verification") {
+    withTable("t1") {
+      sql(
+        "CREATE TABLE IF NOT EXISTS t1 (c1_int INT, c2_string STRING, c3_float FLOAT)")
+      val desc = sql("DESC FORMATTED t1").filter($"col_name".startsWith("Last Access"))
+        .select("data_type")
+      // Check that the last access time is not shown as the default 1970 date,
+      // which would be a wrong access time for a never-accessed table.
+      assert(!desc.first.toString.contains("1970"))
+    }
+  }
+
test("SPARK-24681 checks if nested column names do not include ',', ':', and ';'") {
val expectedMsg = "Cannot create a table having a nested column whose name contains invalid " +
"characters (',', ':', ';') in Hive metastore."