Commit 720708c

HyukjinKwon authored and cloud-fan committed
[SPARK-20639][SQL] Add single argument support for to_timestamp in SQL with documentation improvement
## What changes were proposed in this pull request?

This PR proposes three things, as below:

- Use casting rules to a timestamp in `to_timestamp` by default (it was `yyyy-MM-dd HH:mm:ss`).
- Support a single argument for `to_timestamp`, similarly to the APIs in other languages. For example, the one below works:

  ```scala
  import org.apache.spark.sql.functions._
  Seq("2016-12-31 00:12:00.00").toDF("a").select(to_timestamp(col("a"))).show()
  ```

  prints

  ```
  +----------------------------------------+
  |to_timestamp(`a`, 'yyyy-MM-dd HH:mm:ss')|
  +----------------------------------------+
  |                     2016-12-31 00:12:00|
  +----------------------------------------+
  ```

  whereas this does not work in SQL.

  **Before**

  ```
  spark-sql> SELECT to_timestamp('2016-12-31 00:12:00');
  Error in query: Invalid number of arguments for function to_timestamp; line 1 pos 7
  ```

  **After**

  ```
  spark-sql> SELECT to_timestamp('2016-12-31 00:12:00');
  2016-12-31 00:12:00
  ```

- Related documentation improvements for the SQL function descriptions and the other API descriptions accordingly.

  **Before**

  ```
  spark-sql> DESCRIBE FUNCTION extended to_date;
  ...
  Usage: to_date(date_str, fmt) - Parses the `left` expression with the `fmt` expression. Returns null with invalid input.
  Extended Usage:
      Examples:
        > SELECT to_date('2016-12-31', 'yyyy-MM-dd');
         2016-12-31
  ```

  ```
  spark-sql> DESCRIBE FUNCTION extended to_timestamp;
  ...
  Usage: to_timestamp(timestamp, fmt) - Parses the `left` expression with the `format` expression to a timestamp. Returns null with invalid input.
  Extended Usage:
      Examples:
        > SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd');
         2016-12-31 00:00:00.0
  ```

  **After**

  ```
  spark-sql> DESCRIBE FUNCTION extended to_date;
  ...
  Usage: to_date(date_str[, fmt]) - Parses the `date_str` expression with the `fmt` expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if the `fmt` is omitted.
  Extended Usage:
      Examples:
        > SELECT to_date('2009-07-30 04:17:52');
         2009-07-30
        > SELECT to_date('2016-12-31', 'yyyy-MM-dd');
         2016-12-31
  ```

  ```
  spark-sql> DESCRIBE FUNCTION extended to_timestamp;
  ...
  Usage: to_timestamp(timestamp[, fmt]) - Parses the `timestamp` expression with the `fmt` expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to a timestamp if the `fmt` is omitted.
  Extended Usage:
      Examples:
        > SELECT to_timestamp('2016-12-31 00:12:00');
         2016-12-31 00:12:00
        > SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd');
         2016-12-31 00:00:00
  ```

## How was this patch tested?

Added tests in `datetime.sql`.

Author: hyukjinkwon <[email protected]>

Closes #17901 from HyukjinKwon/to_timestamp_arg.
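The "casting rules" fallback described above can be illustrated outside Spark. The sketch below is a hypothetical pure-Python analogue, not Spark code: with an explicit format it parses strictly, and without one it falls back to a small set of default patterns (Python `strptime` codes stand in for Java's `SimpleDateFormat` patterns), returning `None` where Spark returns null.

```python
from datetime import datetime
from typing import Optional

# Illustrative default patterns, loosely analogous to Spark's cast-to-timestamp
# rules; these names and patterns are assumptions, not Spark's implementation.
_CAST_PATTERNS = ["%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]

def to_timestamp(value: str, fmt: Optional[str] = None) -> Optional[datetime]:
    """Parse with an explicit format if given, else fall back to cast-like rules."""
    patterns = [fmt] if fmt is not None else _CAST_PATTERNS
    for p in patterns:
        try:
            return datetime.strptime(value, p)
        except (ValueError, TypeError):
            continue
    return None  # mirrors "Returns null with invalid input"
```

With this sketch, the single-argument call `to_timestamp('2016-12-31 00:12:00')` succeeds via the fallback patterns, matching the new SQL behavior shown above.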
1 parent af40bb1 commit 720708c

10 files changed: 68 additions & 70 deletions

R/pkg/R/functions.R

Lines changed: 5 additions & 3 deletions
```diff
@@ -1757,7 +1757,8 @@ setMethod("toRadians",
 #' \url{http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html}.
 #' If the string cannot be parsed according to the specified format (or default),
 #' the value of the column will be null.
-#' The default format is 'yyyy-MM-dd'.
+#' By default, it follows casting rules to a DateType if the format is omitted
+#' (equivalent to \code{cast(df$x, "date")}).
 #'
 #' @param x Column to parse.
 #' @param format string to use to parse x Column to DateType. (optional)
@@ -1832,10 +1833,11 @@ setMethod("to_json", signature(x = "Column"),
 #' \url{http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html}.
 #' If the string cannot be parsed according to the specified format (or default),
 #' the value of the column will be null.
-#' The default format is 'yyyy-MM-dd HH:mm:ss'.
+#' By default, it follows casting rules to a TimestampType if the format is omitted
+#' (equivalent to \code{cast(df$x, "timestamp")}).
 #'
 #' @param x Column to parse.
-#' @param format string to use to parse x Column to DateType. (optional)
+#' @param format string to use to parse x Column to TimestampType. (optional)
 #'
 #' @rdname to_timestamp
 #' @name to_timestamp
```

python/pyspark/sql/functions.py

Lines changed: 6 additions & 10 deletions
```diff
@@ -144,12 +144,6 @@ def _():
            'measured in radians.',
 }
 
-_functions_2_2 = {
-    'to_date': 'Converts a string date into a DateType using the (optionally) specified format.',
-    'to_timestamp': 'Converts a string timestamp into a timestamp type using the ' +
-                    '(optionally) specified format.',
-}
-
 # math functions that take two arguments as input
 _binary_mathfunctions = {
     'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' +
@@ -987,9 +981,10 @@ def months_between(date1, date2):
 def to_date(col, format=None):
     """Converts a :class:`Column` of :class:`pyspark.sql.types.StringType` or
     :class:`pyspark.sql.types.TimestampType` into :class:`pyspark.sql.types.DateType`
-    using the optionally specified format. Default format is 'yyyy-MM-dd'.
-    Specify formats according to
+    using the optionally specified format. Specify formats according to
     `SimpleDateFormats <http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html>`_.
+    By default, it follows casting rules to :class:`pyspark.sql.types.DateType` if the format
+    is omitted (equivalent to ``col.cast("date")``).
 
     >>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
     >>> df.select(to_date(df.t).alias('date')).collect()
@@ -1011,9 +1006,10 @@ def to_date(col, format=None):
 def to_timestamp(col, format=None):
     """Converts a :class:`Column` of :class:`pyspark.sql.types.StringType` or
     :class:`pyspark.sql.types.TimestampType` into :class:`pyspark.sql.types.DateType`
-    using the optionally specified format. Default format is 'yyyy-MM-dd HH:mm:ss'. Specify
-    formats according to
+    using the optionally specified format. Specify formats according to
     `SimpleDateFormats <http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html>`_.
+    By default, it follows casting rules to :class:`pyspark.sql.types.TimestampType` if the format
+    is omitted (equivalent to ``col.cast("timestamp")``).
 
     >>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
     >>> df.select(to_timestamp(df.t).alias('dt')).collect()
```

python/pyspark/sql/tests.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -2524,7 +2524,7 @@ def test_datetime_functions(self):
         from datetime import date, datetime
         df = self.spark.range(1).selectExpr("'2017-01-22' as dateCol")
         parse_result = df.select(functions.to_date(functions.col("dateCol"))).first()
-        self.assertEquals(date(2017, 1, 22), parse_result['to_date(dateCol)'])
+        self.assertEquals(date(2017, 1, 22), parse_result['to_date(`dateCol`)'])
 
     @unittest.skipIf(sys.version_info < (3, 3), "Unittest < 3.3 doesn't support mocking")
     def test_unbounded_frames(self):
```

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala

Lines changed: 27 additions & 38 deletions
```diff
@@ -1146,44 +1146,21 @@ case class ToUTCTimestamp(left: Expression, right: Expression)
 }
 
 /**
- * Returns the date part of a timestamp or string.
+ * Parses a column to a date based on the given format.
 */
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Extracts the date part of the date or timestamp expression `expr`.",
+  usage = """
+    _FUNC_(date_str[, fmt]) - Parses the `date_str` expression with the `fmt` expression to
+      a date. Returns null with invalid input. By default, it follows casting rules to a date if
+      the `fmt` is omitted.
+  """,
   extended = """
     Examples:
       > SELECT _FUNC_('2009-07-30 04:17:52');
       2009-07-30
-  """)
-case class ToDate(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
-
-  // Implicit casting of spark will accept string in both date and timestamp format, as
-  // well as TimestampType.
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = DateType
-
-  override def eval(input: InternalRow): Any = child.eval(input)
-
-  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    defineCodeGen(ctx, ev, d => d)
-  }
-
-  override def prettyName: String = "to_date"
-}
-
-/**
- * Parses a column to a date based on the given format.
- */
-// scalastyle:off line.size.limit
-@ExpressionDescription(
-  usage = "_FUNC_(date_str, fmt) - Parses the `left` expression with the `fmt` expression. Returns null with invalid input.",
-  extended = """
-    Examples:
       > SELECT _FUNC_('2016-12-31', 'yyyy-MM-dd');
       2016-12-31
   """)
-// scalastyle:on line.size.limit
 case class ParseToDate(left: Expression, format: Option[Expression], child: Expression)
   extends RuntimeReplaceable {
 
@@ -1194,13 +1171,13 @@ case class ParseToDate(left: Expression, format: Option[Expression], child: Expr
 
   def this(left: Expression) = {
     // backwards compatability
-    this(left, Option(null), ToDate(left))
+    this(left, None, Cast(left, DateType))
   }
 
   override def flatArguments: Iterator[Any] = Iterator(left, format)
   override def sql: String = {
     if (format.isDefined) {
-      s"$prettyName(${left.sql}, ${format.get.sql}"
+      s"$prettyName(${left.sql}, ${format.get.sql})"
     } else {
       s"$prettyName(${left.sql})"
     }
@@ -1212,24 +1189,36 @@ case class ParseToDate(left: Expression, format: Option[Expression], child: Expr
 
 /**
 * Parses a column to a timestamp based on the supplied format.
 */
-// scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(timestamp, fmt) - Parses the `left` expression with the `format` expression to a timestamp. Returns null with invalid input.",
+  usage = """
+    _FUNC_(timestamp[, fmt]) - Parses the `timestamp` expression with the `fmt` expression to
+      a timestamp. Returns null with invalid input. By default, it follows casting rules to
+      a timestamp if the `fmt` is omitted.
+  """,
   extended = """
     Examples:
+      > SELECT _FUNC_('2016-12-31 00:12:00');
+      2016-12-31 00:12:00
       > SELECT _FUNC_('2016-12-31', 'yyyy-MM-dd');
-      2016-12-31 00:00:00.0
+      2016-12-31 00:00:00
   """)
-// scalastyle:on line.size.limit
-case class ParseToTimestamp(left: Expression, format: Expression, child: Expression)
+case class ParseToTimestamp(left: Expression, format: Option[Expression], child: Expression)
   extends RuntimeReplaceable {
 
   def this(left: Expression, format: Expression) = {
-    this(left, format, Cast(UnixTimestamp(left, format), TimestampType))
+    this(left, Option(format), Cast(UnixTimestamp(left, format), TimestampType))
   }
 
+  def this(left: Expression) = this(left, None, Cast(left, TimestampType))
+
   override def flatArguments: Iterator[Any] = Iterator(left, format)
-  override def sql: String = s"$prettyName(${left.sql}, ${format.sql})"
+  override def sql: String = {
+    if (format.isDefined) {
+      s"$prettyName(${left.sql}, ${format.get.sql})"
+    } else {
+      s"$prettyName(${left.sql})"
+    }
+  }
 
   override def prettyName: String = "to_timestamp"
   override def dataType: DataType = TimestampType
```

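The `sql` rendering change in `ParseToTimestamp` above — a format that is now an `Option[Expression]` and is included in the generated SQL only when present — can be sketched in Python. This is a hypothetical analogue of the Catalyst logic, not the actual code; the function and parameter names are illustrative.

```python
from typing import Optional

def render_sql(pretty_name: str, left_sql: str, format_sql: Optional[str] = None) -> str:
    """Mirror of the new ParseToTimestamp.sql: emit the format argument only when defined."""
    if format_sql is not None:
        return f"{pretty_name}({left_sql}, {format_sql})"
    return f"{pretty_name}({left_sql})"
```

The same conditional also exercises the one-character bug fixed in `ParseToDate.sql`, where the two-argument form previously omitted the closing parenthesis.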
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala

Lines changed: 1 addition & 1 deletion
```diff
@@ -427,7 +427,7 @@ object DateTimeUtils {
   * The return type is [[Option]] in order to distinguish between 0 and null. The following
   * formats are allowed:
   *
-   * `yyyy`,
+   * `yyyy`
   * `yyyy-[m]m`
   * `yyyy-[m]m-[d]d`
   * `yyyy-[m]m-[d]d `
```
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala

Lines changed: 0 additions & 8 deletions
```diff
@@ -495,14 +495,6 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
       NextDay(Literal(Date.valueOf("2015-07-23")), Literal.create(null, StringType)), null)
   }
 
-  test("function to_date") {
-    checkEvaluation(
-      ToDate(Literal(Date.valueOf("2015-07-22"))),
-      DateTimeUtils.fromJavaDate(Date.valueOf("2015-07-22")))
-    checkEvaluation(ToDate(Literal.create(null, DateType)), null)
-    checkConsistencyBetweenInterpretedAndCodegen(ToDate, DateType)
-  }
-
   test("function trunc") {
     def testTrunc(input: Date, fmt: String, expected: Date): Unit = {
       checkEvaluation(TruncDate(Literal.create(input, DateType), Literal.create(fmt, StringType)),
```

sql/core/src/main/scala/org/apache/spark/sql/functions.scala

Lines changed: 5 additions & 6 deletions
```diff
@@ -2683,13 +2683,12 @@ object functions {
   def unix_timestamp(s: Column, p: String): Column = withExpr { UnixTimestamp(s.expr, Literal(p)) }
 
   /**
-   * Convert time string to a Unix timestamp (in seconds).
-   * Uses the pattern "yyyy-MM-dd HH:mm:ss" and will return null on failure.
+   * Convert time string to a Unix timestamp (in seconds) by casting rules to `TimestampType`.
   * @group datetime_funcs
   * @since 2.2.0
   */
  def to_timestamp(s: Column): Column = withExpr {
-    new ParseToTimestamp(s.expr, Literal("yyyy-MM-dd HH:mm:ss"))
+    new ParseToTimestamp(s.expr)
  }
 
  /**
@@ -2704,15 +2703,15 @@ object functions {
  }
 
  /**
-   * Converts the column into DateType.
+   * Converts the column into `DateType` by casting rules to `DateType`.
   *
   * @group datetime_funcs
   * @since 1.5.0
   */
-  def to_date(e: Column): Column = withExpr { ToDate(e.expr) }
+  def to_date(e: Column): Column = withExpr { new ParseToDate(e.expr) }
 
  /**
-   * Converts the column into a DateType with a specified format
+   * Converts the column into a `DateType` with a specified format
   * (see [http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html])
   * return null if fail.
   *
```

sql/core/src/test/resources/sql-tests/inputs/datetime.sql

Lines changed: 4 additions & 0 deletions
```diff
@@ -2,3 +2,7 @@
 
 -- [SPARK-16836] current_date and current_timestamp literals
 select current_date = current_date(), current_timestamp = current_timestamp();
+
+select to_date(null), to_date('2016-12-31'), to_date('2016-12-31', 'yyyy-MM-dd');
+
+select to_timestamp(null), to_timestamp('2016-12-31 00:12:00'), to_timestamp('2016-12-31', 'yyyy-MM-dd');
```
sql/core/src/test/resources/sql-tests/results/datetime.sql.out

Lines changed: 17 additions & 1 deletion

```diff
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 1
+-- Number of queries: 3
 
 
 -- !query 0
@@ -8,3 +8,19 @@ select current_date = current_date(), current_timestamp = current_timestamp()
 struct<(current_date() = current_date()):boolean,(current_timestamp() = current_timestamp()):boolean>
 -- !query 0 output
 true	true
+
+
+-- !query 1
+select to_date(null), to_date('2016-12-31'), to_date('2016-12-31', 'yyyy-MM-dd')
+-- !query 1 schema
+struct<to_date(NULL):date,to_date('2016-12-31'):date,to_date('2016-12-31', 'yyyy-MM-dd'):date>
+-- !query 1 output
+NULL	2016-12-31	2016-12-31
+
+
+-- !query 2
+select to_timestamp(null), to_timestamp('2016-12-31 00:12:00'), to_timestamp('2016-12-31', 'yyyy-MM-dd')
+-- !query 2 schema
+struct<to_timestamp(NULL):timestamp,to_timestamp('2016-12-31 00:12:00'):timestamp,to_timestamp('2016-12-31', 'yyyy-MM-dd'):timestamp>
+-- !query 2 output
+NULL	2016-12-31 00:12:00	2016-12-31 00:00:00
```

sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala

Lines changed: 2 additions & 2 deletions
```diff
@@ -387,7 +387,7 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext {
       df.selectExpr("to_date(s)"),
       Seq(Row(Date.valueOf("2015-07-22")), Row(Date.valueOf("2014-12-31")), Row(null)))
 
-    // Now with format
+    // now with format
     checkAnswer(
       df.select(to_date(col("t"), "yyyy-MM-dd")),
       Seq(Row(Date.valueOf("2015-07-22")), Row(Date.valueOf("2014-12-31")),
@@ -400,7 +400,7 @@ class DateFunctionsSuite extends QueryTest with SharedSQLContext {
       df.select(to_date(col("s"), "yyyy-MM-dd")),
       Seq(Row(Date.valueOf("2015-07-22")), Row(Date.valueOf("2014-12-31")), Row(null)))
 
-    // now switch format
+    // now switch format
     checkAnswer(
       df.select(to_date(col("s"), "yyyy-dd-MM")),
       Seq(Row(null), Row(null), Row(Date.valueOf("2014-12-31"))))
```
