[SPARK-29838][SQL] PostgreSQL dialect: cast to timestamp #26472
Conversation
@maropu updated as per review comments.

@maropu Can you review the latest changes?
@maropu Kindly review. I have updated as per your suggestion.
    if (conf.usePostgreSQLDialect) {
      plan.transformExpressions {
        case Cast(child, dataType, timeZoneId)
            if dataType == TimestampType =>
We can leave the cast-timestamp-to-timestamp case to the Optimizer to handle.
Oh, sorry. Here I mean that we should change the if condition to: if child.dataType != TimestampType && dataType == TimestampType =>, because the Optimizer currently cannot optimize the Pg cast.
Got it. I will update this.
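The condition change discussed above can be illustrated with a small self-contained sketch. Everything below is a simplified stand-in for Catalyst's expression tree: the type and class names mirror Spark's, but they are hypothetical re-creations for illustration, not the real API.

```scala
// Simplified stand-in for Catalyst types; names mirror Spark's, but this is
// an illustrative model, not the real API.
sealed trait DataType
case object StringType extends DataType
case object TimestampType extends DataType

sealed trait Expression { def dataType: DataType }
case class Literal(value: String, dataType: DataType) extends Expression
case class Cast(child: Expression, dataType: DataType) extends Expression
case class PostgreCastToTimestamp(child: Expression) extends Expression {
  val dataType: DataType = TimestampType
}

// Rewrite Cast into the Pg-specific cast only when the child is not already
// a timestamp: a timestamp-to-timestamp Cast is a no-op the Optimizer can
// remove, but it cannot (yet) do the same for PostgreCastToTimestamp.
def toPgCast(e: Expression): Expression = e match {
  case Cast(child, TimestampType) if child.dataType != TimestampType =>
    PostgreCastToTimestamp(child)
  case other => other
}
```

With the extra `child.dataType != TimestampType` guard, a redundant timestamp-to-timestamp cast is left as a plain `Cast` for the Optimizer to eliminate.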
    override def sql: String = s"CAST(${child.sql} AS ${dataType.sql})"

    override def castToTimestamp(from: DataType): Any => Any = from match {
      case StringType =>
        buildCast[UTF8String](_, utfs => DateTimeUtils.stringToTimestamp(utfs, zoneId)
I believe that Postgres can correctly parse the strings 19700101, 1970/01/01, and January 1 04:05:06 1970 PST, while Spark can't. So I think we may also need to support them in PostgreCastToTimestamp.
Thanks for your suggestion. I will check this.
postgres# select cast('19700101' as timestamp);
01.01.1970 00:00:00
postgres# select cast('1970/01/01' as timestamp);
01.01.1970 00:00:00
postgres# select cast('January 1 04:05:06 1970 PST' as timestamp);
01.01.1970 04:05:06

Spark returns NULL for all of them.
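The gap can be reproduced outside Spark with plain java.time. The sketch below is hedged: `parseLenient` and its pattern list are made up for illustration of the multi-format, Postgres-style approach under discussion; Spark's own `DateTimeUtils.stringToTimestamp` simply rejects these inputs, hence the NULLs above.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter

// Try a list of date patterns in order, Postgres-style. The month-name form
// ("January 1 04:05:06 1970 PST") would additionally need locale-aware and
// zone-aware patterns, omitted here for brevity.
val datePatterns = Seq(
  DateTimeFormatter.ofPattern("yyyyMMdd"),   // 19700101
  DateTimeFormatter.ofPattern("yyyy/MM/dd")  // 1970/01/01
)

def parseLenient(s: String): Option[LocalDateTime] =
  datePatterns.flatMap { fmt =>
    try Some(LocalDate.parse(s, fmt).atStartOfDay())
    catch { case _: Exception => None }
  }.headOption
```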
@maropu kindly review the latest changes and give your feedback on supporting the above queries.

Do we need to support them in this PR? If yes, we need to list all the timestamp formats which Postgres supports but Spark doesn't.
I personally think the support above is not a main issue of this PR, so it's better to separate the two pieces of work: the timestamp cast support and the timestamp format support for the pg dialect.
@maropu Kindly review this. Other PRs depend on this.
        val expectedResult = s"cannot cast type ${value.getClass} to timestamp"
        assert(actualResult.contains(expectedResult))
      }
    }
Can you move these tests to SQLQueryTestSuite, e.g., input/postgreSQL/cast.sql?
I have moved these test cases. cast.sql.out needs to be updated.
Need to delete this test case.
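For illustration, the failure path that the assertion above targets can be modeled in a few self-contained lines. The helper below is hypothetical; the real test exercises Spark's expression evaluation framework, and only the error-message shape is taken from the test fragment quoted earlier.

```scala
// Model of the unsupported-type branch: anything but a String is rejected
// with the error message asserted on above (actual parsing is elided).
def pgCastToTimestampMessage(value: Any): Either[String, String] = value match {
  case s: String => Right(s)
  case other     => Left(s"cannot cast type ${other.getClass} to timestamp")
}
```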
    override def dataType: DataType = BooleanType

    case class PostgreCastToTimestamp(child: Expression, timeZoneId: Option[String])
btw, do we need to define a new rule and a new cast expr for each Pg cast pattern? I mean, can't we define all the Pg cast patterns in a single rule and a single cast expr? cc: @cloud-fan @Ngone51
I think we can and should combine them into a single one (both rule and expression) when more types get in, just like the original Cast does. But I'm not sure where we should start. Maybe this one?
yea, I personally think so. @cloud-fan
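The consolidation being agreed on here might look roughly like the sketch below, written with stand-in types. `PostgreCast` and its support table are hypothetical names invented for this sketch; the real work would replace the per-type case classes (PostgreCastToBoolean, PostgreCastToTimestamp, ...) inside Catalyst.

```scala
sealed trait DataType
case object StringType extends DataType
case object BooleanType extends DataType
case object TimestampType extends DataType

// One expression for every Pg cast pattern, parameterized by the target
// type, instead of one case class (and one rule) per target type.
case class PostgreCast(childType: DataType, target: DataType) {
  def supported: Boolean = (childType, target) match {
    case (StringType, TimestampType) => true  // this PR's case
    case (StringType, BooleanType)   => true  // existing Pg boolean cast
    case _                           => false
  }
}
```

A single rule would then match any `Cast` under the Pg dialect and dispatch on the target type inside the one expression, mirroring how the original `Cast` handles all type pairs.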
Jenkins, test this please.
Hi @amanomer, have you addressed #26472 (comment)?
@Ngone51 I have updated this PR as per your reviews. Kindly review.
Can you merge the current two rules into one as mentioned above? #26472 (comment)
Yes. I will merge these two rules that are,
@maropu Please check the latest changes. Thanks
IIUC, the community is planning to hold off the Pg dialect PRs for a while (until 3.0 is released?). Plz check http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html cc: @cloud-fan
Thank you for your contribution, @amanomer. As you know, unfortunately, we decided to remove the PostgreSQL dialect via SPARK-30125 (#26763). Sorry about that. I'll close this PR, too.
What changes were proposed in this pull request?
To make SparkSQL's cast as timestamp behavior consistent with PostgreSQL when spark.sql.dialect is configured as PostgreSQL.

Why are the changes needed?
SparkSQL and PostgreSQL have a lot of different cast behaviors between types by default. We should make SparkSQL's cast behavior consistent with PostgreSQL when spark.sql.dialect is configured as PostgreSQL.

Does this PR introduce any user-facing change?
Yes. If a user switches to the PostgreSQL dialect now, they will get PostgreSQL-style cast-to-timestamp behavior for StringType and DateType.

How was this patch tested?
Added test cases.