[SPARK-29508][SQL] Implicitly cast strings in datetime arithmetic operations #26165
Conversation
@wangyum Please review this PR.
Test build #112279 has finished for PR 26165 at commit
I tried to add this feature before: #25190

Why was it closed? This casting is useful independently of PostgreSQL. I believe this improves the user experience with Spark SQL. I don't think it is better to fail on casting a string to a date/interval.
Test build #112282 has finished for PR 26165 at commit

Test build #112290 has finished for PR 26165 at commit
@MaxGekk Please see @maropu's comment:

Can we consider the feature out of the scope of compatibility with PostgreSQL?
@srowen @cloud-fan @HyukjinKwon WDYT?
I'm just trying to think whether this causes any surprising behavior, either because some string is unintentionally cast to a date or interval type, or because it doesn't fully work in all cases (see the other ticket). How much do we need to support this? If it's not a standard behavior across DBs, I'd kind of prefer to have people write more explicit casts for such a thing. Do we cast strings to other types implicitly though?
Is it really a useful feature? Users can simply put the `date`/`interval` keyword before a string literal.

You are right, users can put it before literals, but what about a string column? Users have to explicitly cast it, correct?
Yeah, though the same argument is applicable to other implicit casts from strings, for example:

```
spark-sql> select 2 * '3';
6.0
```

Why does Spark implicitly cast the string to a number here? Let's consider the proposed implicit casts one by one:
```
maxim=# select date'today' - '2019-10-01';
 ?column?
----------
       21
(1 row)
```
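For comparison, the pgsql arithmetic above can be reproduced with plain Python date arithmetic. This is only a sketch of what the implicit cast does; the concrete date 2019-10-22 for `date'today'` is an assumption inferred from the 21-day result, not stated in the thread:

```python
from datetime import date

# pgsql implicitly casts the string literal '2019-10-01' to a date before
# subtracting. Assuming date'today' evaluated to 2019-10-22 (hypothetical,
# chosen so the result matches the 21 days shown above):
today = date(2019, 10, 22)
delta = today - date.fromisoformat("2019-10-01")
print(delta.days)
```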
Yea, I'm still worried that this feature might cause unexpected results in complicated SQL queries, as others suggested... btw, do any DBMS-like systems support these implicit casts?
To be honest I think this is a mistake as well. This kind of implicit cast is really risky, since Spark returns null for an invalid cast. Even if an invalid cast could throw a runtime exception, we should still not allow this kind of implicit cast, to be safe. FYI this is the result of pgsql. AFAIK pgsql (and some other DBs) only apply this kind of implicit cast to string literals. We'd either update the type coercion module to handle string literals specially, or not do it at all.
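The risk described here can be sketched in Python: a cast that silently returns null turns a typo into wrong query results rather than an error. The helper below is hypothetical, not Spark code, and only mimics the null-on-failure semantics the comment refers to:

```python
from datetime import date
from typing import Optional

def try_cast_date(s: str) -> Optional[date]:
    """Mimic Spark's null-on-failure cast semantics (hypothetical helper)."""
    try:
        return date.fromisoformat(s)
    except ValueError:
        return None

# A valid string casts fine...
print(try_cast_date("2019-10-01"))
# ...but a typo silently becomes None (NULL in SQL), so an implicit cast in
# an expression like `date_col - '2019-13-01'` would yield NULL rows
# instead of failing the query.
print(try_cast_date("2019-13-01"))
```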
I am closing this PR.
What changes were proposed in this pull request?
- `interval - string`. For example:
- `datetime + string` or `string + datetime`. For example:
- `datetime - string` or `string - datetime`. For example:

Why are the changes needed?
To improve user experience with Spark SQL
Does this PR introduce any user-facing change?
Yes. Previously, these operations failed with errors:
How was this patch tested?
- `TypeCoercionSuite` to check the rules
- `dateTimeOperations.sql`