[SPARK-20980] [SQL] Rename wholeFile to multiLine for both CSV and JSON
#18202
Test build #77733 has started for PR 18202 at commit

Retest this please.
```diff
   parameters.getOrElse("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"), timeZone, Locale.US)

-  val wholeFile = parameters.get("wholeFile").map(_.toBoolean).getOrElse(false)
+  val multiLine = parameters.get("multiLine").map(_.toBoolean).getOrElse(false)
```
Hi, @gatorsmile.
It seems that we need to change `JSONOptions.wholeFile` as well.
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala#L84
That is different. When `wholeFile` is on, at most one record is parsed from each JSON file.
Oh, I see. Thank you!
After rethinking the issue, we need to rename the option to `multiLine` for both CSV and JSON, and fix the JSON parsing so that the two are consistent.
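At the options level, the rename in the diff above only changes which key is looked up. A minimal sketch of that lookup in plain Scala, so it runs without Spark: `ReaderOptions` is a hypothetical class (not Spark's `CSVOptions` or `JSONOptions`), and lower-casing the keys is an assumption meant to loosely mirror how Spark matches data source option keys case-insensitively.

```scala
import java.util.Locale

// Hypothetical options holder; NOT Spark's CSVOptions/JSONOptions.
final class ReaderOptions(raw: Map[String, String]) {
  // Assumption: option keys are matched case-insensitively,
  // so normalize them once up front.
  private val params: Map[String, String] =
    raw.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  private def get(key: String): Option[String] =
    params.get(key.toLowerCase(Locale.ROOT))

  // Same shape as the renamed line in the diff: a missing key means false,
  // "true"/"false" parse case-insensitively, and any other value makes
  // String.toBoolean throw IllegalArgumentException.
  val multiLine: Boolean = get("multiLine").map(_.toBoolean).getOrElse(false)
}
```

With a plain rename like this, a caller that still passes `wholeFile` gets the default `false`, since the old key is no longer consulted.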
Title changed: "… wholeFile to multiLine" → "… wholeFile to multiLine for both CSV and JSON"
Test build #77746 has finished for PR 18202 at commit
I think the change itself looks good to me as targeted (if this is going to be included in 2.2.0; I just saw https://issues.apache.org/jira/browse/SPARK-20980?focusedCommentId=16037416&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16037416). It looks like we just need a decision. Let me cc @rxin, who I believe came up with the option name initially.

Wouldn't this break compatibility?
Test build #77751 has finished for PR 18202 at commit
Test build #77752 has finished for PR 18202 at commit
Judging from the JIRAs https://issues.apache.org/jira/browse/SPARK-19610 and https://issues.apache.org/jira/browse/SPARK-18352, both options appear to have been added in 2.2. If this targets 2.2.0, I guess it wouldn't.

Let's hold it until RC4 finishes. If RC4 passes, we need to update this PR to support the old option name as well; otherwise, we can just rename it.
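Had RC4 passed, the old key would have needed to keep working. A minimal sketch of such a fallback, assuming options arrive as a plain `Map[String, String]`: `MultiLineOption` and `resolve` are hypothetical names, and this is not code from the PR, which ended up as a plain rename once RC4 failed.

```scala
// Hypothetical helper, not from the PR: prefer the new key and fall back to
// the deprecated `wholeFile` key only when `multiLine` is absent.
object MultiLineOption {
  val NewKey = "multiLine"
  val DeprecatedKey = "wholeFile"

  def resolve(params: Map[String, String]): Boolean =
    params.get(NewKey)
      .orElse(params.get(DeprecatedKey)) // consulted only if multiLine is unset
      .map(_.toBoolean)
      .getOrElse(false)
}
```

An explicit `multiLine` setting always wins over `wholeFile` here; a fuller version might also log a deprecation warning when only the old key is present.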
Hi all, it sounds like the RC4 vote failed. Should we proceed with this one?

Ah, lucky :) Merging to master/2.2!
… JSON

The current option name `wholeFile` is misleading for CSV users. Currently, it is not representing a record per file. Actually, one file could have multiple records. Thus, we should rename it. Now, the proposal is `multiLine`.

N/A

Author: Xiao Li
Closes #18202 from gatorsmile/renameCVSOption.
(cherry picked from commit 2051428)
Signed-off-by: Wenchen Fan
opened #18312
What changes were proposed in this pull request?
The current option name `wholeFile` is misleading for CSV users: it does not mean one record per file, since a single file can contain multiple records. Thus, we should rename it. The proposed name is `multiLine`.

How was this patch tested?
N/A
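To make the naming argument concrete: with the option on, one CSV file still yields several records, and a record crosses a line break only inside a quoted field. The splitter below is a hypothetical, simplified illustration that assumes LF line endings and RFC 4180-style `""` escaping; it is not Spark's univocity-based CSV parser.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical, simplified record splitter (not Spark's implementation):
// a newline ends a record only when it appears outside double quotes.
object CsvRecords {
  def split(text: String): List[String] = {
    val records = ListBuffer.empty[String]
    val current = new StringBuilder
    var inQuotes = false
    for (c <- text) {
      c match {
        case '"' =>
          // An escaped quote ("") toggles twice, leaving the state unchanged.
          inQuotes = !inQuotes
          current += c
        case '\n' if !inQuotes =>
          // Record boundary: emit what we have and start a new record.
          records += current.toString
          current.clear()
        case _ =>
          // Ordinary character, or a newline inside a quoted field.
          current += c
      }
    }
    if (current.nonEmpty) records += current.toString
    records.toList
  }
}
```

For example, `CsvRecords.split("id,comment\n1,\"line one\nline two\"\n2,plain\n")` returns three records from one input, the second of which spans two lines. That is the situation `multiLine` describes better than `wholeFile`.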