[SPARK-18950][SQL] Report conflicting fields when merging two StructTypes #16365
This is actually the message users face in some cases, isn't it? :)
val df1 = spark.range(10).selectExpr("id as intcol", "cast(id as int) as longcol")
df1.write.parquet("/tmp/a")
val df2 = spark.range(10).selectExpr("id as intcol", "id as longcol")
df2.write.parquet("/tmp/b")
spark.read.option("mergeSchema", true).parquet("/tmp/a", "/tmp/b").show()
Before After
BTW, it looks the test in |
|
Thanks for the review @HyukjinKwon ! |
|
Doh, yeap, I just swapped back. I simply meant that I support this PR because it also improves the user experience with a better message :). |
    dataType = dataType,
    nullable = leftNullable || rightNullable)
case Failure(e) =>
  throw new SparkException(s"Failed to merge field $leftName: " + e.getMessage)
$leftName -> '$leftName'
|
ok to test |
|
@bravo-zhang Sorry for the late response. Could you please also add a test case for capturing the new error message? |
|
Test build #77970 has started for PR 16365 at commit |
|
retest this please |
|
Test build #77999 has finished for PR 16365 at commit
|
|
ping @bravo-zhang for adding the test. |
|
Test build #79921 has finished for PR 16365 at commit
|
|
@HyukjinKwon @gatorsmile Test added. |
|
retest this please |
leftField.copy(
  dataType = merge(leftType, rightType),
  nullable = leftNullable || rightNullable)
Try {
Could we use the Java-style try and catch? See https://github.com/databricks/scala-style-guide#exception-handling-try-vs-try
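For readers following the review, here is a minimal sketch of what the requested try/catch form could look like. It is not the code from this PR: `DType`, `Field`, `mergeTypes`, and `mergeField` are hypothetical stand-ins for Spark's types and merge logic, and `IllegalStateException` stands in for `SparkException` so the example runs without Spark on the classpath.

```scala
import scala.util.control.NonFatal

object TryCatchSketch {
  // Hypothetical stand-ins for Spark's data types; not the real API.
  sealed trait DType
  case object IntT extends DType
  case object LongT extends DType

  final case class Field(name: String, dataType: DType, nullable: Boolean)

  // Toy type merge: identical types merge, anything else is a conflict.
  def mergeTypes(left: DType, right: DType): DType =
    if (left == right) left
    else throw new IllegalArgumentException(
      s"Failed to merge incompatible data types $left and $right")

  // Plain try/catch instead of scala.util.Try with Success/Failure matching.
  // The catch clause adds the field name to the underlying message.
  def mergeField(left: Field, right: Field): Field =
    try {
      left.copy(
        dataType = mergeTypes(left.dataType, right.dataType),
        nullable = left.nullable || right.nullable)
    } catch {
      case NonFatal(e) =>
        throw new IllegalStateException(
          s"Failed to merge field '${left.name}': ${e.getMessage}", e)
    }
}
```

Compared with matching on `Failure(e)`, this keeps the happy path unwrapped and makes it explicit that only non-fatal errors are rewrapped.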
    dataType = dataType,
    nullable = leftNullable || rightNullable)
case Failure(e) =>
  throw new SparkException(s"Failed to merge field '$leftName': " + e.getMessage)
Could we throw an AnalysisException with both sides, left and right? Thanks!
Other exceptions in this class are also SparkException, for example the precision conflicts. Should we keep it as SparkException?
For "with both sides, left and right", do you mean just modifying the message a bit to include both the left and right names (though they are the same)?
@gatorsmile your other comments are resolved.
    left.merge(right)
  }
}.getMessage
assert(message.contains("conflictColumn"))
Could we capture the whole message? It can help us review the error message.
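To illustrate the difference between the substring check in the diff and a whole-message check, here is a small self-contained sketch. The `mergeField` and `messageOf` helpers are hypothetical, the type names are plain strings, and the message wording after the field name is an assumption for illustration, not the text Spark produces.

```scala
object WholeMessageSketch {
  // Hypothetical merge that fails with a message naming the field.
  def mergeField(name: String, leftType: String, rightType: String): String =
    if (leftType == rightType) leftType
    else throw new IllegalStateException(
      s"Failed to merge field '$name': " +
        s"Failed to merge incompatible data types $leftType and $rightType")

  // Hypothetical helper: runs the body and returns the exception message,
  // or an empty string if nothing was thrown.
  def messageOf(body: => Any): String =
    try { body; "" } catch { case e: IllegalStateException => e.getMessage }

  def main(args: Array[String]): Unit = {
    val message = messageOf(mergeField("conflictColumn", "IntegerType", "LongType"))

    // Substring check: passes even if the rest of the message regresses.
    assert(message.contains("conflictColumn"))

    // Whole-message check: pins the exact wording, so reviewers can read it
    // and any later change to the message shows up as a test failure.
    assert(message ==
      "Failed to merge field 'conflictColumn': " +
        "Failed to merge incompatible data types IntegerType and LongType")
  }
}
```

The trade-off is that an exact-match assertion has to be updated whenever the wording changes, which is usually acceptable for user-facing error messages.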
|
Test build #80064 has finished for PR 16365 at commit
|
|
Test build #80088 has finished for PR 16365 at commit
|
|
Test build #80090 has finished for PR 16365 at commit
|
|
LGTM Thanks! Merging to master. |
What changes were proposed in this pull request?
Currently, StructType.merge() only reports data types of conflicting fields when merging two incompatible schemas. It would be nice to also report the field names for easier debugging.
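The idea can be sketched on a toy schema model. Everything below (`DType`, `StructT`, `Field`, `MergeException`, and this `merge`) is a hypothetical simplification written for illustration; it does not reproduce Spark's `StructType.merge()` or its exact messages, and it ignores nullability, arrays, maps, and decimals.

```scala
object SchemaMergeSketch {
  // Hypothetical toy schema model; not Spark's classes.
  sealed trait DType
  case object IntT extends DType
  case object LongT extends DType
  final case class StructT(fields: List[Field]) extends DType
  final case class Field(name: String, dataType: DType)

  final class MergeException(msg: String) extends RuntimeException(msg)

  def merge(left: DType, right: DType): DType = (left, right) match {
    case (StructT(ls), StructT(rs)) =>
      val rightByName = rs.map(f => f.name -> f).toMap
      // Merge fields present on both sides; prefix any failure with the field name.
      val mergedLeft = ls.map { l =>
        rightByName.get(l.name) match {
          case Some(r) =>
            try l.copy(dataType = merge(l.dataType, r.dataType))
            catch {
              case e: MergeException =>
                throw new MergeException(
                  s"Failed to merge field '${l.name}': ${e.getMessage}")
            }
          case None => l
        }
      }
      // Append fields that exist only on the right side.
      val leftNames = ls.map(_.name).toSet
      StructT(mergedLeft ++ rs.filterNot(f => leftNames(f.name)))
    case (l, r) if l == r => l
    case (l, r) =>
      // Without the wrapping above, this type-only message is all a user would see.
      throw new MergeException(s"Failed to merge incompatible data types $l and $r")
  }
}
```

Because the wrapping happens at each struct level, a conflict inside a nested struct carries the names of every enclosing field in this sketch.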
How was this patch tested?
Unit test in DataTypeSuite.
Print exception message when conflict is triggered.