Skip to content

Conversation

@jiayue-zhang
Copy link
Contributor

What changes were proposed in this pull request?

Currently, StructType.merge() only reports data types of conflicting fields when merging two incompatible schemas. It would be nice to also report the field names for easier debugging.

How was this patch tested?

Unit test in DataTypeSuite.
Print exception message when conflict is triggered.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Dec 21, 2016

This is actually the message users face in some cases. Isn't it :)?

val df1 = spark.range(10).selectExpr("id as intcol", "cast(id as int) as longcol")
df1.write.parquet("/tmp/a")
val df2 = spark.range(10).selectExpr("id as intcol", "id as longcol")
df2.write.parquet("/tmp/b")
spark.read.option("mergeSchema", true).parquet("/tmp/a", "/tmp/b").show()

Before

...
Caused by: org.apache.spark.SparkException: Failed to merge incompatible data types IntegerType and LongType
  at org.apache.spark.sql.types.StructType$.merge(StructType.scala:515)
  at org.apache.spark.sql.types.StructType$$anonfun$merge$1$$anonfun$apply$3.apply(StructType.scala:473)
  at org.apache.spark.sql.types.StructType$$anonfun$merge$1$$anonfun$apply$3.apply(StructType.scala:471)
...

After

...
Caused by: org.apache.spark.SparkException: Failed to merge field longcol: Failed to merge incompatible data types IntegerType and LongType
  at org.apache.spark.sql.types.StructType$$anonfun$merge$1$$anonfun$apply$3.apply(StructType.scala:480)
  at org.apache.spark.sql.types.StructType$$anonfun$merge$1$$anonfun$apply$3.apply(StructType.scala:471)
  at scala.Option.map(Option.scala:146)
...

BTW, it looks the test in DataTypeSuite was not added by mistake.

@jiayue-zhang
Copy link
Contributor Author

Thanks for the review @HyukjinKwon !
Are your stacktrace of Before and After swapped?
Do you mean the message is confusing? How about correcting it to Failed to merge field longcol: incompatible data types IntegerType and LongType?
For the missing test case, do you mean a merge case that are not StructType as in your example?
Or is it something else that I'm missing?

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Dec 21, 2016

Doh, yeap, I just swapped back. I just simply meant I support this PR because this improves user's experience with a better message as well :).

dataType = dataType,
nullable = leftNullable || rightNullable)
case Failure(e) =>
throw new SparkException(s"Failed to merge field $leftName: " + e.getMessage)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$leftName -> '$leftName'

@gatorsmile
Copy link
Member

ok to test

@gatorsmile
Copy link
Member

@bravo-zhang Sorry for the late response. Could you please also add a test case for capturing the new error message?

@SparkQA
Copy link

SparkQA commented Jun 13, 2017

Test build #77970 has started for PR 16365 at commit 9bc9f8a.

@gatorsmile
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jun 13, 2017

Test build #77999 has finished for PR 16365 at commit 9bc9f8a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

ping @bravo-zhang for adding the test.

@SparkQA
Copy link

SparkQA commented Jul 25, 2017

Test build #79921 has finished for PR 16365 at commit 54a898f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiayue-zhang
Copy link
Contributor Author

@HyukjinKwon @gatorsmile Test added.

@gatorsmile
Copy link
Member

retest this please

leftField.copy(
dataType = merge(leftType, rightType),
nullable = leftNullable || rightNullable)
Try {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataType = dataType,
nullable = leftNullable || rightNullable)
case Failure(e) =>
throw new SparkException(s"Failed to merge field '$leftName': " + e.getMessage)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we throw an AnalysisException with both sides, left and right? Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other exceptions in this class are also SparkException, for example the precision conflicts. Should we keep it as SparkException?
For "with both sides, left and right", do you mean just to modify the message a bit to include both left and right names(though they are the same)?
@gatorsmile your other comments are resolved.

left.merge(right)
}
}.getMessage
assert(message.contains("conflictColumn"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we capture the whole message? It can help us review the error message.

@SparkQA
Copy link

SparkQA commented Jul 31, 2017

Test build #80064 has finished for PR 16365 at commit 54a898f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 31, 2017

Test build #80088 has finished for PR 16365 at commit f500a11.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 31, 2017

Test build #80090 has finished for PR 16365 at commit f4892c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

gatorsmile commented Aug 1, 2017

LGTM

Thanks! Merging to master.

@asfgit asfgit closed this in 6b186c9 Aug 1, 2017
@jiayue-zhang jiayue-zhang deleted the spark-18950 branch August 2, 2017 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants