-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-11753][SQL][test-hadoop2.2] Make allowNonNumericNumbers option work #9759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
2777677
b2a835d
186fa5e
dc9abc3
6d90b24
c74d715
6f668c3
1cfd1dc
4fca52c
0e473fc
025edea
af1e3a1
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -100,35 +100,13 @@ object JacksonParser { | |
| parser.getFloatValue | ||
|
|
||
| case (VALUE_STRING, FloatType) => | ||
| // Special case handling for NaN and Infinity. | ||
| val value = parser.getText | ||
| val lowerCaseValue = value.toLowerCase() | ||
| if (lowerCaseValue.equals("nan") || | ||
| lowerCaseValue.equals("infinity") || | ||
| lowerCaseValue.equals("-infinity") || | ||
| lowerCaseValue.equals("inf") || | ||
| lowerCaseValue.equals("-inf")) { | ||
| value.toFloat | ||
| } else { | ||
| sys.error(s"Cannot parse $value as FloatType.") | ||
| } | ||
| parser.getFloatValue | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why are we removing the special handling for float types here?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yea, should revert it back. BTW, do we actually test "inf" and "-inf" before? Because "inf".toFloat is not legal. |
||
|
|
||
| case (VALUE_NUMBER_INT | VALUE_NUMBER_FLOAT, DoubleType) => | ||
| parser.getDoubleValue | ||
|
|
||
| case (VALUE_STRING, DoubleType) => | ||
| // Special case handling for NaN and Infinity. | ||
| val value = parser.getText | ||
| val lowerCaseValue = value.toLowerCase() | ||
| if (lowerCaseValue.equals("nan") || | ||
| lowerCaseValue.equals("infinity") || | ||
| lowerCaseValue.equals("-infinity") || | ||
| lowerCaseValue.equals("inf") || | ||
| lowerCaseValue.equals("-inf")) { | ||
| value.toDouble | ||
| } else { | ||
| sys.error(s"Cannot parse $value as DoubleType.") | ||
| } | ||
| parser.getDoubleValue | ||
|
|
||
| case (VALUE_NUMBER_INT | VALUE_NUMBER_FLOAT, dt: DecimalType) => | ||
| Decimal(parser.getDecimalValue, dt.precision, dt.scale) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -93,22 +93,29 @@ class JsonParsingOptionsSuite extends QueryTest with SharedSQLContext { | |
| assert(df.first().getLong(0) == 18) | ||
| } | ||
|
|
||
| // The following two tests are not really working - need to look into Jackson's | ||
| // JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS. | ||
| ignore("allowNonNumericNumbers off") { | ||
| val str = """{"age": NaN}""" | ||
| val rdd = sqlContext.sparkContext.parallelize(Seq(str)) | ||
| val df = sqlContext.read.json(rdd) | ||
| test("allowNonNumericNumbers off") { | ||
| val testCases: Seq[String] = Seq("""{"age": NaN}""", """{"age": Infinity}""", | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we should add a test for quoted NaN, inf, etc. |
||
| """{"age": -Infinity}""") | ||
|
|
||
| assert(df.schema.head.name == "_corrupt_record") | ||
| testCases.foreach { str => | ||
| val rdd = sqlContext.sparkContext.parallelize(Seq(str)) | ||
| val df = sqlContext.read.option("allowNonNumericNumbers", "false").json(rdd) | ||
|
|
||
| assert(df.schema.head.name == "_corrupt_record") | ||
| } | ||
| } | ||
|
|
||
| ignore("allowNonNumericNumbers on") { | ||
| val str = """{"age": NaN}""" | ||
| val rdd = sqlContext.sparkContext.parallelize(Seq(str)) | ||
| val df = sqlContext.read.option("allowNonNumericNumbers", "true").json(rdd) | ||
| test("allowNonNumericNumbers on") { | ||
| val testCases: Seq[String] = Seq("""{"age": NaN}""", """{"age": Infinity}""", | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can we still read them if they are quoted?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, so if we don't set |
||
| """{"age": -Infinity}""") | ||
| val tests: Seq[Double => Boolean] = Seq(_.isNaN, _.isPosInfinity, _.isNegInfinity) | ||
|
|
||
| assert(df.schema.head.name == "age") | ||
| assert(df.first().getDouble(0).isNaN) | ||
| testCases.zipWithIndex.foreach { case (str, idx) => | ||
| val rdd = sqlContext.sparkContext.parallelize(Seq(str)) | ||
| val df = sqlContext.read.json(rdd) | ||
|
|
||
| assert(df.schema.head.name == "age") | ||
| assert(tests(idx)(df.first().getDouble(0))) | ||
| } | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd always write them in the same way to avoid heterogeneity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERScan only recognize the unquoted non numeric numbers. That is why it causes test cases failed above.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK then maybe we should still do the special case handling ourselves in order to support quoted strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, if we still need to do the special case handling ourselves, so looks like we don't want to make
allowNonNumericNumbersas true by default?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but without setting it to true, can we process NaN that's not quoted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we can't.
Since we need to do special case handling for quoted non-numeric numbers, is
JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERSstill useful for us? Because no matter we use it or not, we all need to do some kind of handling.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to be able to process both quoted nans and nans and treat them as numeric values. I think in order to do that we need both special handling for quoted chars, and turn JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rxin I think it is here if I don't misunderstand it.