Conversation

@sandeep-katta
Contributor

@sandeep-katta sandeep-katta commented Mar 22, 2019

What changes were proposed in this pull request?

This PR proposes to add an assert on ScalarSubquery's dataType, because dataType can be called on its own before the analysis exception is thrown.

This was found while working on SPARK-27088. That change calls treeString for logging purposes, and the test "scalar subquery with no column" in AnalysisErrorSuite failed with:

Caused by: sbt.ForkMain$ForkError: java.util.NoSuchElementException: next on empty iterator
	...
	at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:198)
	at org.apache.spark.sql.catalyst.expressions.ScalarSubquery.dataType(subquery.scala:251)
	at org.apache.spark.sql.catalyst.expressions.Alias.dataType(namedExpressions.scala:163)
        ...
	at org.apache.spark.sql.catalyst.trees.TreeNode.simpleString(TreeNode.scala:465)
        ...
	at org.apache.spark.sql.catalyst.rules.RuleExecutor$PlanChangeLogger.logRule(RuleExecutor.scala:176)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:116)
	...

The reason is that treeString, called for logging, invokes dataType on ScalarSubquery, but this test's subquery plan has no output columns. As a result, NoSuchElementException was thrown before the analysis check could run.
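The shape of the fix can be sketched with a toy model (the class and field names below are illustrative stand-ins, not Spark's actual Attribute/ScalarSubquery API): asserting on the output arity inside dataType makes an invalid plan fail fast with a readable AssertionError instead of a NoSuchElementException from `output.head`.

```scala
// Toy stand-ins for Spark's Attribute and ScalarSubquery; names and types
// here are illustrative only, not Spark's real API.
case class ToyAttribute(name: String, dataType: String)

case class ToyScalarSubquery(output: Seq[ToyAttribute]) {
  def dataType: String = {
    // Fail fast with a clear message instead of letting `output.head`
    // throw NoSuchElementException when the plan has no columns.
    assert(output.length == 1, "Scalar subquery should have only one column")
    output.head.dataType
  }
}

// A valid one-column plan still works:
val valid = ToyScalarSubquery(Seq(ToyAttribute("a", "IntegerType"))).dataType

// A zero-column plan now fails with a descriptive AssertionError:
val failure = try { ToyScalarSubquery(Seq.empty).dataType } catch {
  case e: AssertionError => e.getMessage
}

println(valid)
println(failure)
```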

How was this patch tested?

Manually tested.

ScalarSubquery(LocalRelation()).treeString
An exception or error caused a run to abort: assertion failed: Scalar subquery should have only one column
java.lang.AssertionError: assertion failed: Scalar subquery should have only one column
	at scala.Predef$.assert(Predef.scala:223)
	at org.apache.spark.sql.catalyst.expressions.ScalarSubquery.dataType(subquery.scala:252)
	at org.apache.spark.sql.catalyst.analysis.AnalysisErrorSuite.<init>(AnalysisErrorSuite.scala:116)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at java.lang.Class.newInstance(Class.java:442)
	at org.scalatest.tools.Runner$.genSuiteConfig(Runner.scala:1428)
	at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$8(Runner.scala:1236)
	at scala.collection.immutable.List.map(List.scala:286)
	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1235)

@sandeep-katta
Contributor Author

sandeep-katta commented Mar 22, 2019

cc @maropu

This PR is required for #24136.

@maropu
Member

maropu commented Mar 22, 2019

What's the relationship between this and #24136?

@sandeep-katta
Contributor Author

sandeep-katta commented Mar 22, 2019

The "scalar subquery with no column" UT is failing.

Reason: treeString is throwing the exception (code here). Earlier this statement was inside a debug log call; since debug logging is disabled, the statement was never executed in the UT. With the new changes, the treeString API is called unconditionally, so it now throws the exception.
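The log-level interaction can be sketched without Spark: Spark's Logging trait takes the log message by name, so a message expression that would throw is simply never evaluated while debug logging is off. The toy logDebug below is an assumption-laden stand-in, not Spark's actual trait.

```scala
// Toy version of a by-name debug logger (Spark's real Logging trait also
// takes the message by name, which is why the bug stayed hidden).
var debugEnabled = false
def logDebug(msg: => String): Unit = if (debugEnabled) println(s"DEBUG: $msg")

// Stand-in for the treeString call that throws on the broken plan.
def brokenTreeString: String =
  throw new NoSuchElementException("next on empty iterator")

// Debug off: the by-name argument is never evaluated, so nothing throws.
logDebug(brokenTreeString)

// Debug on (or calling treeString unconditionally, as the new code does):
// evaluating the message surfaces the latent exception.
debugEnabled = true
val surfaced = try { logDebug(brokenTreeString); "no error" } catch {
  case e: NoSuchElementException => e.getMessage
}
println(surfaced)
```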

@sandeep-katta
Contributor Author

You can refer to the UT failure report: Jenkins Report

@HyukjinKwon
Member

ok to test

@sandeep-katta sandeep-katta changed the title [SPARK-27246][SQL]scalar subquery with no columns throws exception in spark shell [SPARK-27246][SQL]scalar subquery with no column test fails as it is throwing NoSuchElementException instead of AnalysisException Mar 25, 2019
@maropu
Member

maropu commented Mar 25, 2019

Reason: treeString is throwing the exception (code here). Earlier this statement was inside a debug log call; since debug logging is disabled, the statement was never executed in the UT. With the new changes, the treeString API is called unconditionally, so it now throws the exception.

If the failure is caused by the change of #24136, we should put this fix there?

@sandeep-katta
Contributor Author

Reason: treeString is throwing the exception (code here). Earlier this statement was inside a debug log call; since debug logging is disabled, the statement was never executed in the UT. With the new changes, the treeString API is called unconditionally, so it now throws the exception.

If the failure is caused by the change of #24136, we should put this fix there?

The failure is not caused by #24136; it is an existing issue that was uncovered while fixing it.

@SparkQA

SparkQA commented Mar 25, 2019

Test build #103882 has finished for PR 24182 at commit 6f07fb9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Mar 25, 2019

Ah, I see. Could you add the tests that don't depend on a log level?

@dilipbiswal
Contributor

@sandeep-katta @maropu
IMHO, removing this test and asserting in the code on a zero-length schema is probably better.
We could replace the test with one that verifies the assertion exception and message instead?
The reason I am thinking this way is that NullType is not a valid return type of a scalar subquery expression. There is no way for a user to type in a query that can exhibit this issue, right?
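The suggested replacement test can be sketched against a toy model. The intercept helper and model class below are hypothetical; a real unit test would use ScalaTest's intercept against Spark's actual ScalarSubquery.

```scala
import scala.reflect.ClassTag

// Minimal intercept helper in the spirit of ScalaTest's `intercept`:
// runs the body and returns the expected exception, failing otherwise.
def intercept[T <: Throwable](body: => Any)(implicit ct: ClassTag[T]): T =
  try {
    body
    sys.error(s"expected ${ct.runtimeClass.getName} but nothing was thrown")
  } catch {
    case e if ct.runtimeClass.isInstance(e) => e.asInstanceOf[T]
  }

// Toy model of a scalar subquery whose dataType asserts on output arity.
case class ModelColumn(name: String, dataType: String)
case class ModelScalarSubquery(output: Seq[ModelColumn]) {
  def dataType: String = {
    assert(output.length == 1, "Scalar subquery should have only one column")
    output.head.dataType
  }
}

// The replacement test verifies the assertion type and message directly,
// independent of any log level:
val e = intercept[AssertionError] { ModelScalarSubquery(Nil).dataType }
println(e.getMessage)
```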

@SparkQA

SparkQA commented Mar 25, 2019

Test build #103884 has finished for PR 24182 at commit 20a2aaa.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Yea, adding an assert or throwing an exception would be better. I want to see if that's possible too, so I wanted to check which test case failed and why. Can you pull it out, @sandeep-katta?

@sandeep-katta
Contributor Author

sandeep-katta commented Mar 25, 2019

Failed test case: Jenkins Report

Caused by: sbt.ForkMain$ForkError: java.util.NoSuchElementException: next on empty iterator
at scala.collection.Iterator$$anon$2.next(Iterator.scala:41)
at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
at scala.collection.IterableLike.head(IterableLike.scala:109)
at scala.collection.IterableLike.head$(IterableLike.scala:108)
at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$head(ArrayOps.scala:198)
at scala.collection.IndexedSeqOptimized.head(IndexedSeqOptimized.scala:129)
at scala.collection.IndexedSeqOptimized.head$(IndexedSeqOptimized.scala:129)
at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:198)
at org.apache.spark.sql.catalyst.expressions.ScalarSubquery.dataType(subquery.scala:251)
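The bottom frames of this trace can be reproduced without Spark: `.head` on an empty Array is what throws. The exact exception message (and, in some Scala versions, the exception type) varies, so this sketch only checks that the call fails.

```scala
// ScalarSubquery.dataType effectively did `output.head` on an empty
// collection of attributes, which is exactly this failure mode.
val emptyOutput = Array.empty[String]

val threw = try {
  emptyOutput.head
  false
} catch {
  // Scala 2.12 throws NoSuchElementException ("next on empty iterator");
  // other versions may use a different message or exception type.
  case _: NoSuchElementException        => true
  case _: UnsupportedOperationException => true
}
println(s"head on empty output threw: $threw")
```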

@dilipbiswal
Contributor

dilipbiswal commented Mar 25, 2019

@HyukjinKwon From one of Sandeep's comments, I gathered that this is the test (in AnalysisErrorSuite) that's failing:

 errorTest(
    "scalar subquery with no column",
    testRelation.select(ScalarSubquery(LocalRelation()).as('a)),
    "Scalar subquery must return only one column, but got 0" :: Nil)

Here we are purposely setting up a ScalarSubquery expression on top of a relation with a zero-length schema. If we run this test normally, it passes, but if we turn on plan debugging, it fails.
@sandeep-katta Please let me know if I got it right :-)

@HyukjinKwon
Member

Thanks. Removing test and adding an assert looks better to me too.

@maropu
Member

maropu commented Mar 25, 2019

SGTM, too

@dilipbiswal
Contributor

Looks good to me.
cc @HyukjinKwon @maropu

@SparkQA

SparkQA commented Mar 25, 2019

Test build #103902 has finished for PR 24182 at commit 726d864.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class SubquerySuite extends SparkFunSuite

@HyukjinKwon HyukjinKwon changed the title [SPARK-27246][SQL]scalar subquery with no column test fails as it is throwing NoSuchElementException instead of AnalysisException [SPARK-27246][SQL] Add an assert on invalid Scalar subquery plan with no column Mar 25, 2019
@HyukjinKwon
Member

HyukjinKwon commented Mar 25, 2019

LGTM too, otherwise. I updated the PR description myself this time, but let's keep PR descriptions detailed and more readable next time.

@SparkQA

SparkQA commented Mar 25, 2019

Test build #103901 has finished for PR 24182 at commit 1223fd0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 25, 2019

Test build #103903 has finished for PR 24182 at commit 459914c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class SubquerySuite extends SparkFunSuite

@SparkQA

SparkQA commented Mar 25, 2019

Test build #103910 has finished for PR 24182 at commit 37ee7d6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.
