[SPARK-31289][TEST][test-hive1.2] Eliminate org.apache.spark.sql.hive.thriftserver.CliSuite flakiness #28055
retest this please
retest this please
Test build #120497 has finished for PR 28055 at commit
Test build #120496 has finished for PR 28055 at commit
retest this please
Test build #120500 has finished for PR 28055 at commit
retest this please
retest this please
retest this please
Test build #120509 has finished for PR 28055 at commit
Test build #120513 has finished for PR 28055 at commit
Test build #120512 has finished for PR 28055 at commit
Hi, @yaooqinn. Please use the following section to explain what your proposal is. The current content belongs in the next section.
retest this please
Test build #121777 has finished for PR 28055 at commit
retest this please
retest this please
Test build #121779 has finished for PR 28055 at commit
retest this please
Test build #121809 has finished for PR 28055 at commit
Do you know the reason?
According to the error stack trace of the failed test, the test failed to instantiate a Hive metastore client because of a Derby requirement: Derby requires that the metastore directory not exist yet, but it did exist, probably because the preceding test case failed to clean up its metastore directory.
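Since Derby rejects a database directory that already exists without `service.properties` (the `ERROR XBM0A` case), a per-test metastore has to point at a path that does not exist yet. A minimal sketch, assuming the hypothetical helpers `freshMetastorePath` and `derbyJdbcUrl` (not Spark's test utilities): reserve a unique temp path, delete the empty directory, and let Derby create it itself through the embedded URL's `create=true` attribute.

```scala
import java.io.File
import java.nio.file.Files

object DerbyMetastorePaths {
  // Hypothetical helper: reserve a unique path under the system temp dir,
  // then delete the (still empty) directory so Derby can create it from scratch.
  def freshMetastorePath(prefix: String = "metastore"): File = {
    val dir = Files.createTempDirectory(prefix).toFile
    // Derby fails with ERROR XBM0A if the directory exists without service.properties.
    require(dir.delete(), s"could not delete $dir")
    dir
  }

  // Embedded Derby URL; create=true asks Derby to create the database on first connect.
  def derbyJdbcUrl(metastore: File): String =
    s"jdbc:derby:;databaseName=${metastore.getAbsolutePath};create=true"
}
```

Note that deleting the reserved directory leaves a small window in which another process could claim the same path; for a single test JVM this is usually acceptable.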
In the latest master branch, I notice that we now wait up to 1 minute for the test process to shut down gracefully, which may reduce the flakiness of CliSuite but adds test time.
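Waiting a bounded time for the CLI process to exit on its own, and killing it only afterwards, can be sketched with the JDK's `Process.waitFor(long, TimeUnit)` and `destroyForcibly()`. The helper name `stopWithin` is hypothetical and this is not the actual CliSuite code; the one-minute default mirrors the timeout mentioned above.

```scala
import java.util.concurrent.TimeUnit

object ProcessShutdown {
  // Hypothetical helper: give the process `timeoutMs` to exit on its own.
  // Returns true if it exited gracefully, false if it had to be killed.
  def stopWithin(process: Process, timeoutMs: Long = 60000L): Boolean = {
    val exited = process.waitFor(timeoutMs, TimeUnit.MILLISECONDS)
    if (!exited) {
      // Force termination so the next test does not inherit a live process
      // that may still hold the metastore directory.
      process.destroyForcibly()
      process.waitFor()
    }
    exited
  }
}
```

The trade-off discussed in this thread follows directly: a well-behaved process returns almost immediately, while a hung one costs the full timeout before it is killed.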
I'm OK with this patch if it reduces test time, but we may need to update the PR description and run some experiments to show that it does reduce test time.
I will run some tests blindly to see whether it introduces a significant delay, since I am not sure how to simulate an ungraceful shutdown at the end of each test.
Master vs. this PR, two runs each; there seems to be a delay of about half a minute when no error happens:
[info] CliSuite:
[info] - load warehouse dir from hive-site.xml (14 seconds, 416 milliseconds)
[info] - load warehouse dir from --hiveconf (12 seconds, 197 milliseconds)
[info] - load warehouse dir from --conf spark(.hadoop).hive.* (23 seconds, 489 milliseconds)
[info] - load warehouse dir from spark.sql.warehouse.dir (12 seconds, 148 milliseconds)
[info] - Simple commands (21 seconds, 337 milliseconds)
[info] - Single command with -e (11 seconds, 137 milliseconds)
[info] - Single command with --database (22 seconds, 780 milliseconds)
[info] - Commands using SerDe provided in --jars (17 seconds, 242 milliseconds)
[info] - SPARK-29022: Commands using SerDe provided in --hive.aux.jars.path (18 seconds, 253 milliseconds)
[info] - SPARK-11188 Analysis error reporting (11 seconds, 387 milliseconds)
[info] - SPARK-11624 Spark SQL CLI should set sessionState only once (9 seconds, 120 milliseconds)
[info] - list jars (11 seconds, 997 milliseconds)
[info] - list jar <jarfile> (12 seconds, 54 milliseconds)
[info] - list files (12 seconds, 142 milliseconds)
[info] - list file <filepath> (11 seconds, 461 milliseconds)
[info] - apply hiveconf from cli command (11 seconds, 657 milliseconds)
[info] - Support hive.aux.jars.path (17 seconds, 229 milliseconds)
[info] - SPARK-28840 test --jars command (13 seconds, 939 milliseconds)
[info] - SPARK-28840 test --jars and hive.aux.jars.path command (14 seconds, 198 milliseconds)
[info] - SPARK-29022 Commands using SerDe provided in ADD JAR sql (18 seconds, 716 milliseconds)
[info] - SPARK-26321 Should not split semicolon within quoted string literals (13 seconds, 477 milliseconds)
[info] - Pad Decimal numbers with trailing zeros to the scale of the column (13 seconds, 814 milliseconds)
[info] - SPARK-30049 Should not complain for quotes in commented lines (14 seconds, 641 milliseconds)
[info] - SPARK-30049 Should not complain for quotes in commented with multi-lines (26 seconds, 232 milliseconds)
[info] ScalaTest
[info] Run completed in 6 minutes, 8 seconds.
[info] Total number of tests run: 24
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 24, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 24, Failed 0, Errors 0, Passed 24
[success] Total time: 620 s, completed 2020-4-27 13:48:54
[info] CliSuite:
[info] - load warehouse dir from hive-site.xml (11 seconds, 917 milliseconds)
[info] - load warehouse dir from --hiveconf (12 seconds, 254 milliseconds)
[info] - load warehouse dir from --conf spark(.hadoop).hive.* (23 seconds, 285 milliseconds)
[info] - load warehouse dir from spark.sql.warehouse.dir (12 seconds, 139 milliseconds)
[info] - Simple commands (16 seconds, 821 milliseconds)
[info] - Single command with -e (11 seconds, 601 milliseconds)
[info] - Single command with --database (23 seconds, 393 milliseconds)
[info] - Commands using SerDe provided in --jars (16 seconds, 825 milliseconds)
[info] - SPARK-29022: Commands using SerDe provided in --hive.aux.jars.path (16 seconds, 525 milliseconds)
[info] - SPARK-11188 Analysis error reporting (9 seconds, 934 milliseconds)
[info] - SPARK-11624 Spark SQL CLI should set sessionState only once (9 seconds, 79 milliseconds)
[info] - list jars (11 seconds, 576 milliseconds)
[info] - list jar <jarfile> (11 seconds, 989 milliseconds)
[info] - list files (11 seconds, 907 milliseconds)
[info] - list file <filepath> (12 seconds, 115 milliseconds)
[info] - apply hiveconf from cli command (11 seconds, 520 milliseconds)
[info] - Support hive.aux.jars.path (13 seconds, 159 milliseconds)
[info] - SPARK-28840 test --jars command (12 seconds, 820 milliseconds)
[info] - SPARK-28840 test --jars and hive.aux.jars.path command (13 seconds, 326 milliseconds)
[info] - SPARK-29022 Commands using SerDe provided in ADD JAR sql (15 seconds, 459 milliseconds)
[info] - SPARK-26321 Should not split semicolon within quoted string literals (12 seconds, 574 milliseconds)
[info] - Pad Decimal numbers with trailing zeros to the scale of the column (12 seconds, 813 milliseconds)
[info] - SPARK-30049 Should not complain for quotes in commented lines (12 seconds, 501 milliseconds)
[info] - SPARK-30049 Should not complain for quotes in commented with multi-lines (24 seconds, 560 milliseconds)
[info] ScalaTest
[info] Run completed in 5 minutes, 42 seconds.
[info] Total number of tests run: 24
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 24, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 24, Failed 0, Errors 0, Passed 24
[success] Total time: 370 s, completed 2020-4-27 13:56:31
[info] CliSuite:
[info] - load warehouse dir from hive-site.xml (16 seconds, 470 milliseconds)
[info] - load warehouse dir from --hiveconf (12 seconds, 245 milliseconds)
[info] - load warehouse dir from --conf spark(.hadoop).hive.* (23 seconds, 133 milliseconds)
[info] - load warehouse dir from spark.sql.warehouse.dir (11 seconds, 622 milliseconds)
[info] - Simple commands (15 seconds, 746 milliseconds)
[info] - Single command with -e (1 second, 18 milliseconds)
[info] - Single command with --database (16 seconds, 449 milliseconds)
[info] - Commands using SerDe provided in --jars (15 seconds, 69 milliseconds)
[info] - SPARK-29022: Commands using SerDe provided in --hive.aux.jars.path (14 seconds, 257 milliseconds)
[info] - SPARK-11188 Analysis error reporting (8 seconds, 882 milliseconds)
[info] - SPARK-11624 Spark SQL CLI should set sessionState only once (7 seconds, 894 milliseconds)
[info] - list jars (10 seconds, 76 milliseconds)
[info] - list jar <jarfile> (10 seconds, 64 milliseconds)
[info] - list files (8 seconds, 371 milliseconds)
[info] - list file <filepath> (8 seconds, 677 milliseconds)
[info] - apply hiveconf from cli command (10 seconds, 610 milliseconds)
[info] - Support hive.aux.jars.path (12 seconds, 422 milliseconds)
[info] - SPARK-28840 test --jars command (11 seconds, 970 milliseconds)
[info] - SPARK-28840 test --jars and hive.aux.jars.path command (12 seconds, 642 milliseconds)
[info] - SPARK-29022 Commands using SerDe provided in ADD JAR sql (15 seconds, 971 milliseconds)
[info] - SPARK-26321 Should not split semicolon within quoted string literals (13 seconds, 659 milliseconds)
[info] - Pad Decimal numbers with trailing zeros to the scale of the column (11 seconds, 810 milliseconds)
[info] - SPARK-30049 Should not complain for quotes in commented lines (11 seconds, 466 milliseconds)
[info] - SPARK-30049 Should not complain for quotes in commented with multi-lines (23 seconds, 413 milliseconds)
[info] ScalaTest
[info] Run completed in 5 minutes, 6 seconds.
[info] Total number of tests run: 24
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 24, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 24, Failed 0, Errors 0, Passed 24
[success] Total time: 563 s, completed 2020-4-27 14:07:39
[info] CliSuite:
[info] - load warehouse dir from hive-site.xml (11 seconds, 901 milliseconds)
[info] - load warehouse dir from --hiveconf (11 seconds, 803 milliseconds)
[info] - load warehouse dir from --conf spark(.hadoop).hive.* (28 seconds, 325 milliseconds)
[info] - load warehouse dir from spark.sql.warehouse.dir (12 seconds, 106 milliseconds)
[info] - Simple commands (12 seconds, 178 milliseconds)
[info] - Single command with -e (1 second, 19 milliseconds)
[info] - Single command with --database (16 seconds, 795 milliseconds)
[info] - Commands using SerDe provided in --jars (15 seconds, 255 milliseconds)
[info] - SPARK-29022: Commands using SerDe provided in --hive.aux.jars.path (14 seconds, 817 milliseconds)
[info] - SPARK-11188 Analysis error reporting (9 seconds, 60 milliseconds)
[info] - SPARK-11624 Spark SQL CLI should set sessionState only once (10 seconds, 225 milliseconds)
[info] - list jars (11 seconds, 240 milliseconds)
[info] - list jar <jarfile> (10 seconds, 250 milliseconds)
[info] - list files (8 seconds, 521 milliseconds)
[info] - list file <filepath> (8 seconds, 451 milliseconds)
[info] - apply hiveconf from cli command (10 seconds, 679 milliseconds)
[info] - Support hive.aux.jars.path (12 seconds, 246 milliseconds)
[info] - SPARK-28840 test --jars command (11 seconds, 850 milliseconds)
[info] - SPARK-28840 test --jars and hive.aux.jars.path command (12 seconds, 399 milliseconds)
[info] - SPARK-29022 Commands using SerDe provided in ADD JAR sql (14 seconds, 877 milliseconds)
[info] - SPARK-26321 Should not split semicolon within quoted string literals (11 seconds, 638 milliseconds)
[info] - Pad Decimal numbers with trailing zeros to the scale of the column (11 seconds, 433 milliseconds)
[info] - SPARK-30049 Should not complain for quotes in commented lines (11 seconds, 534 milliseconds)
[info] - SPARK-30049 Should not complain for quotes in commented with multi-lines (23 seconds, 30 milliseconds)
[info] ScalaTest
[info] Run completed in 5 minutes, 3 seconds.
[info] Total number of tests run: 24
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 24, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 24, Failed 0, Errors 0, Passed 24
[success] Total time: 331 s, completed 2020-4-27 14:13:28
So you mean this PR can speed up the
The results only indicate that a shared metastore environment can speed up the suite by about 30 seconds when no error prevents each test's process from shutting down on its own. The cost may be higher when an error occurs.
I made the metastore temporarily unwritable (though I don't know whether this is the exact error that happened on Jenkins), and the tests could be delayed for a long time.
The test below failed naturally on my local machine without any manual intervention, so the flakiness is still present in the master branch, just less often:
[info] - list jars *** FAILED *** (2 minutes, 59 seconds)
[info] spark-sql did not exit gracefully. (CliSuite.scala:196)
[info] org.scalatest.exceptions.TestFailedException:
[info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
[info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
[info] at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
[info] at org.scalatest.Assertions.fail(Assertions.scala:1091)
[info] at org.scalatest.Assertions.fail$(Assertions.scala:1087)
[info] at org.scalatest.FunSuite.fail(FunSuite.scala:1560)
[info] at org.apache.spark.sql.hive.thriftserver.CliSuite.runCliWithin(CliSuite.scala:196)
[info] at org.apache.spark.sql.hive.thriftserver.CliSuite.$anonfun$new$12(CliSuite.scala:363)
[info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info] at org.scalatest.Transformer.apply(Transformer.scala:22)
[info] at org.scalatest.Transformer.apply(Transformer.scala:20)
[info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
[info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:151)
[info] at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
[info] at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
[info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:286)
[info] at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
[info] at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
[info] at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:58)
[info] at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
[info] at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
[info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:58)
[info] at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
[info] at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:393)
[info] at scala.collection.immutable.List.foreach(List.scala:392)
[info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:381)
[info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:376)
[info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:458)
[info] at org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229)
[info] at org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228)
[info] at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
[info] at org.scalatest.Suite.run(Suite.scala:1124)
[info] at org.scalatest.Suite.run$(Suite.scala:1106)
[info] at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
[info] at org.scalatest.FunSuiteLike.$anonfun$run$1(FunSuiteLike.scala:233)
[info] at org.scalatest.SuperEngine.runImpl(Engine.scala:518)
[info] at org.scalatest.FunSuiteLike.run(FunSuiteLike.scala:233)
[info] at org.scalatest.FunSuiteLike.run$(FunSuiteLike.scala:232)
[info] at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:58)
[info] at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
[info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info] at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58)
[info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
[info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
[info] at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[info] at sbt.ForkMain$Run$2.call(ForkMain.java:286)
[info] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info] at java.lang.Thread.run(Thread.java:748)
retest this please
retest this please
Test build #122072 has finished for PR 28055 at commit
maybeWarehouse: Option[File] = Some(warehousePath),
useExternalHiveFile: Boolean = false)(
useExternalHiveFile: Boolean = false,
maybeMetastore: Option[File] = Some(metastorePath))(
how about metastore: File = metastorePath?
yea, right
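The suggested signature replaces `maybeMetastore: Option[File]` with a plain `metastore: File` whose default is the suite-wide path, so most tests share one metastore and a test that needs isolation simply passes its own directory. A minimal sketch of the argument building under that signature; `CliArgs`, `sharedMetastore`, and the argument layout are illustrative assumptions. The Spark snippet above uses `ConfVars` constants, while this sketch inlines the standard Hive property names `javax.jdo.option.ConnectionURL` and `hive.metastore.warehouse.dir`.

```scala
import java.io.File

object CliArgs {
  // Hypothetical suite-wide metastore location shared by all tests by default.
  val sharedMetastore: File =
    new File(System.getProperty("java.io.tmpdir"), "shared-metastore")

  // Builds --hiveconf arguments; a test needing isolation passes its own `metastore`.
  def build(
      maybeWarehouse: Option[File],
      metastore: File = sharedMetastore): Seq[String] = {
    val jdbcUrl = s"jdbc:derby:;databaseName=${metastore.getAbsolutePath};create=true"
    val warehouseConf = maybeWarehouse.toList.flatMap { dir =>
      Seq("--hiveconf", s"hive.metastore.warehouse.dir=${dir.getAbsolutePath}")
    }
    Seq("--hiveconf", s"javax.jdo.option.ConnectionURL=$jdbcUrl") ++ warehouseConf
  }
}
```

A plain `File` with a default avoids the `None` case entirely: every CLI run always gets an explicit metastore, and "separate" versus "shared" is just a different argument.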
Test build #122078 has finished for PR 28055 at commit
Test build #122079 has finished for PR 28055 at commit
retest this please
Test build #122689 has finished for PR 28055 at commit
}
val warehouseConf =
  maybeWarehouse.map(dir => s"--hiveconf ${ConfVars.METASTOREWAREHOUSE}=$dir").getOrElse("")
// whether to use a separated derby metastore
what does this comment mean?
Er, I forgot to delete this line.
"--conf", s"spark.hadoop.${ConfVars.METASTOREWAREHOUSE}=${sparkWareHouseDir}2"),
metastore = metastore)(
"desc database default;" -> sparkWareHouseDir.getAbsolutePath.concat("1"))
}
Are we missing a `finally` clause?
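When a test asks for its own metastore, a `finally` clause ensures the directory is removed even if the CLI run fails, so a leftover directory cannot trip Derby in a later test. A minimal sketch using plain JDK calls; `withSeparateMetastore` and `deleteRecursively` are hypothetical helpers, not the code that was merged.

```scala
import java.io.File
import java.nio.file.Files

object SeparateMetastore {
  // Recursively deletes a file or directory tree; missing paths are ignored.
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) Option(f.listFiles()).foreach(_.foreach(deleteRecursively))
    f.delete()
  }

  // Hypothetical fixture: hands the body a metastore path that does not exist yet
  // and always cleans it up afterwards, even if the body throws.
  def withSeparateMetastore[T](body: File => T): T = {
    val dir = Files.createTempDirectory("metastore").toFile
    dir.delete() // Derby must be the one to create the directory
    try body(dir)
    finally deleteRecursively(dir)
  }
}
```

Usage: `SeparateMetastore.withSeparateMetastore { ms => /* run the CLI with metastore = ms */ }`.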
Test build #122716 has finished for PR 28055 at commit
Thanks, merging to master/3.0!
….thriftserver.CliSuite flakiness

### What changes were proposed in this pull request?

CliSuite seems to be flaky while using metastoreDir per test.
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120470/testReport/org.apache.spark.sql.hive.thriftserver/CliSuite/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120470/testReport/junit/org.apache.spark.sql.hive.thriftserver/CliSuite/history/

According to the error stack trace of the failed test, the test failed to instantiate a Hive metastore client because of a Derby requirement.

```scala
Caused by: ERROR XBM0A: The database directory '/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-9249ce52-0a06-42b6-a3df-e6295e880df0' exists. However, it does not contain the expected 'service.properties' file. Perhaps Derby was brought down in the middle of creating this database. You may want to delete this directory and try creating the database again.
```

Derby requires that the metastore directory not exist yet, but it did exist, probably because the preceding test case failed to clean up its metastore directory.

In this PR, the metastore is shared across the tests of CliSuite, except for those that explicitly ask for a separate metastore environment.

### Why are the changes needed?

CliSuite seems to be flaky while using metastoreDir per test. This change eliminates that flakiness.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Modified tests.

Closes #28055 from yaooqinn/clisuite.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 1d66085)
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
CliSuite seems to be flaky while using metastoreDir per test.
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120470/testReport/org.apache.spark.sql.hive.thriftserver/CliSuite/
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120470/testReport/junit/org.apache.spark.sql.hive.thriftserver/CliSuite/history/
According to the error stack trace of the failed test, the test failed to instantiate a Hive metastore client because of a Derby requirement.
Derby requires that the metastore directory not exist yet, but it did exist, probably because the preceding test case failed to clean up its metastore directory.
In this PR, the metastore is shared across the tests of CliSuite, except for those that explicitly ask for a separate metastore environment.
Why are the changes needed?
CliSuite seems to be flaky while using metastoreDir per test.
To eliminate this test flakiness.
Does this PR introduce any user-facing change?
no
How was this patch tested?
modified test