
Commit c9bfd1c

dongjoon-hyun authored and cloud-fan committed
[SPARK-23489][SQL][TEST] HiveExternalCatalogVersionsSuite should verify the downloaded file
## What changes were proposed in this pull request? Although [SPARK-22654](https://issues.apache.org/jira/browse/SPARK-22654) made `HiveExternalCatalogVersionsSuite` download from Apache mirrors three times, it has been flaky because it didn't verify the downloaded file. Some Apache mirrors terminate the downloading abnormally, the *corrupted* file shows the following errors. ``` gzip: stdin: not in gzip format tar: Child returned status 1 tar: Error is not recoverable: exiting now 22:46:32.700 WARN org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite: ===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.hive.HiveExternalCatalogVersionsSuite, thread names: Keep-Alive-Timer ===== *** RUN ABORTED *** java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "/tmp/test-spark/spark-2.2.0"): error=2, No such file or directory ``` This has been reported weirdly in two ways. For example, the above case is reported as Case 2 `no failures`. - Case 1. [Test Result (1 failure / +1)](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/4389/) - Case 2. [Test Result (no failures)](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/4811/) This PR aims to make `HiveExternalCatalogVersionsSuite` more robust by verifying the downloaded `tgz` file by extracting and checking the existence of `bin/spark-submit`. If it turns out that the file is empty or corrupted, `HiveExternalCatalogVersionsSuite` will do retry logic like the download failure. ## How was this patch tested? Pass the Jenkins. Author: Dongjoon Hyun <[email protected]> Closes #21210 from dongjoon-hyun/SPARK-23489.
1 parent bf4352c commit c9bfd1c

1 file changed

Lines changed: 16 additions & 16 deletions

File tree

sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala

```diff
@@ -67,28 +67,28 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
       logInfo(s"Downloading Spark $version from $url")
       try {
         getFileFromUrl(url, path, filename)
-        return
+        val downloaded = new File(sparkTestingDir, filename).getCanonicalPath
+        val targetDir = new File(sparkTestingDir, s"spark-$version").getCanonicalPath
+
+        Seq("mkdir", targetDir).!
+        val exitCode = Seq("tar", "-xzf", downloaded, "-C", targetDir, "--strip-components=1").!
+        Seq("rm", downloaded).!
+
+        // For a corrupted file, `tar` returns non-zero values. However, we also need to check
+        // the extracted file because `tar` returns 0 for empty file.
+        val sparkSubmit = new File(sparkTestingDir, s"spark-$version/bin/spark-submit")
+        if (exitCode == 0 && sparkSubmit.exists()) {
+          return
+        } else {
+          Seq("rm", "-rf", targetDir).!
+        }
       } catch {
         case ex: Exception => logWarning(s"Failed to download Spark $version from $url", ex)
       }
     }
     fail(s"Unable to download Spark $version")
   }
 
-
-  private def downloadSpark(version: String): Unit = {
-    tryDownloadSpark(version, sparkTestingDir.getCanonicalPath)
-
-    val downloaded = new File(sparkTestingDir, s"spark-$version-bin-hadoop2.7.tgz").getCanonicalPath
-    val targetDir = new File(sparkTestingDir, s"spark-$version").getCanonicalPath
-
-    Seq("mkdir", targetDir).!
-
-    Seq("tar", "-xzf", downloaded, "-C", targetDir, "--strip-components=1").!
-
-    Seq("rm", downloaded).!
-  }
-
   private def genDataDir(name: String): String = {
     new File(tmpDataDir, name).getCanonicalPath
   }
@@ -161,7 +161,7 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
     PROCESS_TABLES.testingVersions.zipWithIndex.foreach { case (version, index) =>
       val sparkHome = new File(sparkTestingDir, s"spark-$version")
       if (!sparkHome.exists()) {
-        downloadSpark(version)
+        tryDownloadSpark(version, sparkTestingDir.getCanonicalPath)
       }
 
       val args = Seq(
```
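The download-verify-retry flow above can be sketched in shell. This is an illustration, not the suite's code: the suite does the equivalent via Scala's `sys.process`, and everything here is a local stand-in — `fake_download` replaces `getFileFromUrl`, the version directory name is hypothetical, and the first two "mirror" attempts hand back a corrupt file so the retry path is exercised end to end.

```shell
# Build a tiny fake "distribution" containing bin/spark-submit to tar up later.
workdir=$(mktemp -d)
mkdir -p "$workdir/dist/bin"
echo '#!/bin/sh' > "$workdir/dist/bin/spark-submit"

# Stand-in for getFileFromUrl: attempts 1 and 2 yield a corrupt file,
# attempt 3 yields a healthy tarball.
attempt=0
fake_download() {
  attempt=$((attempt + 1))
  if [ "$attempt" -lt 3 ]; then
    echo "not a gzip stream" > "$1"        # corrupted download
  else
    tar -czf "$1" -C "$workdir" dist       # healthy download
  fi
}

result=failed
for try in 1 2 3; do
  tgz="$workdir/spark.tgz"
  target="$workdir/spark-x.y.z"            # hypothetical spark-$version dir
  fake_download "$tgz"
  mkdir -p "$target"
  # The check the patch adds: tar must exit 0 AND bin/spark-submit must
  # exist afterwards, because tar exits 0 for an empty input file.
  if tar -xzf "$tgz" -C "$target" --strip-components=1 2>/dev/null \
      && [ -f "$target/bin/spark-submit" ]; then
    result="verified on attempt $try"
    break
  fi
  rm -rf "$target"                         # discard corrupted extraction, retry
done
rm -f "$workdir/spark.tgz"
echo "$result"
```

Keeping the extraction and the existence check together is the point of the patch: either signal alone (tar's exit status, or the presence of the directory) misses one of the two observed corruption modes.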
