[SPARK-32100][CORE][TESTS][FOLLOWUP] Reduce the required test resources #29001

Conversation
Test build #124962 has finished for PR 29001 at commit

Retest this please.

Test build #124965 has finished for PR 29001 at commit

Retest this please.

Test build #124976 has finished for PR 29001 at commit

Retest this please.

Retest this please.

Test build #124980 has finished for PR 29001 at commit

Retest this please.

Test build #124996 has finished for PR 29001 at commit

Thank you, @HyukjinKwon !

Retest this please.

Test build #124998 has finished for PR 29001 at commit
```diff
 private val conf = new org.apache.spark.SparkConf()
   .setAppName(getClass.getName)
-  .set(SPARK_MASTER, "local-cluster[20,1,512]")
+  .set(SPARK_MASTER, "local-cluster[10,1,512]")
```
Any particular reason we need 10 or 20 executors? It's still too many compared to other tests, whose average should be 2 or 3. cc @holdenk
So I think this test came from a situation where we were experiencing a deadlock and wanted to make sure we re-created the potential deadlock, which happened when we decommissioned most of the executors. That deadlock never made it into OSS Spark, but having the test here to catch it just in case is good. I think we could catch the same deadlock with 5 executors and decommissioning 4 of them, but @dongjoon-hyun is the one who found this potential issue so I'll let him clarify :)
Thank you for your explanation.
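For readers unfamiliar with the master URL in the diff above: `local-cluster[N,C,M]` asks Spark to spin up N workers with C cores and M MB of memory each, so halving N halves the resources the suite must acquire before the test body can run. A minimal plain-Scala sketch of that format (the `localClusterMaster` helper is illustrative, not a Spark API):

```scala
// Hypothetical helper illustrating the local-cluster master URL format:
// local-cluster[numWorkers,coresPerWorker,memoryPerWorkerMB]
def localClusterMaster(workers: Int, coresPerWorker: Int, memoryMb: Int): String =
  s"local-cluster[$workers,$coresPerWorker,$memoryMb]"

// The change under review: 20 workers down to 10, same cores and memory.
val before = localClusterMaster(20, 1, 512) // "local-cluster[20,1,512]"
val after  = localClusterMaster(10, 1, 512) // "local-cluster[10,1,512]"
```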
LGTM, thanks for working around the crowded Jenkins env :) I think we can explore whether we want to reduce it further in a follow-on, but I think removing causes of test flakiness sooner rather than later is better for everyone working on Spark.
### What changes were proposed in this pull request?
This PR aims to disable dependency tests (test-dependencies.sh) from Jenkins.

### Why are the changes needed?
- First of all, GitHub Action already provides the same test capability and is stabler.
- Second, `test-dependencies.sh` currently fails very frequently in the AmpLab Jenkins environment. For example, in the following irrelevant PR, it failed 5 times during 6 hours.
- #29001

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the Jenkins without `test-dependencies.sh` invocation.

Closes #29004 from dongjoon-hyun/SPARK-32178.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
+1, LGTM

This is a test-only PR and I verified this manually.

Merged to master.

Test build #125010 has finished for PR 29001 at commit
### What changes were proposed in this pull request?
This PR aims to reduce the required test resources in WorkerDecommissionExtendedSuite.

### Why are the changes needed?
When the Jenkins farm is crowded, the following failure currently happens [here](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2-hive-2.3/890/testReport/junit/org.apache.spark.scheduler/WorkerDecommissionExtendedSuite/Worker_decommission_and_executor_idle_timeout/):

```
java.util.concurrent.TimeoutException: Can't find 20 executors before 60000 milliseconds elapsed
	at org.apache.spark.TestUtils$.waitUntilExecutorsUp(TestUtils.scala:326)
	at org.apache.spark.scheduler.WorkerDecommissionExtendedSuite.$anonfun$new$2(WorkerDecommissionExtendedSuite.scala:45)
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the Jenkins.

Closes apache#29001 from dongjoon-hyun/SPARK-32100-2.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
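The `TimeoutException` in the stack trace above comes from a deadline-poll pattern: the suite repeatedly checks how many executors have registered and fails if the target count is not reached in time. The sketch below is a self-contained stand-in for that pattern in plain Scala, not Spark's actual `TestUtils.waitUntilExecutorsUp` implementation; the names and signature are illustrative:

```scala
import java.util.concurrent.TimeoutException

// Illustrative deadline-poll: retry `condition` until it holds or the
// deadline passes, then fail the same way the suite's wait does.
def waitUntil(timeoutMs: Long, pollMs: Long = 10)(condition: => Boolean): Unit = {
  val deadline = System.currentTimeMillis() + timeoutMs
  while (System.currentTimeMillis() < deadline) {
    if (condition) return
    Thread.sleep(pollMs)
  }
  throw new TimeoutException(
    s"Condition not met before $timeoutMs milliseconds elapsed")
}

// With a lower executor target (10 instead of 20), the condition is
// satisfied sooner, which is the point of this PR on a crowded farm.
var executorsUp = 0
waitUntil(timeoutMs = 1000) { executorsUp += 1; executorsUp >= 10 }
```

The suite's 60000 ms budget is unchanged by the PR; only the number of executors that must come up within it is reduced.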