Commit 547f157
committed
[SPARK-28699][CORE] Fix a corner case for aborting indeterminate stage
Change the logic of collecting the indeterminate stage, we should look at stages from mapStage, not failedStage during handle FetchFailed.
In the fetch failed error handle logic, the original logic of collecting indeterminate stage from the fetch failed stage. And in the scenario of the fetch failed happened in the first task of this stage, this logic will cause the indeterminate stage to resubmit partially. Eventually, we are capable of getting correctness bug.
It makes the corner case of indeterminate stage abort as expected.
New UT in DAGSchedulerSuite.
Run below integrated test with `local-cluster[5, 2, 5120]`, and set `spark.sql.execution.sortBeforeRepartition`=false, it will abort the indeterminate stage as expected:
```
import scala.sys.process._
import org.apache.spark.TaskContext
val res = spark.range(0, 10000 * 10000, 1).map{ x => (x % 1000, x)}
// kill an executor in the stage that performs repartition(239)
val df = res.repartition(113).map{ x => (x._1 + 1, x._2)}.repartition(239).map { x =>
if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 1 && TaskContext.get.stageAttemptNumber == 0) {
throw new Exception("pkill -f -n java".!!)
}
x
}
val r2 = df.distinct.count()
```
Closes apache#25498 from xuanyuanking/SPARK-28699-followup.
Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 0d3a783)
Signed-off-by: Yuanjian Li <xyliyuanjian@gmail.com>1 parent 75076ff commit 547f157
2 files changed
Lines changed: 7 additions & 24 deletions
File tree
- core/src
- main/scala/org/apache/spark/scheduler
- test/scala/org/apache/spark/scheduler
Lines changed: 3 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1505 | 1505 | | |
1506 | 1506 | | |
1507 | 1507 | | |
1508 | | - | |
| 1508 | + | |
1509 | 1509 | | |
1510 | 1510 | | |
1511 | | - | |
| 1511 | + | |
1512 | 1512 | | |
1513 | 1513 | | |
1514 | | - | |
| 1514 | + | |
1515 | 1515 | | |
1516 | 1516 | | |
1517 | 1517 | | |
| |||
Lines changed: 4 additions & 21 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2704 | 2704 | | |
2705 | 2705 | | |
2706 | 2706 | | |
2707 | | - | |
2708 | | - | |
2709 | | - | |
2710 | | - | |
2711 | | - | |
2712 | | - | |
2713 | | - | |
2714 | | - | |
2715 | | - | |
2716 | | - | |
2717 | | - | |
2718 | | - | |
2719 | | - | |
2720 | | - | |
2721 | | - | |
2722 | | - | |
2723 | | - | |
2724 | | - | |
2725 | | - | |
2726 | | - | |
2727 | | - | |
| 2707 | + | |
| 2708 | + | |
| 2709 | + | |
| 2710 | + | |
2728 | 2711 | | |
2729 | 2712 | | |
2730 | 2713 | | |
| |||
0 commit comments