[SPARK-23433][SPARK-25250] [CORE] Later created TaskSet should learn about the finished partitions #23871
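Summary of the change (as reflected in the diff below): when the DAGScheduler resubmits a stage, the new TaskSet can include partitions that earlier zombie attempts have already finished, because task completion and new-attempt submission happen on different threads ("task-result-getter" and "dag-scheduler-event-loop"). With this patch, a TaskSetManager created later consults the partitions recorded as finished in `TaskSchedulerImpl.stageIdToFinishedPartitions` at construction time and marks them completed, so it never launches duplicate tasks. The standalone sketch below compresses that hand-off into runnable form; the types are simplified stand-ins, not the actual Spark classes:

```scala
import scala.collection.mutable

// Simplified stand-ins for TaskSchedulerImpl / TaskSetManager that illustrate
// the hand-off this PR adds. Only the names mirror the diff; everything else
// is an illustrative assumption.
object FinishedPartitionsSketch {

  // TaskSchedulerImpl side: per stage, the partitions that any attempt
  // (zombie or not) has already finished.
  val stageIdToFinishedPartitions = mutable.HashMap.empty[Int, mutable.BitSet]

  class SimplifiedTaskSetManager(stageId: Int, numPartitions: Int) {
    val successful = new Array[Boolean](numPartitions)
    // The fix: on construction, catch up on completions recorded by earlier
    // attempts, so no duplicate task is launched for those partitions.
    stageIdToFinishedPartitions.get(stageId).foreach { finished =>
      finished.foreach(p => successful(p) = true)
    }
  }

  def main(args: Array[String]): Unit = {
    // A zombie attempt finished partition 3 before the new attempt was created
    // (the SPARK-25250 race between the two scheduler threads).
    stageIdToFinishedPartitions.getOrElseUpdate(0, mutable.BitSet.empty) += 3
    val laterAttempt = new SimplifiedTaskSetManager(stageId = 0, numPartitions = 10)
    assert(laterAttempt.successful(3)) // the later attempt already knows about it
    println("partition 3 marked finished in the later attempt")
  }
}
```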
File: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala

```diff
@@ -21,7 +21,7 @@ import java.io.NotSerializableException
 import java.nio.ByteBuffer
 import java.util.concurrent.ConcurrentLinkedQueue
 
-import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet}
+import scala.collection.mutable.{ArrayBuffer, BitSet, HashMap, HashSet}
 import scala.math.max
 import scala.util.control.NonFatal
```
```diff
@@ -189,6 +189,18 @@ private[spark] class TaskSetManager(
     addPendingTask(i)
   }
 
+  {
+    // A TaskSet submitted by the DAGScheduler may have some already completed
+    // tasks, since the DAGScheduler does not always know all the tasks that
+    // have been completed by other tasksets when completing a stage, so we
+    // mark those tasks as finished here to avoid launching duplicate tasks,
+    // while holding the TaskSchedulerImpl lock.
+    // See SPARK-25250 and `markPartitionCompletedInAllTaskSets()`.
+    sched.stageIdToFinishedPartitions.get(taskSet.stageId).foreach {
+      finishedPartitions => finishedPartitions.foreach(markPartitionCompleted(_, None))
+    }
+  }
+
   /**
    * Track the set of locality levels which are valid given the tasks locality preferences and
    * the set of currently available executors. This is updated as executors are added and removed.
```
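The block above relies on two things on the scheduler side that this hunk does not show: a `stageIdToFinishedPartitions` map on `TaskSchedulerImpl`, and the `markPartitionCompletedInAllTaskSets()` method the comment points to. Their exact code is not part of this excerpt, so the fragment below is only a sketch of a plausible shape inside the `TaskSchedulerImpl` class body; the field and method names come from the diff, `taskSetsByStageIdAndAttempt` is TaskSchedulerImpl's existing per-stage registry, and the bodies are reconstructed assumptions:

```scala
// Sketch only -- not the verbatim PR code.
private[scheduler] val stageIdToFinishedPartitions = new HashMap[Int, BitSet]

private[scheduler] def markPartitionCompletedInAllTaskSets(
    stageId: Int,
    partitionId: Int,
    taskInfo: TaskInfo): Unit = {
  // Remember the completion so that TaskSetManagers created later for this
  // stage can catch up in their constructor (see the block above).
  val finishedPartitions =
    stageIdToFinishedPartitions.getOrElseUpdate(stageId, new BitSet)
  finishedPartitions += partitionId
  // Propagate to every attempt that currently exists for the stage.
  taskSetsByStageIdAndAttempt.getOrElse(stageId, Map()).values.foreach { tsm =>
    tsm.markPartitionCompleted(partitionId, Some(taskInfo))
  }
}
```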
```diff
@@ -797,11 +809,12 @@ private[spark] class TaskSetManager(
     maybeFinishTaskSet()
   }
 
-  private[scheduler] def markPartitionCompleted(partitionId: Int, taskInfo: TaskInfo): Unit = {
+  private[scheduler] def markPartitionCompleted(partitionId: Int, taskInfo: Option[TaskInfo])
+    : Unit = {
     partitionToIndex.get(partitionId).foreach { index =>
       if (!successful(index)) {
         if (speculationEnabled && !isZombie) {
-          successfulTaskDurations.insert(taskInfo.duration)
+          taskInfo.foreach { info => successfulTaskDurations.insert(info.duration) }
         }
         tasksSuccessful += 1
         successful(index) = true
```

Review thread on the `taskInfo.foreach { ... }` line:

Contributor: This is existing logic, but I have a question here: do we really need to do it? The task is finished by another TSM; it seems unreasonable to update the statistics for launching speculative tasks in this TSM.

Member (author): I think you're right. What's your opinion? @squito

Contributor: Yeah, there was discussion about this in the past; there are arguments for doing it multiple ways. This was kind of a compromise with something that avoided a bug and was a reasonable change to put in. See more here: #21656

Member (author): I see.
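The switch from `TaskInfo` to `Option[TaskInfo]` separates the method's two callers. A short illustration of the two call sites implied by this diff (the `Some(...)` caller is inferred from the constructor hunk's comment, so treat its exact form as an assumption):

```scala
// Normal path, e.g. from TaskSchedulerImpl.markPartitionCompletedInAllTaskSets:
// the completing task's TaskInfo is at hand, so the speculation statistics
// (successfulTaskDurations) can still be updated.
tsm.markPartitionCompleted(partitionId, Some(taskInfo))

// Catch-up path, from the new TaskSetManager constructor block: the task ran
// in an earlier attempt and no TaskInfo is available here, so None is passed
// and the duration bookkeeping is simply skipped.
markPartitionCompleted(partitionId, None)
```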
File: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala

```diff
@@ -1102,7 +1102,7 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B
     }
   }
 
-  test("Completions in zombie tasksets update status of non-zombie taskset") {
+  test("SPARK-23433/25250 Completions in zombie tasksets update status of non-zombie taskset") {
     val taskScheduler = setupSchedulerWithMockTaskSetBlacklist()
     val valueSer = SparkEnv.get.serializer.newInstance()
```
|
@@ -1114,9 +1114,9 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B | |
| } | ||
|
|
||
| // Submit a task set, have it fail with a fetch failed, and then re-submit the task attempt, | ||
| // two times, so we have three active task sets for one stage. (For this to really happen, | ||
| // you'd need the previous stage to also get restarted, and then succeed, in between each | ||
| // attempt, but that happens outside what we're mocking here.) | ||
| // two times, so we have three TaskSetManagers(2 zombie, 1 active) for one stage. (For this | ||
| // to really happen, you'd need the previous stage to also get restarted, and then succeed, | ||
| // in between each attempt, but that happens outside what we're mocking here.) | ||
| val zombieAttempts = (0 until 2).map { stageAttempt => | ||
| val attempt = FakeTask.createTaskSet(10, stageAttemptId = stageAttempt) | ||
| taskScheduler.submitTasks(attempt) | ||
|
|
```diff
@@ -1133,30 +1133,51 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B
       assert(tsm.runningTasks === 9)
       tsm
     }
+    // we've now got 2 zombie attempts, each with 9 tasks still active but zero active attempt
+    // in taskScheduler.
 
+    // finish partitions 1 and 2 by completing the tasks before a new attempt for the same stage
+    // is submitted. This is possible since submitting a new attempt and handling a successful
+    // task happen on two different threads, "task-result-getter" and "dag-scheduler-event-loop"
+    // respectively.
+    (0 until 2).foreach { i =>
+      completeTaskSuccessfully(zombieAttempts(i), i + 1)
+      assert(taskScheduler.stageIdToFinishedPartitions(0).contains(i + 1))
+    }
 
-    // we've now got 2 zombie attempts, each with 9 tasks still active.  Submit the 3rd attempt for
-    // the stage, but this time with insufficient resources so not all tasks are active.
+    // Submit the 3rd attempt, still with 10 tasks; this happens due to the race between the
+    // "task-result-getter" and "dag-scheduler-event-loop" threads, where a TaskSet gets submitted
+    // with already completed tasks. And this time with insufficient resources, so not all tasks
+    // are active.
     val finalAttempt = FakeTask.createTaskSet(10, stageAttemptId = 2)
     taskScheduler.submitTasks(finalAttempt)
     val finalTsm = taskScheduler.taskSetManagerForAttempt(0, 2).get
+    // Though finalTsm gets submitted after some tasks have succeeded, it still learns about the
+    // finished partitions by looking into `stageIdToFinishedPartitions` when it is created,
+    // so that it won't launch any duplicate tasks later.
+    (0 until 2).map(_ + 1).foreach { partitionId =>
+      val index = finalTsm.partitionToIndex(partitionId)
+      assert(finalTsm.successful(index))
+    }
 
     val offers = (0 until 5).map{ idx => WorkerOffer(s"exec-$idx", s"host-$idx", 1) }
     val finalAttemptLaunchedPartitions = taskScheduler.resourceOffers(offers).flatten.map { task =>
       finalAttempt.tasks(task.index).partitionId
     }.toSet
     assert(finalTsm.runningTasks === 5)
     assert(!finalTsm.isZombie)
 
-    // We simulate late completions from our zombie tasksets, corresponding to all the pending
-    // partitions in our final attempt.  This means we're only waiting on the tasks we've already
-    // launched.
+    // We continually simulate late completions from our zombie tasksets (but this time, one
+    // active attempt exists in taskScheduler), corresponding to all the pending partitions in our
+    // final attempt. This means we're only waiting on the tasks we've already launched.
     val finalAttemptPendingPartitions = (0 until 10).toSet.diff(finalAttemptLaunchedPartitions)
     finalAttemptPendingPartitions.foreach { partition =>
       completeTaskSuccessfully(zombieAttempts(0), partition)
+      assert(taskScheduler.stageIdToFinishedPartitions(0).contains(partition))
     }
 
     // If there is another resource offer, we shouldn't run anything.  Though our final attempt
-    // used to have pending tasks, now those tasks have been completed by zombie attempts.  The
+    // used to have pending tasks, now those tasks have been completed by zombie attempts. The
     // remaining tasks to compute are already active in the non-zombie attempt.
     assert(
       taskScheduler.resourceOffers(IndexedSeq(WorkerOffer("exec-1", "host-1", 1))).flatten.isEmpty)
```
```diff
@@ -1179,6 +1200,7 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B
         zombieAttempts(partition % 2)
       }
       completeTaskSuccessfully(tsm, partition)
+      assert(taskScheduler.stageIdToFinishedPartitions(0).contains(partition))
     }
 
     assert(finalTsm.isZombie)
```
```diff
@@ -1204,6 +1226,7 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B
       // perspective, as the failures weren't from a problem w/ the tasks themselves.
       verify(blacklist).updateBlacklistForSuccessfulTaskSet(meq(0), meq(stageAttempt), any())
     }
+    assert(taskScheduler.stageIdToFinishedPartitions.isEmpty)
   }
 
   test("don't schedule for a barrier taskSet if available slots are less than pending tasks") {
```
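The final assertion above implies that `stageIdToFinishedPartitions` entries are removed once a stage has no live attempts left; otherwise the map would leak across stages. The cleanup hunk itself is not shown in this excerpt, so the following is only a guess at its shape, anchored where `TaskSchedulerImpl.taskSetFinished` already tears down per-stage state when the last TaskSetManager for a stage finishes:

```scala
// Sketch only -- the actual cleanup hunk is not shown in this diff.
// Plausibly inside TaskSchedulerImpl.taskSetFinished(manager: TaskSetManager):
taskSetsByStageIdAndAttempt.get(manager.taskSet.stageId).foreach { taskSetsForStage =>
  taskSetsForStage -= manager.taskSet.stageAttemptId
  if (taskSetsForStage.isEmpty) {
    taskSetsByStageIdAndAttempt -= manager.taskSet.stageId
    // Drop the finished-partition bookkeeping together with the last attempt,
    // so the map does not grow without bound across stages.
    stageIdToFinishedPartitions -= manager.taskSet.stageId
  }
}
```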
Review thread:

Q: Should we also add this code when calling `killTasks`?

A: I didn't find a method `killTasks` in TaskSchedulerImpl, and for similar functions, e.g. `cancelTasks`, I think it's unnecessary.