Skip to content

Commit 628e184

Browse files
zsxwingzzcclp
authored andcommitted
[SPARK-17316][CORE] Make CoarseGrainedSchedulerBackend.removeExecutor non-blocking
## What changes were proposed in this pull request? StandaloneSchedulerBackend.executorRemoved is a blocking call right now. It may cause some deadlock since it's called inside StandaloneAppClient.ClientEndpoint. This PR just changed CoarseGrainedSchedulerBackend.removeExecutor to be non-blocking. It's safe since the only two usages (StandaloneSchedulerBackend and YarnSchedulerEndpoint) don't need the return value). ## How was this patch tested? Jenkins unit tests. Author: Shixiong Zhu <[email protected]> Closes apache#14882 from zsxwing/SPARK-17316. (cherry picked from commit b84a92c)
1 parent 24ebbf1 commit 628e184

File tree

1 file changed

+9
-8
lines changed

1 file changed

+9
-8
lines changed

core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala

Lines changed: 9 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -372,14 +372,15 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
372372
conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))
373373
}
374374

375-
// Called by subclasses when notified of a lost worker
376-
def removeExecutor(executorId: String, reason: ExecutorLossReason) {
377-
try {
378-
driverEndpoint.askWithRetry[Boolean](RemoveExecutor(executorId, reason))
379-
} catch {
380-
case e: Exception =>
381-
throw new SparkException("Error notifying standalone scheduler's driver endpoint", e)
382-
}
375+
/**
376+
* Called by subclasses when notified of a lost worker. It just fires the message and returns
377+
* at once.
378+
*/
379+
protected def removeExecutor(executorId: String, reason: ExecutorLossReason): Unit = {
380+
// Only log the failure since we don't care about the result.
381+
driverEndpoint.ask(RemoveExecutor(executorId, reason)).onFailure { case t =>
382+
logError(t.getMessage, t)
383+
}(ThreadUtils.sameThread)
383384
}
384385

385386
def sufficientResourcesRegistered(): Boolean = true

0 commit comments

Comments
 (0)