[SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned #28370
Changes from 11 commits
@@ -54,6 +54,7 @@ import org.apache.spark.rpc.RpcEnv | |
| import org.apache.spark.scheduler.ExecutorCacheTaskLocation | ||
| import org.apache.spark.serializer.{SerializerInstance, SerializerManager} | ||
| import org.apache.spark.shuffle.{ShuffleManager, ShuffleWriteMetricsReporter} | ||
| import org.apache.spark.storage.BlockManagerMessages.ReplicateBlock | ||
| import org.apache.spark.storage.memory._ | ||
| import org.apache.spark.unsafe.Platform | ||
| import org.apache.spark.util._ | ||
|
|
@@ -241,6 +242,9 @@ private[spark] class BlockManager( | |
|
|
||
| private var blockReplicationPolicy: BlockReplicationPolicy = _ | ||
|
|
||
| private var blockManagerDecommissioning: Boolean = false | ||
| private var decommissionManager: Option[BlockManagerDecommissionManager] = None | ||
|
|
||
| // A DownloadFileManager used to track all the files of remote blocks which are above the | ||
| // specified memory threshold. Files will be deleted automatically based on weak reference. | ||
| // Exposed for test | ||
|
|
@@ -1551,30 +1555,36 @@ private[spark] class BlockManager( | |
| } | ||
|
|
||
| /** | ||
| * Called for pro-active replenishment of blocks lost due to executor failures | ||
| * Replicates a block to peer block managers based on existingReplicas and maxReplicas | ||
| * | ||
| * @param blockId blockId being replicated | ||
| * @param existingReplicas existing block managers that have a replica | ||
| * @param maxReplicas maximum replicas needed | ||
| * @param maxReplicationFailures number of replication failures to tolerate before | ||
| * giving up. | ||
| * @return whether block was successfully replicated or not | ||
| */ | ||
| def replicateBlock( | ||
|
@prakharjain09 / @holdenk, thank you for this improvement. I had a question, please (I am still new to these code paths and not totally sure of what I am talking about, so if there is something I am missing please help me fill the gaps): I understand it is a bit late to do the replication when the executor is indeed lost, since decommissioning as implemented in #26440 does not really trigger eager executor loss. We instead merely stop scheduling on the decommissioned executor and let it be shot down out of band, which means that the replication triggered in SPARK-15355 would be too late. I like the approach taken in this PR of eagerly telling the executor (block manager) to start replication when the decommission is first initiated, to give it more time to be useful. But I wonder whether you could have implemented this somewhat differently by leveraging the existing eager replication loop? Thanks!
Contributor
So the existing block replication is for the case where blocks were stored on two machines and, due to executor loss, are now down to one machine, so they get re-replicated. It's not useless, but it doesn't solve the same core problem.
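To make the contrast concrete, here is a condensed sketch of the proactive path this PR adds, using names from the diff below (simplified, not the exact implementation): the decommissioning executor pushes its cached RDD blocks to peers while it is still alive, instead of waiting for the reactive replication that only fires once the executor is reported lost.

```scala
// Condensed sketch of the proactive path (names from the diff below; simplified).
// Runs on the decommissioning executor while it is still up.
def decommissionRddCacheBlocksSketch(): Unit = {
  // Ask the master which RDD blocks this block manager holds and where replicas already exist.
  val replicateBlocksInfo = master.getReplicateInfoForRDDBlocks(blockManagerId)
  // Per-block failure budget specific to decommissioning.
  val maxReplicationFailures = conf.get(
    config.STORAGE_DECOMMISSION_MAX_REPLICATION_FAILURE_PER_BLOCK)
  replicateBlocksInfo.foreach { case ReplicateBlock(blockId, existingReplicas, maxReplicas) =>
    // Push the block to one more peer than currently holds it, then drop the local copy.
    val replicated = replicateBlock(blockId, existingReplicas.toSet, maxReplicas,
      maxReplicationFailures = Some(maxReplicationFailures))
    if (replicated) removeBlock(blockId)
  }
}
```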
||
| blockId: BlockId, | ||
| existingReplicas: Set[BlockManagerId], | ||
| maxReplicas: Int): Unit = { | ||
| maxReplicas: Int, | ||
| maxReplicationFailures: Option[Int] = None): Boolean = { | ||
| logInfo(s"Using $blockManagerId to pro-actively replicate $blockId") | ||
| blockInfoManager.lockForReading(blockId).foreach { info => | ||
| blockInfoManager.lockForReading(blockId).forall { info => | ||
|
Member
use …
Contributor
Using …
Member
I see.
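For context on the foreach → forall change above (my reading of the diff, not a statement from the authors): lockForReading returns an Option, and Option.forall is vacuously true when the option is empty, so the method now reports success when the block is already gone locally and otherwise returns the result of the replication attempt.

```scala
// Option.forall is true for None and applies the predicate otherwise.
val absent: Option[Int] = None
assert(absent.forall(_ > 0))      // true: nothing to check, vacuously satisfied
assert(Some(3).forall(_ > 0))     // true: predicate holds
assert(!Some(-1).forall(_ > 0))   // false: predicate fails
```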
||
| val data = doGetLocalBytes(blockId, info) | ||
| val storageLevel = StorageLevel( | ||
| useDisk = info.level.useDisk, | ||
| useMemory = info.level.useMemory, | ||
| useOffHeap = info.level.useOffHeap, | ||
| deserialized = info.level.deserialized, | ||
| replication = maxReplicas) | ||
| // we know we are called as a result of an executor removal, so we refresh peer cache | ||
| // this way, we won't try to replicate to a missing executor with a stale reference | ||
| // We know we are called as a result of an executor removal or because the current executor | ||
| // is getting decommissioned, so we refresh the peer cache before trying replication. This | ||
| // way we won't try to replicate to a missing executor or another decommissioning executor. | ||
|
||
| getPeers(forceFetch = true) | ||
| try { | ||
| replicate(blockId, data, storageLevel, info.classTag, existingReplicas) | ||
| replicate( | ||
| blockId, data, storageLevel, info.classTag, existingReplicas, maxReplicationFailures) | ||
| } finally { | ||
| logDebug(s"Releasing lock for $blockId") | ||
| releaseLockAndDispose(blockId, data) | ||
|
|
@@ -1591,9 +1601,11 @@ private[spark] class BlockManager( | |
| data: BlockData, | ||
| level: StorageLevel, | ||
| classTag: ClassTag[_], | ||
| existingReplicas: Set[BlockManagerId] = Set.empty): Unit = { | ||
| existingReplicas: Set[BlockManagerId] = Set.empty, | ||
| maxReplicationFailures: Option[Int] = None): Boolean = { | ||
|
|
||
| val maxReplicationFailures = conf.get(config.STORAGE_MAX_REPLICATION_FAILURE) | ||
| val maxReplicationFailureCount = maxReplicationFailures.getOrElse( | ||
| conf.get(config.STORAGE_MAX_REPLICATION_FAILURE)) | ||
| val tLevel = StorageLevel( | ||
| useDisk = level.useDisk, | ||
| useMemory = level.useMemory, | ||
|
|
@@ -1617,7 +1629,7 @@ private[spark] class BlockManager( | |
| blockId, | ||
| numPeersToReplicateTo) | ||
|
|
||
| while(numFailures <= maxReplicationFailures && | ||
| while(numFailures <= maxReplicationFailureCount && | ||
| !peersForReplication.isEmpty && | ||
| peersReplicatedTo.size < numPeersToReplicateTo) { | ||
| val peer = peersForReplication.head | ||
|
|
@@ -1665,9 +1677,11 @@ private[spark] class BlockManager( | |
| if (peersReplicatedTo.size < numPeersToReplicateTo) { | ||
| logWarning(s"Block $blockId replicated to only " + | ||
| s"${peersReplicatedTo.size} peer(s) instead of $numPeersToReplicateTo peers") | ||
| return false | ||
| } | ||
|
|
||
| logDebug(s"block $blockId replicated to ${peersReplicatedTo.mkString(", ")}") | ||
| return true | ||
| } | ||
|
|
||
| /** | ||
|
|
@@ -1761,6 +1775,58 @@ private[spark] class BlockManager( | |
| blocksToRemove.size | ||
| } | ||
|
|
||
| def decommissionBlockManager(): Unit = { | ||
| if (!blockManagerDecommissioning) { | ||
| logInfo("Starting block manager decommissioning process") | ||
| blockManagerDecommissioning = true | ||
| decommissionManager = Some(new BlockManagerDecommissionManager(conf)) | ||
| decommissionManager.foreach(_.start()) | ||
| } else { | ||
| logDebug("Block manager already in decommissioning state") | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Tries to offload all cached RDD blocks from this BlockManager to peer BlockManagers | ||
| * Visible for testing | ||
| */ | ||
| def decommissionRddCacheBlocks(): Unit = { | ||
|
||
| val replicateBlocksInfo = master.getReplicateInfoForRDDBlocks(blockManagerId) | ||
|
|
||
| if (replicateBlocksInfo.nonEmpty) { | ||
| logInfo(s"Need to replicate ${replicateBlocksInfo.size} blocks " + | ||
| "for block manager decommissioning") | ||
| } | ||
|
|
||
| // Maximum number of storage replication failure which replicateBlock can handle | ||
| val maxReplicationFailures = conf.get( | ||
| config.STORAGE_DECOMMISSION_MAX_REPLICATION_FAILURE_PER_BLOCK) | ||
|
|
||
| // TODO: We can sort these blocks based on some policy (LRU/blockSize etc) | ||
| // so that we end up prioritizing them over each other | ||
| val blocksFailedReplication = ThreadUtils.parmap( | ||
|
||
| replicateBlocksInfo, "decommissionRddCacheBlocks", 4) { | ||
| case ReplicateBlock(blockId, existingReplicas, maxReplicas) => | ||
| val replicatedSuccessfully = replicateBlock( | ||
| blockId, | ||
| existingReplicas.toSet, | ||
| maxReplicas, | ||
| maxReplicationFailures = Some(maxReplicationFailures)) | ||
| if (replicatedSuccessfully) { | ||
| logInfo(s"Block $blockId offloaded successfully, Removing block now") | ||
| removeBlock(blockId) | ||
| logInfo(s"Block $blockId removed") | ||
| } else { | ||
| logWarning(s"Failed to offload block $blockId") | ||
| } | ||
| (blockId, replicatedSuccessfully) | ||
| }.filterNot(_._2).map(_._1) | ||
| if (blocksFailedReplication.nonEmpty) { | ||
| logWarning("Blocks failed replication in cache decommissioning " + | ||
| s"process: ${blocksFailedReplication.mkString(",")}") | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Remove all blocks belonging to the given broadcast. | ||
| */ | ||
|
|
@@ -1829,7 +1895,52 @@ private[spark] class BlockManager( | |
| data.dispose() | ||
| } | ||
|
|
||
| /** | ||
| * Class to handle block manager decommissioning retries | ||
| * It creates a Thread to retry offloading all RDD cache blocks | ||
| */ | ||
| private class BlockManagerDecommissionManager(conf: SparkConf) { | ||
|
Member
Do we really need a wrapped manager class? It seems overkill to me.
Contributor
For the first part I'm ambivalent, but given that we also want to migrate shuffle blocks afterwards, I think having a manager is OK.
Member
We should implement it step by step and can always refactor later. Or we should at least add a TODO ticket to explain why we need this and what we plan to do next. Otherwise, I am really -1 on this kind of change.
Contributor
So there are two conversations I want to have about this with you @Ngone51 to make sure I'm understanding what you're trying to express. There already is a second follow-up PR that extends this work. I want to understand your -1 here, because that has some pretty strong meanings in the context of a code change. A -1 is generally viewed as expressing a veto, which I don't believe you have in the project (of course I was out for a month in the hospital last year, so if you do, please point me to the thread). Even if you don't have a veto in the project, is it your intention to say that if you did have a veto you would block this code change? A veto is generally a very strong expression, and I'm worried I'm not understanding your reasoning, since this seems like a relatively minor issue.
Contributor
Also, I understand text-only communication can have more misunderstandings. If you want to find a time this week when we're both free to jump on a call to clarify this (and we can write back our understanding here so it's recorded for people to understand what we talked about), I'd be more than happy to.
Member
For a new reviewer (e.g. me) on a big topic, it's not always possible to know every detail (even worse when there's no design doc). So it's the author's responsibility to give more context, for example by leaving TODO JIRA tickets in code comments or replying to give more information. But without sufficient context here, I really think "this change", wrapping a manager around a thread, doesn't make sense to me. As for "-1", it really represents my personal opinion. I should say "I don't like this change" if "-1" means a lot for the community.
Contributor
As a reviewer it's expected that you would read the issue before asking for a follow-up issue in a blocking manner.
Member
Of course I did. But I still don't get it, and I think it's not always possible that a reviewer could know the sub-issue is meant to be a follow-up for some specific code without a design document or code comments around here.
Contributor
So if you look at the parent issue you can see there is another sub-issue that says "migrate shuffle blocks". It's OK to ask for a follow-up even if there is one (we all miss things in reading), but attempting to vote -1 has a higher bar than just asking for something.
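For readers without the parent-issue context, the motivation mentioned above is that the manager is expected to own more than one migration concern once shuffle-block migration lands. A hypothetical sketch of that direction (the shuffle thread and its name are illustrative, not from this PR):

```scala
// Hypothetical sketch: a single manager owning several migration threads with one
// start/stop lifecycle. Only the cached-RDD thread exists in this PR; the shuffle
// thread below stands in for the follow-up work and is purely illustrative.
private class BlockManagerDecommissionManagerSketch(conf: SparkConf) {
  @volatile private var stopped = false
  private val reattemptInterval =
    conf.get(config.STORAGE_DECOMMISSION_REPLICATION_REATTEMPT_INTERVAL)

  private val cachedBlockMigrationThread = new Thread(() =>
    while (!stopped) { decommissionRddCacheBlocks(); Thread.sleep(reattemptInterval) })

  private val shuffleBlockMigrationThread = new Thread(() =>
    while (!stopped) { /* offload shuffle files once the follow-up lands */ Thread.sleep(reattemptInterval) })

  def start(): Unit = Seq(cachedBlockMigrationThread, shuffleBlockMigrationThread).foreach { t =>
    t.setDaemon(true)
    t.start()
  }

  def stop(): Unit = {
    stopped = true
    Seq(cachedBlockMigrationThread, shuffleBlockMigrationThread).foreach(_.interrupt())
  }
}
```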
||
| @volatile private var stopped = false | ||
| private val blockReplicationThread = new Thread { | ||
| override def run(): Unit = { | ||
| while (blockManagerDecommissioning && !stopped) { | ||
|
||
| try { | ||
| logDebug("Attempting to replicate all cached RDD blocks") | ||
|
Member
Could we add the attempt number to the log?
||
| decommissionRddCacheBlocks() | ||
|
Member
Don't you need to set …? Or do you mean we need to do it multiple times?
Contributor
We don't set …
||
| logInfo("Attempt to replicate all cached blocks done") | ||
|
Member
Attempt number?
Contributor
I'd say it's fine to do in a follow-up, but if we want to add the attempt number here, go for it (I won't hold off on merging for that).
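A minimal sketch of what the suggested attempt counter could look like in the retry loop (hypothetical; the diff as reviewed does not include it):

```scala
// Hypothetical sketch: add an attempt counter to the decommissioning retry loop
// so each pass is identifiable in the logs. Not part of the reviewed diff.
private val blockReplicationThread = new Thread {
  override def run(): Unit = {
    var attempts = 0
    while (blockManagerDecommissioning && !stopped) {
      try {
        attempts += 1
        logDebug(s"Attempting to replicate all cached RDD blocks (attempt $attempts)")
        decommissionRddCacheBlocks()
        logInfo(s"Attempt $attempts to replicate all cached blocks done")
        Thread.sleep(conf.get(config.STORAGE_DECOMMISSION_REPLICATION_REATTEMPT_INTERVAL))
      } catch {
        case _: InterruptedException => // no-op
        case NonFatal(e) =>
          logError(s"Error replicating cached RDD blocks for decommissioning (attempt $attempts)", e)
      }
    }
  }
}
```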
||
| val sleepInterval = conf.get( | ||
| config.STORAGE_DECOMMISSION_REPLICATION_REATTEMPT_INTERVAL) | ||
| Thread.sleep(sleepInterval) | ||
| } catch { | ||
| case _: InterruptedException => | ||
| // no-op | ||
| case NonFatal(e) => | ||
| logError("Error occurred while trying to " + | ||
| "replicate cached RDD blocks for block manager decommissioning", e) | ||
| } | ||
| } | ||
| } | ||
| } | ||
| blockReplicationThread.setDaemon(true) | ||
| blockReplicationThread.setName("block-replication-thread") | ||
|
Member
Use …
Contributor
Looking at our code we seem to be roughly split on …
Member
We always use …
Contributor
…
Member
Ah, that's a good point, but just wondering how many of them were chosen after realizing … BTW, you'd better grep for "new Thread(" to exclude ThreadLocal declarations.
Contributor
Even that returns 36 in core.
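For reference, the ThreadUtils-based alternative under discussion would look roughly like the sketch below (assuming ThreadUtils.newDaemonSingleThreadExecutor from org.apache.spark.util; the reviewed diff keeps the explicit Thread with setDaemon/setName instead):

```scala
import org.apache.spark.util.ThreadUtils

// Sketch: run the replication loop on a ThreadUtils-created single daemon thread
// instead of a hand-rolled Thread. Assumes ThreadUtils.newDaemonSingleThreadExecutor.
private val blockReplicationExecutor =
  ThreadUtils.newDaemonSingleThreadExecutor("block-replication-thread")

def start(): Unit = {
  logInfo("Starting block replication thread")
  blockReplicationExecutor.execute(() => {
    while (blockManagerDecommissioning && !stopped) {
      // same retry body as in the diff above
    }
  })
}

def stop(): Unit = {
  stopped = true
  logInfo("Stopping block replication thread")
  blockReplicationExecutor.shutdownNow()
}
```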
||
|
|
||
| def start(): Unit = { | ||
| logInfo("Starting block replication thread") | ||
| blockReplicationThread.start() | ||
| } | ||
|
|
||
| def stop(): Unit = { | ||
| if (!stopped) { | ||
| stopped = true | ||
| logInfo("Stopping block replication thread") | ||
| blockReplicationThread.interrupt() | ||
| blockReplicationThread.join() | ||
|
||
| } | ||
| } | ||
| } | ||
|
|
||
| def stop(): Unit = { | ||
| decommissionManager.foreach(_.stop()) | ||
| blockTransferService.close() | ||
| if (blockStoreClient ne blockTransferService) { | ||
| // Closing should be idempotent, but maybe not for the NioBlockTransferService. | ||
|
|
||
@@ -65,6 +65,9 @@ class BlockManagerMasterEndpoint( | |
| // Mapping from executor ID to block manager ID. | ||
| private val blockManagerIdByExecutor = new mutable.HashMap[String, BlockManagerId] | ||
|
|
||
| // Set of block managers which are decommissioning | ||
| private val decommissioningBlockManagerSet = new mutable.HashSet[BlockManagerId] | ||
|
|
||
| // Mapping from block id to the set of block managers that have the block. | ||
| private val blockLocations = new JHashMap[BlockId, mutable.HashSet[BlockManagerId]] | ||
|
|
||
|
|
@@ -153,6 +156,13 @@ class BlockManagerMasterEndpoint( | |
| removeExecutor(execId) | ||
| context.reply(true) | ||
|
|
||
| case DecommissionBlockManagers(executorIds) => | ||
| decommissionBlockManagers(executorIds.flatMap(blockManagerIdByExecutor.get)) | ||
| context.reply(true) | ||
|
|
||
| case GetReplicateInfoForRDDBlocks(blockManagerId) => | ||
| context.reply(getReplicateInfoForRDDBlocks(blockManagerId)) | ||
|
|
||
| case StopBlockManagerMaster => | ||
| context.reply(true) | ||
| stop() | ||
|
|
@@ -257,6 +267,7 @@ class BlockManagerMasterEndpoint( | |
|
|
||
| // Remove the block manager from blockManagerIdByExecutor. | ||
| blockManagerIdByExecutor -= blockManagerId.executorId | ||
| decommissioningBlockManagerSet.remove(blockManagerId) | ||
|
|
||
| // Remove it from blockManagerInfo and remove all the blocks. | ||
| blockManagerInfo.remove(blockManagerId) | ||
|
|
@@ -299,6 +310,39 @@ class BlockManagerMasterEndpoint( | |
| blockManagerIdByExecutor.get(execId).foreach(removeBlockManager) | ||
| } | ||
|
|
||
| /** | ||
| * Decommission the given Seq of block managers | ||
| * - Adds these block managers to the decommissioningBlockManagerSet | ||
| * - Sends the DecommissionBlockManager message to each of the [[BlockManagerSlaveEndpoint]] | ||
| */ | ||
| def decommissionBlockManagers(blockManagerIds: Seq[BlockManagerId]): Future[Seq[Unit]] = { | ||
| val newBlockManagersToDecommission = blockManagerIds.toSet.diff(decommissioningBlockManagerSet) | ||
| val futures = newBlockManagersToDecommission.map { blockManagerId => | ||
| decommissioningBlockManagerSet.add(blockManagerId) | ||
| val info = blockManagerInfo(blockManagerId) | ||
| info.slaveEndpoint.ask[Unit](DecommissionBlockManager) | ||
| } | ||
| Future.sequence{ futures.toSeq } | ||
| } | ||
|
|
||
| /** | ||
| * Returns a Seq of ReplicateBlock for each RDD block stored by given blockManagerId | ||
| * @param blockManagerId - block manager id for which ReplicateBlock info is needed | ||
| * @return Seq of ReplicateBlock | ||
| */ | ||
| private def getReplicateInfoForRDDBlocks(blockManagerId: BlockManagerId): Seq[ReplicateBlock] = { | ||
| val info = blockManagerInfo(blockManagerId) | ||
|
|
||
| val rddBlocks = info.blocks.keySet().asScala.filter(_.isRDD) | ||
|
||
| rddBlocks.map { blockId => | ||
| val currentBlockLocations = blockLocations.get(blockId) | ||
| val maxReplicas = currentBlockLocations.size + 1 | ||
|
Member
Could you please add some comments to explain why we need the "+1"?
Contributor
I think the fact that we're decommissioning here makes this self-evident.
Member
The method itself does not declare that it's used for decommissioning.
Contributor
Reasonable, then, to add a comment.
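A short annotated sketch of the computation in question (the comment wording is illustrative, not taken from the PR):

```scala
// Sketch (illustrative comment, not from the reviewed diff): the decommissioning
// block manager still counts as one of the current locations, so we ask for one
// extra replica so the same number of copies survives once this block manager goes away.
val currentBlockLocations = blockLocations.get(blockId)
val maxReplicas = currentBlockLocations.size + 1
val remainingLocations = currentBlockLocations.toSeq.filter(bm => bm != blockManagerId)
val replicateMsg = ReplicateBlock(blockId, remainingLocations, maxReplicas)
```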
||
| val remainingLocations = currentBlockLocations.toSeq.filter(bm => bm != blockManagerId) | ||
| val replicateMsg = ReplicateBlock(blockId, remainingLocations, maxReplicas) | ||
|
Can we make this an interface/trait that is implemented by some entity which holistically decides the location for replication?
Contributor
Currently the logic exists inside of blockReplicationPolicy, so that would be the place to explore that.
Contributor
It seems to be pluggable, so if you wanted to specify your own policy you could specify …
Comment on lines +339 to +341
Member
IIUC, there's no need to do replication if …
Contributor
So we currently remove the block on successful decommissioning. But it's possible the block is somehow already sufficiently replicated and we don't need to do anything, so I've added this to https://issues.apache.org/jira/browse/SPARK-31555 for tracking.
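A hypothetical sketch of the early exit the reviewers discuss here (the idea tracked in SPARK-31555, not implemented in this diff; the skip condition is one possible interpretation of "sufficiently replicated"):

```scala
// Hypothetical sketch (SPARK-31555 follow-up idea, not in the reviewed diff):
// skip blocks that already have a replica on another block manager, since the
// data would survive this executor going away without any extra work.
rddBlocks.flatMap { blockId =>
  val currentBlockLocations = blockLocations.get(blockId)
  val otherReplicas = currentBlockLocations.toSeq.filter(_ != blockManagerId)
  if (otherReplicas.nonEmpty) {
    None // another copy exists elsewhere; nothing to migrate for this block
  } else {
    Some(ReplicateBlock(blockId, otherReplicas, currentBlockLocations.size + 1))
  }
}.toSeq
```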
||
| replicateMsg | ||
| }.toSeq | ||
| } | ||
|
|
||
| // Remove a block from the slaves that have it. This can only be used to remove | ||
| // blocks that the master knows about. | ||
| private def removeBlockFromWorkers(blockId: BlockId): Unit = { | ||
|
|
@@ -536,7 +580,11 @@ class BlockManagerMasterEndpoint( | |
| private def getPeers(blockManagerId: BlockManagerId): Seq[BlockManagerId] = { | ||
| val blockManagerIds = blockManagerInfo.keySet | ||
| if (blockManagerIds.contains(blockManagerId)) { | ||
| blockManagerIds.filterNot { _.isDriver }.filterNot { _ == blockManagerId }.toSeq | ||
| blockManagerIds | ||
| .filterNot { _.isDriver } | ||
| .filterNot { _ == blockManagerId } | ||
| .diff(decommissioningBlockManagerSet) | ||
| .toSeq | ||
| } else { | ||
| Seq.empty | ||
| } | ||
|
|
||
I think we can just use spark.storage.maxReplicationFailures directly. Fewer configurations contribute to better UX.
So I'm not sure that's a great idea. Looking at maxReplicationFailures, the default is set to one, which certainly makes sense in the situation where we don't expect the host to be exiting. But this situation is different: we know the current block is going to disappear soon, so it makes sense to more aggressively try and copy the block.
I see, thanks for your explanation.
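To make the distinction concrete, the resulting precedence in the diff above is roughly the following (a condensed sketch using the constant names shown in the diff):

```scala
// Condensed from the diff above: the decommission path passes its own per-block
// failure budget, and replicate() only falls back to the general
// STORAGE_MAX_REPLICATION_FAILURE setting when no override is supplied.
val decommissionBudget = conf.get(config.STORAGE_DECOMMISSION_MAX_REPLICATION_FAILURE_PER_BLOCK)
replicateBlock(blockId, existingReplicas.toSet, maxReplicas,
  maxReplicationFailures = Some(decommissionBudget))

// Inside replicate(): the pre-existing proactive replication path keeps the old default.
val maxReplicationFailureCount =
  maxReplicationFailures.getOrElse(conf.get(config.STORAGE_MAX_REPLICATION_FAILURE))
```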