Restore incrementEpoch() call.

JoshRosen · commit 4550f616a4f9 · 2017-06-05T13:41:42.000-07:00
diff --git a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala
@@ -532,7 +532,7 @@ private[spark] class MapOutputTrackerMaster(
     None
   }
 
-  private def incrementEpoch() {
+  def incrementEpoch() {
     epochLock.synchronized {
       epoch += 1
       logDebug("Increasing epoch to " + epoch)
diff --git a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@@ -1224,6 +1224,15 @@ class DAGScheduler(
               logInfo("waiting: " + waitingStages)
               logInfo("failed: " + failedStages)
 
+              // This call to increment the epoch may not be strictly necessary, but it is retained
+              // for now in order to minimize the changes in behavior from an earlier version of the
+              // code. This existing behavior of always incrementing the epoch following any
+              // successful shuffle map stage completion may have benefits by causing unneeded
+              // cached map outputs to be cleaned up earlier on executors. In the future we can
+              // consider removing this call, but this will require some extra investigation.
+              // See https://github.com/apache/spark/pull/17955/files#r117385673 for more details.
+              mapOutputTracker.incrementEpoch()
+
               clearCacheLocs()
 
               if (!shuffleStage.isAvailable) {
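
The code comment above suggests that bumping the epoch lets executors drop cached map outputs they no longer need. A minimal sketch of that idea follows, assuming a worker-side cache that is cleared whenever it observes a newer epoch. `EpochTrackerSketch` and `WorkerCacheSketch` are hypothetical, simplified stand-ins for illustration only, not Spark's actual classes.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the master-side tracker: it only owns the epoch counter.
final class EpochTrackerSketch {
  private val epochLock = new Object
  private var epoch: Long = 0L

  // Mirrors the shape of the method made public in this commit.
  def incrementEpoch(): Unit = epochLock.synchronized {
    epoch += 1
  }

  def getEpoch: Long = epochLock.synchronized { epoch }
}

// Hypothetical stand-in for an executor-side cache of map output locations.
final class WorkerCacheSketch {
  private var seenEpoch: Long = 0L
  private val cachedOutputs = mutable.Map.empty[Int, String]

  def put(shuffleId: Int, location: String): Unit = synchronized {
    cachedOutputs(shuffleId) = location
  }

  def size: Int = synchronized { cachedOutputs.size }

  // Assumption: a newer epoch invalidates everything cached under an older one.
  def updateEpoch(newEpoch: Long): Unit = synchronized {
    if (newEpoch > seenEpoch) {
      seenEpoch = newEpoch
      cachedOutputs.clear()
    }
  }
}
```

Under this assumption, incrementing the epoch after every successful shuffle map stage means a worker discards its cached entries the next time it learns of the new epoch, even if none of those entries were actually stale.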
