
Conversation

@sryza
Contributor

@sryza sryza commented Dec 28, 2014

...tasks

@sryza sryza changed the title PARK-4921. TaskSetManager.dequeueTask returns PROCESS_LOCAL for NO_PREF ... SPARK-4921. TaskSetManager.dequeueTask returns PROCESS_LOCAL for NO_PREF ... Dec 28, 2014
@SparkQA

SparkQA commented Dec 28, 2014

Test build #24848 has finished for PR 3816 at commit 247ce55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sryza
Contributor Author

sryza commented Dec 29, 2014

This seems like a fairly simple fix, but given that I don't 100% understand the discussion on SPARK-2294 / #1313, it might be good for @CodingCat, @kayousterhout @mridulm, or @mateiz to take a look.

@mateiz
Contributor

mateiz commented Dec 29, 2014

Why was this a problem? You need to make sure that this won't change the locality level the scheduler launches tasks at due to delay scheduling. For example, if a stage contained both process-local and no-pref tasks, and it was still able to launch tasks locally (without the delay expiring), this change might make it forget that and not wait long enough, thus not getting local tasks. Please write down something explaining why this was a problem and why the fix won't break other things.

@sryza
Contributor Author

sryza commented Dec 29, 2014

@mateiz the JIRA claims that this results in extra unnecessary locality delay. I thought that the problem might have been an obvious typo, but it sounds like you're saying this may have been the intended behavior. I'll look deeper into it.

@mateiz
Contributor

mateiz commented Dec 29, 2014

Well, what I'm saying is to look at how it affects the rest of the scheduler. That was set to PROCESS_LOCAL there for a reason, it wasn't a typo. It was to make sure that launching a no-pref task doesn't then cause you to increase your allowed locality level and miss waiting for other local ones. I'd also like to see what performance difference this makes in the original case, and why it was a problem there (e.g. was this an InputFormat with no locality info at all or something).

One fix by the way may be to not count NO_PREF launches at all when deciding how to update delay scheduling variables, but even then it's good to understand what this was doing and make sure it won't break it.

@CodingCat
Contributor

Yes, @mateiz is right. I'd like to give a few more clues to help with your debugging:

  1. NO_PREF is not adjusted by the getAllowedLocalityLevel() method of TaskSetManager (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L431). I made it that way to ensure that NO_PREF tasks can be scheduled ASAP instead of waiting for NODE_LOCAL.
  2. Based on 1, when resourceOffers() in TaskSchedulerImpl (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L217) is called again before the wait for the more local level expires and after a NO_PREF task has been scheduled, we may see the locality level bumped straight up to RACK_LOCAL instead of going through PROCESS_LOCAL, NODE_LOCAL, etc. (a simplified sketch follows this list).
  3. In the JIRA discussion, Rui Li's understanding is that the if check is correct, because we need to return PROCESS_LOCAL while also not resetting currentLocalityIndex (so I didn't get the reason for the performance degradation you mentioned).
  4. The logic of returning PROCESS_LOCAL for NO_PREF tasks has actually existed for a long while, long before my patch on TaskSetManager.scala... I just followed that idea and my own understanding of it.
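
To make point 2 concrete, here is a minimal, self-contained sketch (hypothetical level list and wait value, not the real TaskSetManager code) of how the delay-scheduling index can jump past levels when nothing has refreshed the clock:

    // Simplified sketch: one wait applies per level; if no launch has
    // refreshed lastLaunchTime, several waits can expire at once and the
    // index jumps straight to a less local level.
    object DelaySchedulingSketch {
      val levels = Seq("PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY")
      val waitMs = 3000L

      var currentLevelIndex = 0
      var lastLaunchTime = 0L

      def allowedLevel(curTime: Long): String = {
        while (curTime - lastLaunchTime >= waitMs && currentLevelIndex < levels.length - 1) {
          lastLaunchTime += waitMs
          currentLevelIndex += 1
        }
        levels(currentLevelIndex)
      }

      def main(args: Array[String]): Unit = {
        println(allowedLevel(1000L))  // PROCESS_LOCAL: wait not expired yet
        println(allowedLevel(6500L))  // RACK_LOCAL: two waits expired with no local launch
      }
    }

This is also why the if check matters: a NO_PREF launch should neither reset the clock nor move the index.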

@sryza
Contributor Author

sryza commented Dec 30, 2014

@mateiz @CodingCat thanks for the additional clarification.

The NO_PREF tasks are coming from Hive's CombineHiveInputFormat, which doesn't include a locality preference when many files are packed into the same split.

Looking at the code a little more, it appears that the locality returned by dequeueTask after scheduling a NO_PREF task shouldn't matter at all. NO_PREF tasks will only end up being scheduled when resourceOffer is called with maxLocality=NO_PREF (right?). Other than for logging, this is the only place the returned taskLocality is used:

      // Update the delay-scheduling state only for launches at levels other than NO_PREF
      if (maxLocality != TaskLocality.NO_PREF) {
        currentLocalityIndex = getLocalityIndex(taskLocality)
        lastLaunchTime = curTime
      }

So when scheduling a NO_PREF task, taskLocality isn't used at all. IIUC, returning NO_PREF instead of PROCESS_LOCAL would just make the code a little clearer and make a log message more accurate, but it wouldn't fix the performance issue mentioned in the JIRA.

@CodingCat
Contributor

Hi, @sryza,

yes, NO_PREF tasks will be scheduled only when the maxLocality is NO_PREF.

Now, I think it's safe to apply this change, since we have that if check there...(sorry about that, I even forgot what I wrote...)

@mateiz
Contributor

mateiz commented Jan 5, 2015

@sryza any progress investigating this? FYI those NO_PREF tasks can be returned when the locality level is "any" and "rack_local" too. See TaskLocality.scala. If you don't show a concrete benefit from this, I'd suggest closing the issue and not changing it. The way I see it, this could break the scheduling order in existing jobs, with minimal benefit in performance (just avoiding one small delay for datasets that include both combined and non-combined files).
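
Roughly, as I read TaskLocality.scala (a toy sketch of the ordering, not a copy of the real file): NO_PREF sits between NODE_LOCAL and RACK_LOCAL, so a NO_PREF task satisfies an allowed-locality check made at RACK_LOCAL or ANY.

    // Toy Enumeration mirroring the ordering I have in mind; "allowed" means
    // the task's level is at least as local as the constraint.
    object LocalityOrderSketch extends Enumeration {
      val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value

      def isAllowed(constraint: Value, condition: Value): Boolean = condition <= constraint

      def main(args: Array[String]): Unit = {
        println(isAllowed(RACK_LOCAL, NO_PREF))    // true
        println(isAllowed(PROCESS_LOCAL, NO_PREF)) // false
      }
    }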

@sryza
Contributor Author

sryza commented Jan 5, 2015

My conclusion (sorry if it was unclear above) was that dequeueTask returning NO_PREF instead of PROCESS_LOCAL should have no effect at all. I still think it's worth changing for clarity, but it's obviously less important.

In what cases would dequeueTask return NO_PREF tasks when maxLocality=RACK_LOCAL or ANY? My understanding is that, in a single resourceOffers call, dequeueTask gets called multiple times in order of TaskLocality, so any NO_PREF tasks would be returned when it's called with maxLocality=NO_PREF. And none would remain when ANY and RACK_LOCAL come around.
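
To illustrate my understanding (a toy, self-contained sketch with made-up task queues, not the actual TaskSchedulerImpl/TaskSetManager code): walking the levels in order drains the NO_PREF bucket during the NO_PREF pass, so nothing no-pref is left when RACK_LOCAL or ANY comes around.

    // Each pass dequeues whatever is allowed at or below its maxLocality; the
    // NO_PREF pass happens before RACK_LOCAL and ANY, so it takes the no-pref tasks.
    object ResourceOffersSketch {
      sealed trait Locality
      case object PROCESS_LOCAL extends Locality
      case object NODE_LOCAL extends Locality
      case object NO_PREF extends Locality
      case object RACK_LOCAL extends Locality
      case object ANY extends Locality

      // Pending task counts bucketed by their best possible locality (made up).
      val pending = scala.collection.mutable.Map[Locality, Int](
        PROCESS_LOCAL -> 1, NO_PREF -> 2, ANY -> 1)

      // Rough stand-in for dequeueTask: take everything at levels up to maxLocality.
      def offer(maxLocality: Locality, order: Seq[Locality]): Seq[Locality] = {
        val allowed = order.takeWhile(_ != maxLocality) :+ maxLocality
        allowed.flatMap { level =>
          val n = pending.getOrElse(level, 0)
          pending(level) = 0
          Seq.fill(n)(level)
        }
      }

      def main(args: Array[String]): Unit = {
        val order = Seq(PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY)
        for (maxLocality <- order) {
          println(s"maxLocality=$maxLocality -> launched ${offer(maxLocality, order)}")
        }
      }
    }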

@sryza
Contributor Author

sryza commented Mar 10, 2015

I'm going to close this as "Won't Fix" since it has been open for a long time, touches sensitive code, and is mainly a code cleanup issue.

@zsxwing
Member

zsxwing commented Oct 17, 2015

@sryza, could you reopen this one?

TaskInfo is sent to SparkListener. Without this patch, for NO_PREF RDDs, the taskLocality in TaskInfo will be PROCESS_LOCAL, and the user can see it via SparkListener. Test code:

    import org.apache.spark.scheduler.{SparkListenerTaskStart, SparkListener}
    sc.addSparkListener(new SparkListener{
      override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
        println("taskLocality: " + taskStart.taskInfo.taskLocality)
      }
    })
    sc.parallelize(1 to 10, 1).count()
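
With this patch applied, the same listener should report NO_PREF for these tasks instead of PROCESS_LOCAL.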

@sryza
Contributor Author

sryza commented Nov 2, 2015

@zsxwing I have sadly lost most of the context on this issue and don't have time to pick it back up at the moment.

What you point out does seem to be an issue, though. Definitely feel free to take my patch if you think it's the right fix.

@sryza sryza reopened this Nov 2, 2015
@zsxwing
Member

zsxwing commented Nov 3, 2015

retest this please

@zsxwing
Member

zsxwing commented Nov 3, 2015

I think this is the right fix. However, more pairs of eyes on this change would be better.

@SparkQA

SparkQA commented Nov 4, 2015

Test build #44967 has finished for PR 3816 at commit 247ce55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Dec 31, 2015

I'm going to close this pull request. If this is still relevant and you are interested in pushing it forward, please open a new pull request. Thanks!

@asfgit asfgit closed this in 7b4452b Dec 31, 2015