-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-6050] [yarn] Relax matching of vcore count in received containers. #4818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from 1 commit
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like the config is named backwards? If I want to match any then I want to use allocatedContainer.getResource.
Perhaps matchExactResource or keep the name and switch what you get. I would expect to match any by default so its backwards compatible for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also as @mridulm mentioned we are basically just matching what we got back which disables all checks. Before we were checking that memory was atleast big enough.
private def isResourceConstraintSatisfied(container: Container): Boolean = {
container.getResource.getMemory >= (executorMemory + memoryOverhead)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, no, if you want to match any you want to use the
resourcefield, not the value returned by yarn (allocatedContainer.getResource). That means the comparison will effectively beresource == resourcewhich will always be true.I can change the config name if you think that would be clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: memory, can yarn really allocate a container with less resources than you asked for?
The
resourcein this class is pretty much static during the Spark app's lifetime.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I'm still not seeing it. resource is:
Resource.newInstance(executorMemory + memoryOverhead, executorCores)
which is going to be executorCores passed in, which would be for instance 8 if I request 8.
That resource is then passed to amClient.getMatchingRequests which is going to find requests that have 8 vcores, which isn't what we want because the RM without cpu scheduling returns ones with 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at the latest version to see if it's any clearer.
But the gist is that
amClient.getMatchingResources()is matching the resources you asked for (which isresource) against the parameter you're passing. What the option controls is whether you're passingresourcealso as the resource to match against - so basically, the exact same structure that is already in the outstanding list of requests.So I think you're reading the condition backwards. Or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry I see now. I was reading it backwards and thinking it was matching what was actually allocated.
So I guess the question is whether we want the default of this to be true so that its backwards compatible. Otherwise the behavior changes for anyone running now that upgrades.