@sqlwindspeaker

What changes were proposed in this pull request?

This change makes InterruptedIOException be treated the same as InterruptedException when closing YarnClientSchedulerBackend, so that no error containing "YARN application has exited unexpectedly xxx" is logged.

Why are the changes needed?

In YARN client mode, stopping YarnClientSchedulerBackend first tries to interrupt the YARN application monitor thread. MonitorThread.run() catches InterruptedException so that it can respond gracefully to the stop request.

However, client.monitorApplication can also throw InterruptedIOException if the interrupt arrives while a Hadoop RPC call is in progress. In that case MonitorThread does not recognize that it was interrupted: a failed YARN application report is returned, and "Failed to contact YARN for application xxxxx; YARN application has exited unexpectedly with state xxxxx" is logged at error level, which confuses users a lot.
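The idea can be illustrated with a minimal, self-contained Java sketch. It is not the actual Spark patch: `MonitorSketch`, `StatusCall`, and the returned strings are hypothetical names chosen for illustration, and the only assumption carried over from the description is that an interrupt may surface either as InterruptedException or as InterruptedIOException.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

final class MonitorSketch {
    /** Hypothetical stand-in for a blocking, RPC-backed status call. */
    interface StatusCall {
        String await() throws IOException, InterruptedException;
    }

    /**
     * Returns the state reported by the call, "stopped" if the call was
     * interrupted (in either form), or "failed" for any other I/O error.
     */
    static String monitor(StatusCall call) {
        try {
            return call.await();
        } catch (InterruptedException | InterruptedIOException e) {
            // Both exception types mean "this thread was asked to stop",
            // so neither is reported as an application failure.
            return "stopped";
        } catch (IOException e) {
            // Any other I/O problem is a genuine failure to reach the service.
            return "failed";
        }
    }
}
```

Without the InterruptedIOException alternative in the first catch clause, an interrupted call would fall through to the generic IOException handler and be reported as "failed", which corresponds to the misleading error described above.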

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

This is a very simple patch, so a dedicated test seems unnecessary?

@github-actions github-actions bot added the YARN label Dec 5, 2020
@sqlwindspeaker sqlwindspeaker force-pushed the yarn-client-interrupt-monitor branch from 4e66e18 to 4cdad1b Compare December 5, 2020 12:06
@HyukjinKwon
Member

cc @tgravescs and @mridulm FYI

@tgravescs
Contributor

ok to test

@SparkQA

SparkQA commented Dec 7, 2020

Test build #132372 has finished for PR 30617 at commit 4cdad1b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 7, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36973/

Contributor

@mridulm mridulm left a comment

Looks good to me, but will let @tgravescs take a look as well.

@SparkQA

SparkQA commented Dec 7, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36973/

@sqlwindspeaker
Author

> Kubernetes integration test status failure
> URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36973/

This failure seems unrelated to this patch.

Contributor

@tgravescs tgravescs left a comment

changes look fine.

I assume you manually tested?

@sqlwindspeaker sqlwindspeaker force-pushed the yarn-client-interrupt-monitor branch from 4cdad1b to 57ea973 Compare December 9, 2020 01:46
@SparkQA

SparkQA commented Dec 9, 2020

Test build #132453 has finished for PR 30617 at commit 57ea973.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 9, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37054/

@SparkQA

SparkQA commented Dec 9, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37054/

@sqlwindspeaker
Author

> changes look fine.
>
> I assume you manually tested?

Yes, but not fully.
This issue consists of two parts: (1) Hadoop IPC raises InterruptedIOException, and (2) InterruptedIOException should be handled correctly.

I tested part 2 by throwing an InterruptedIOException inside client.monitorApplication, but for part 1 I could not find an easy way to mock it.

Still, I believe the error trace is enough to show that InterruptedIOException is raised.
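For readers who want to see what part 1 looks like in isolation, here is a rough Java sketch that only mimics the described behavior. It does not exercise Hadoop IPC at all: `InterruptSimulation`, `blockingCall`, and `interruptAndObserve` are hypothetical names, and the assumption (taken from the comment above) is that a blocking call reports an interrupt as InterruptedIOException rather than InterruptedException.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

final class InterruptSimulation {
    /** Hypothetical blocking call that reports interruption as InterruptedIOException. */
    static void blockingCall(CountDownLatch started) throws InterruptedIOException {
        started.countDown(); // signal that the worker has entered the call
        try {
            Thread.sleep(60_000); // stands in for waiting on a remote response
        } catch (InterruptedException e) {
            InterruptedIOException wrapped =
                new InterruptedIOException("interrupted while waiting for response");
            wrapped.initCause(e);
            throw wrapped;
        }
    }

    /** Interrupts a worker inside blockingCall and returns the exception type it saw. */
    static String interruptAndObserve() {
        CountDownLatch started = new CountDownLatch(1);
        AtomicReference<String> observed = new AtomicReference<>("none");
        Thread worker = new Thread(() -> {
            try {
                blockingCall(started);
            } catch (InterruptedIOException e) {
                observed.set(e.getClass().getSimpleName());
            }
        });
        worker.start();
        try {
            started.await();    // wait until the worker is inside the call
            worker.interrupt(); // what stopping the backend does to the monitor thread
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException("test thread was interrupted", e);
        }
        return observed.get();
    }
}
```

A caller that only catches InterruptedException would never see the interrupt in this setup, which is the situation part 2 of the fix addresses.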

@asfgit asfgit closed this in 48f93af Dec 9, 2020
@mridulm
Contributor

mridulm commented Dec 9, 2020

Thanks @sqlwindspeaker, merging to master and 3.1

asfgit pushed a commit that referenced this pull request Dec 9, 2020
… when sc.stop in yarn client mode

Closes #30617 from sqlwindspeaker/yarn-client-interrupt-monitor.

Authored-by: suqilong <[email protected]>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
(cherry picked from commit 48f93af)
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
@sqlwindspeaker sqlwindspeaker deleted the yarn-client-interrupt-monitor branch December 9, 2020 07:53