-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-33669] Wrong error message from YARN application state monitor when sc.stop in yarn client mode #30617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-33669] Wrong error message from YARN application state monitor when sc.stop in yarn client mode #30617
Conversation
4e66e18 to
4cdad1b
Compare
|
cc @tgravescs and @mridulm FYI |
|
ok to test |
|
Test build #132372 has finished for PR 30617 at commit
|
|
Kubernetes integration test starting |
mridulm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, but will let @tgravescs take a look as well.
...gers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
Outdated
Show resolved
Hide resolved
|
Kubernetes integration test status failure |
seems not related with this patch |
tgravescs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes look fine.
I assume you manually tested?
…in yarn client mode
4cdad1b to
57ea973
Compare
|
Test build #132453 has finished for PR 30617 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
Yes,but not fully。 I test part 2 by raising an InterruptedIOException in client.monitorApplication, but for part 1 I don't find a easy way to mock. But I believe the error trace is enough to show it will raise InterruptedIOException. |
|
Thanks @sqlwindspeaker, merging to master and 3.1 |
… when sc.stop in yarn client mode ### What changes were proposed in this pull request? This change make InterruptedIOException to be treated as InterruptedException when closing YarnClientSchedulerBackend, which doesn't log error with "YARN application has exited unexpectedly xxx" ### Why are the changes needed? For YarnClient mode, when stopping YarnClientSchedulerBackend, it first tries to interrupt Yarn application monitor thread. In MonitorThread.run() it catches InterruptedException to gracefully response to stopping request. But client.monitorApplication method also throws InterruptedIOException when the hadoop rpc call is calling. In this case, MonitorThread will not know it is interrupted, a Yarn App failed is returned with "Failed to contact YARN for application xxxxx; YARN application has exited unexpectedly with state xxxxx" is logged with error level. which confuse user a lot. ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? very simple patch, seems no need? Closes #30617 from sqlwindspeaker/yarn-client-interrupt-monitor. Authored-by: suqilong <[email protected]> Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com> (cherry picked from commit 48f93af) Signed-off-by: Mridul Muralidharan <mridulatgmail.com>
What changes were proposed in this pull request?
This change make InterruptedIOException to be treated as InterruptedException when closing YarnClientSchedulerBackend, which doesn't log error with "YARN application has exited unexpectedly xxx"
Why are the changes needed?
For YarnClient mode, when stopping YarnClientSchedulerBackend, it first tries to interrupt Yarn application monitor thread. In MonitorThread.run() it catches InterruptedException to gracefully response to stopping request.
But client.monitorApplication method also throws InterruptedIOException when the hadoop rpc call is calling. In this case, MonitorThread will not know it is interrupted, a Yarn App failed is returned with "Failed to contact YARN for application xxxxx; YARN application has exited unexpectedly with state xxxxx" is logged with error level. which confuse user a lot.
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
very simple patch, seems no need?