HBASE-24825 Add UncaughtExceptionHandler for NettyRpcConnection Relogin thread #2206
|
🎊 +1 overall
This message was automatically generated. |
wchevreuil
left a comment
LGTM. Would other parts of the connection attempt catching only IOException also benefit from this change?
| try { | ||
| provider.relogin(); | ||
| } catch (IOException e) { | ||
| } catch (Throwable e) { |
Yeah, I think it makes sense. IIRC, netty swallows the throwable silently, so it's difficult to figure out what's wrong.
I think the other parts that catch only IOException are fine; an uncaught exception there will propagate upward.
|
Mind explaining the reason a bit? |
|
|
I think this is a Xiaomi-specific problem? It seems you compiled against a customized Hadoop library, but then someone introduced an official library at runtime? |
Yes. But we could still hit other incompatibility problems or other RuntimeExceptions, and when that happens it will be hard to find the root cause. |
|
Add an uncaught exception handler to the Thread created by the thread factory? So we could log an error for this 'unexpected' exception. |
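To make the suggestion above concrete, here is a minimal sketch of a thread factory that attaches an uncaught exception handler. The class name `ReloginHandlerSketch`, the factory, and the simulated `NoSuchMethodError` are hypothetical and are not taken from the HBase code base; the handler here prints to stderr where HBase would presumably use its logger.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; none of these names come from HBase.
class ReloginHandlerSketch {

  // Remembers the last throwable seen by the handler so callers can inspect it.
  static final AtomicReference<Throwable> LAST_UNCAUGHT = new AtomicReference<>();

  // Builds a factory whose threads report uncaught throwables instead of dying silently.
  static ThreadFactory loggingFactory(String name, CountDownLatch done) {
    return runnable -> {
      Thread t = new Thread(runnable, name);
      t.setDaemon(true);
      t.setUncaughtExceptionHandler((thread, err) -> {
        LAST_UNCAUGHT.set(err);
        System.err.println("Thread " + thread.getName() + " died: " + err);
        done.countDown();
      });
      return t;
    };
  }

  // Runs a task that fails with an Error (not an IOException) and returns what the handler saw.
  static Throwable runFailingTask() {
    CountDownLatch done = new CountDownLatch(1);
    Thread t = loggingFactory("Relogin-sketch", done).newThread(() -> {
      throw new NoSuchMethodError("simulated incompatibility");
    });
    t.start();
    try {
      done.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
    }
    return LAST_UNCAUGHT.get();
  }

  public static void main(String[] args) {
    System.out.println("handler saw: " + runFailingTask());
  }
}
```

One caveat with this approach: the handler only fires when the throwable escapes the thread's `run()`. A task handed to an executor through `submit()` or `schedule()` is wrapped in a `FutureTask`, which stores the throwable in the future instead of rethrowing it, so the handler would not be invoked on that path.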
|
@virajjasani while submitting this patch, I noticed that the default UncaughtExceptionHandler (Threads.LOGGING_EXCEPTION_HANDLER) may no longer be set in ThreadFactoryBuilder.dobuild after HBASE-24750. Is that an inconsistency? |
We are not planning to remove |
|
Do you think using default UncaughtExceptionHandler will provide better warn logs if something goes wrong with relogin()? Maybe try creating a custom UncaughtExceptionHandler with |
I think LOGGING_EXCEPTION_HANDLER is fine; it will log the thread name.
virajjasani
left a comment
+1
|
#2231 has resolved this issue, so I am closing this PR. |
We encountered a problem: client requests to the server kept failing with "Can not send request because relogin is in progress.", and there was no obvious exception in the logs. We eventually found that relogin() threw a throwable that the existing catch (IOException) did not handle, so the relogin thread exited silently. With this patch, the exception is logged as shown below:
2020-08-06 14:59:51.664 WARN org.apache.hadoop.hbase.ipc.NettyRpcConnection - relogin failed
java.lang.NoSuchMethodError: org.apache.hadoop.security.UserGroupInformation.isLoginKerberosKeyBased()Z
    at org.apache.hadoop.hbase.ipc.RpcConnection.relogin(RpcConnection.java:180) ~[zjyprc-hadoop-flink1.9-xmpush-log-to-trace-span-xmpush-flink-talos-task-1.3-SNAPSHOT.jar:?]
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection$1.run(NettyRpcConnection.java:162) ~[zjyprc-hadoop-flink1.9-xmpush-log-to-trace-span-xmpush-flink-talos-task-1.3-SNAPSHOT.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_202]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_202]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_202]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_202]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_202]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_202]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
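
The failure mode described above can be reproduced in isolation. The following is a rough sketch, not the HBase implementation: the `Relogin` interface, the `reloginInProgress` flag, and the `finally` block that resets it are assumptions made for illustration. It shows a scheduled task that catches `Throwable`, so an `Error` such as `NoSuchMethodError` is reported and the "in progress" state is cleared rather than left stuck.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; the names below are illustrative and not HBase APIs.
class ReloginCatchSketch {

  // Stand-in for the provider in the diff; an implementation may fail with any Throwable.
  interface Relogin {
    void relogin() throws IOException;
  }

  // Assumed flag modelling the "relogin is in progress" state.
  static final AtomicBoolean reloginInProgress = new AtomicBoolean(false);
  static final AtomicReference<Throwable> lastFailure = new AtomicReference<>();

  // Wraps the relogin call so that Errors are reported as well as IOExceptions.
  static Runnable reloginTask(Relogin provider) {
    return () -> {
      try {
        provider.relogin();
      } catch (Throwable e) { // catching only IOException would miss NoSuchMethodError
        lastFailure.set(e);
        System.err.println("relogin failed: " + e);
      } finally {
        reloginInProgress.set(false); // never leave the flag stuck
      }
    };
  }

  // Schedules one relogin attempt and returns whether the flag is still set afterwards.
  static boolean runOnce(Relogin provider) {
    ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
    reloginInProgress.set(true);
    try {
      pool.schedule(reloginTask(provider), 10, TimeUnit.MILLISECONDS)
          .get(5, TimeUnit.SECONDS);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
    return reloginInProgress.get();
  }

  public static void main(String[] args) {
    boolean stuck = runOnce(() -> {
      throw new NoSuchMethodError("simulated incompatibility");
    });
    System.out.println("still in progress: " + stuck + ", failure: " + lastFailure.get());
  }
}
```

The broad `catch (Throwable e)` matters here because the task runs inside a `FutureTask`, as the stack trace shows: anything the task does not catch is stored in a future that nobody inspects.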