Fix hostcfgd does not react to SIGINT issue.#10977
Fix hostcfgd does not react to SIGINT issue.#10977liuh-80 wants to merge 2 commits intosonic-net:masterfrom
Conversation
|
@stepanblyschak, could you please review this PR? |
|
|
||
| # Initialize swsscommon signal helper, because python signal handler will be blocked by long running c++ code. | ||
| # For more information please check: https://docs.python.org/3/library/signal.html | ||
| SignalHandlerHelper.registerSignalHandler(SIGNAL_INT) |
There was a problem hiding this comment.
Since this overrides signal.signal(signal.SIGINT, signal_handler), we should probably remove signal.SIGINT processing from hostcfgd.
There was a problem hiding this comment.
Also, on reboot, by default systemd sends SIGTERM but not SIGINT. So, this will not handle delay on reboot. In addition to this, we should probably extend hostcfgd.service with KillSignal=SIGINT or ExecStop= that pkill hostcfgd with SIGTERM and then with SIGINT.
There was a problem hiding this comment.
either
[Service]
Type=simple
ExecStart=/usr/local/bin/hostcfgd
KillSignal=SIGINT
or
[Service]
Type=simple
ExecStart=/usr/local/bin/hostcfgd
ExecStop=/bin/bash -c '/usr/bin/killall -15 hostcfgd && /usr/bin/killall -2 hostcfgd'
There was a problem hiding this comment.
Thanks, currently swsscommon only handle SIGINT, so will add SIGTERM support with this PR: sonic-net/sonic-swss-common#626
I will update this PR after swsscommon updated.
|
@liuh-80 I agree with @akokhan, we need to handle SIGTERM and remove signal handlers defined in python since they are overwritten by swss-common. Although, I have some concerns on usability of such approach. IMO we have lots of python daemons that use ConfigDBConnector.listen() and usually these daemons use custom signal handlers in order to perform cleanup. We then need to review this python code and rewrite their cleanup procedures. I prepared the following change - sonic-net/sonic-swss-common#624 which is based on your original PR but does not use signal helpers. The idea is that we can pass interrupt_on_signal=True to select() which will then return immidiatelly once errno == EINTR. The ConfigDBConnector.listen() method will pass interrupt_on_signal=True to PubSub.listen_messages() which will pass it to select(). This means when signal occurs we will exit the C++ .so code and enter python world where python can run his signal handlers and then (if signal handler didn't exit) repeat PubSub.listen_messages() in the loop, so that we don't exit the ConfigDBConnector.listen() on signals that we don't want to exit on. With such solution we don't need to modify hostcfgd and potentially many other daemons |
@stepanblyschak , according to your PR any signal will break ConfigDBConnector.listen() method. that maybe not a expected behavior. |
|
@liuh-80 The idea is that PubSub.listen_message() will exit on signal, however ConfigDBConnect.liste() will call PubSub.listen_message() in a while loop, so listen() does not exit. In my testing I was sending SIGHUP to hostcfgd, I see that hostcfgd's signal handler prints a message about received signal but the daemon still runs. When I send SIGTERM I see hostcfgd's message in the syslog about received signal and hostcfgd exits. |
I'm working on a change in sonic-swss-common to support register python signal handler to swsscommon, and will update this PR after that:
|
|
Close this PR because Stepan has a better solution without change any code here: https://github.com/Azure/sonic-swss-common/pull/624/files |
Why I did it
How I did it
How to verify it
Which release branch to backport (provide reason below if selected)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)