Enhancement on supervisord log severity on process exit status #5967
leeprecy wants to merge 1 commit into sonic-net:master
|
retest this please |
|
can you upstream this patch? unfortunately, we have just changed to pull the supervisord package from pip. |
|
Hi lguohan I am not clear what I should do with this patch. Thanks |
|
Yes, unfortunately we are no longer building a custom supervisor, as our patches have been upstreamed. You can submit a PR against the upstream supervisor repo (https://github.com/Supervisor/supervisor/), and hopefully they accept it. |
|
please upstream the patch to upstream and we can take it from there. |
|
@lguohan, @jleveque - as suggested, Precy raised a PR to the supervisor project - Supervisor/supervisor#1390. Initial review response is not positive - I don't think they are going to accept the change as is. So, we may need a re-think here. To re-iterate, the purpose of the change in our Broadcom fork was to make the logging level for process exits conditional upon the exit type - we wanted to elevate the log level for certain exit types. We think this is a good change, as end users don't want to wade through pages of uninteresting logs to find the ones they should care about. However it seems that the conditions we are looking for are too specific to our use case for the supervisor guys to accept. So, at this point, I think our choices are: -
I doubt #1 is going to succeed, and I don't think Broadcom will want to do #4 (we think the change is important). So that leaves #2 and #3 - which then boils down to the question of where the patch gets managed - in Community SONiC or in the Broadcom fork? Obviously we'd prefer #2 (SONiC Master), but ultimately the call is yours - if you think it's a useful change then you should take it (and go back to building supervisord from source in SONiC). Thoughts? |
|
I agree with the upstream comment that the patch is not intuitive and is hard to maintain. I doubt we can take the 2nd option and maintain this as a long-term patch in SONiC master. In my opinion, it is better to take the feedback from the community and design a proper patch. Once we have that, we can pull in the patch from master directly. One of the reasons for pulling in the pip package is that we would like to track master more closely and receive upstream changes more quickly. |
|
@ben-gale, @jleveque has experience working with the supervisor community to upstream a patch. It is not easy, but I think it is worth it. See Supervisor/supervisor#1047
added log severity based on the exit status of a process.
What I did
Supervisord logs every process exit status at the INFO log level.
This change logs a process exit status with a severity that depends on how the process exited.
Here is the guideline:
INFO: for graceful exits
WARN: for planned but non-graceful exits (SIGTERM or SIGKILL)
CRIT: for any unplanned and non-graceful exit of a process
How I did it
Modified finish() in process.py to check the exit status and select the matching log level.
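To illustrate the guideline, here is a minimal standalone sketch of the severity selection. It is not the code from this patch and does not use supervisor internals: the function name `exit_log_level`, its parameters, and the `PLANNED_STOP_SIGNALS` set are hypothetical, and the mapping is only one reading of the INFO/WARN/CRIT rules above.

```python
import signal
from typing import Optional

# Assumption: only these signals count as a "planned but non-graceful" stop.
PLANNED_STOP_SIGNALS = {signal.SIGTERM, signal.SIGKILL}


def exit_log_level(exit_code: Optional[int],
                   term_signal: Optional[int],
                   stop_requested: bool,
                   exit_expected: bool = False) -> str:
    """Return 'INFO', 'WARN' or 'CRIT' for a single process exit.

    exit_code      -- exit status if the process exited on its own, else None
    term_signal    -- signal number if the process was killed, else None
    stop_requested -- True when the exit follows a stop request ("stopped:")
    exit_expected  -- True when the exit code is one the program is
                      configured to treat as expected ("exited: ...; expected")
    """
    if stop_requested:
        if term_signal is None:
            # The process exited by itself after the stop request.
            return "INFO" if exit_code == 0 else "CRIT"
        # The process had to be signalled to stop.
        return "WARN" if term_signal in PLANNED_STOP_SIGNALS else "CRIT"

    # Nobody asked the process to stop.
    if term_signal is None and exit_expected:
        return "INFO"
    return "CRIT"


if __name__ == "__main__":
    print(exit_log_level(0, None, stop_requested=True))
    print(exit_log_level(None, signal.SIGTERM, stop_requested=True))
    print(exit_log_level(1, None, stop_requested=False, exit_expected=False))
```

Under these assumptions, a stop that ends with exit status 0 maps to INFO, a stop that needs SIGTERM or SIGKILL maps to WARN, and an unrequested exit with an unexpected status maps to CRIT.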
How to verify it
Here are some outputs
UT: # docker stop pmon
root@sonic:/home/admin# show logging supervisord | grep -E 'stopped|exited'
Nov 18 18:55:15.179829 sonic INFO pmon#supervisord 2020-11-18 18:54:38,765 INFO stopped: syseepromd (exit status 0)
Nov 18 18:55:15.179909 sonic INFO pmon#supervisord 2020-11-18 18:54:38,819 INFO stopped: psud (exit status 0)
Nov 18 18:55:15.179990 sonic INFO pmon#supervisord 2020-11-18 18:54:38,926 INFO stopped: xcvrd (exit status 0)
Nov 18 18:55:15.180071 sonic INFO pmon#supervisord 2020-11-18 18:54:40,933 INFO stopped: rsyslogd (exit status 0)
Nov 18 18:55:15.180154 sonic INFO pmon#supervisord 2020-11-18 18:54:40,936 WARN stopped: supervisor-proc-exit-listener (terminated by SIGTERM)
Nov 18 18:55:25.190196 sonic INFO pmon#supervisord 2020-11-18 18:55:20,882 INFO exited: start (exit status 0; expected)
Nov 18 18:55:25.190361 sonic INFO pmon#supervisord 2020-11-18 18:55:24,748 INFO exited: lm-sensors (exit status 0; expected)
Nov 18 18:55:35.200165 sonic INFO pmon#supervisord 2020-11-18 18:55:32,921 INFO exited: dependent-startup (exit status 0; expected)
UT: Process core dumped
Nov 18 22:06:03.628315 sonic INFO bgp#supervisord 2020-11-18 22:06:01,022 INFO exited: bgpmon (terminated by SIGQUIT (core dumped); not expected)
UT: config reload
Nov 18 19:19:04.443390 sonic INFO pmon#supervisord 2020-11-18 19:15:19,959 INFO stopped: syseepromd (exit status 0)
Nov 18 19:19:04.443390 sonic INFO pmon#supervisord 2020-11-18 19:15:20,046 INFO stopped: psud (exit status 0)
Nov 18 19:19:04.443390 sonic INFO pmon#supervisord 2020-11-18 19:15:20,140 INFO stopped: xcvrd (exit status 0)
Nov 18 19:19:04.443390 sonic INFO pmon#supervisord 2020-11-18 19:15:22,146 INFO stopped: rsyslogd (exit status 0)
Nov 18 19:19:04.443444 sonic INFO pmon#supervisord 2020-11-18 19:15:22,149 WARN stopped: supervisor-proc-exit-listener (terminated by SIGTERM)
Nov 18 19:19:14.452742 sonic INFO pmon#supervisord 2020-11-18 19:19:10,247 INFO exited: start (exit status 0; expected)
Nov 18 19:19:17.697709 sonic INFO swss#supervisord 2020-11-18 19:19:15,692 WARN stopped: vxlanmgrd (terminated by SIGTERM)
Nov 18 19:19:17.697709 sonic INFO swss#supervisord 2020-11-18 19:19:15,699 WARN stopped: nbrmgrd (terminated by SIGTERM)
Nov 18 19:19:17.697709 sonic INFO swss#supervisord 2020-11-18 19:19:15,707 WARN stopped: vrfmgrd (terminated by SIGTERM)
Nov 18 19:19:17.697767 sonic INFO swss#supervisord 2020-11-18 19:19:16,712 WARN stopped: buffermgrd (terminated by SIGTERM)
Nov 18 19:19:17.697767 sonic INFO swss#supervisord 2020-11-18 19:19:16,716 WARN stopped: portmgrd (terminated by SIGTERM)
Nov 18 19:19:17.697833 sonic INFO swss#supervisord 2020-11-18 19:19:16,721 WARN stopped: intfmgrd (terminated by SIGTERM)
Nov 18 19:19:17.697833 sonic INFO swss#supervisord 2020-11-18 19:19:16,726 WARN stopped: vlanmgrd (terminated by SIGTERM)
Nov 18 19:19:17.697833 sonic INFO swss#supervisord 2020-11-18 19:19:16,729 WARN stopped: neighsyncd (terminated by SIGTERM)
Nov 18 19:19:19.438289 sonic INFO bgp#supervisord 2020-11-17 22:25:26,836 CRIT exited: bgpcfgd (exit status 1; not expected)
Nov 18 19:19:19.438682 sonic INFO bgp#supervisord 2020-11-17 22:25:27,857 WARN stopped: fpmsyncd (terminated by SIGTERM)
Nov 18 19:19:19.438762 sonic INFO bgp#supervisord 2020-11-17 22:25:27,860 WARN stopped: bgpmon (terminated by SIGTERM)
Nov 18 19:19:19.438851 sonic INFO bgp#supervisord 2020-11-17 22:25:27,866 WARN stopped: bgpd (terminated by SIGKILL)
Nov 18 19:19:19.438982 sonic INFO bgp#supervisord 2020-11-17 22:25:27,894 INFO exited: dependent-startup (exit status 3; expected)
Nov 18 19:19:19.439077 sonic INFO bgp#supervisord 2020-11-17 22:25:27,898 INFO stopped: zebra (exit status 0)
Nov 18 19:19:19.439157 sonic INFO bgp#supervisord 2020-11-17 22:25:27,902 INFO stopped: rsyslogd (exit status 0)
Nov 18 19:19:19.439283 sonic INFO bgp#supervisord 2020-11-17 22:25:27,905 WARN stopped: supervisor-proc-exit-listener (terminated by SIGTERM)
Nov 18 19:19:24.460369 sonic INFO pmon#supervisord 2020-11-18 19:19:14,590 INFO exited: lm-sensors (exit status 0; expected)
Nov 18 19:19:24.461032 sonic INFO pmon#supervisord 2020-11-18 19:19:21,986 INFO exited: dependent-startup (exit status 0; expected)
Nov 18 19:19:25.470111 sonic INFO syncd#supervisord: syncd [5] child /usr/bin/syncd exited status: 0
Nov 18 19:19:36.931198 sonic INFO swss#supervisord 2020-11-18 19:19:18,740 WARN stopped: orchagent (terminated by SIGTERM)
Nov 18 19:19:36.931276 sonic INFO swss#supervisord 2020-11-18 19:19:19,745 WARN stopped: portsyncd (terminated by SIGTERM)
Nov 18 19:19:36.931322 sonic INFO swss#supervisord 2020-11-18 19:19:19,749 INFO stopped: rsyslogd (exit status 0)
Nov 18 19:19:36.931378 sonic INFO swss#supervisord 2020-11-18 19:19:19,752 WARN stopped: supervisor-proc-exit-listener (terminated by SIGTERM)
Nov 18 19:19:38.051206 sonic INFO teamd#supervisord 2020-11-18 19:17:39,799 INFO stopped: teamsyncd (exit status 0)
Nov 18 19:19:38.051206 sonic INFO teamd#supervisord 2020-11-18 19:17:41,809 INFO stopped: tlm_teamd (exit status 0)
Nov 18 19:19:38.051206 sonic INFO teamd#supervisord 2020-11-18 19:17:42,812 CRIT stopped: teammgrd (exit status 255)
Nov 18 19:19:38.051206 sonic INFO teamd#supervisord 2020-11-18 19:17:43,818 INFO stopped: rsyslogd (exit status 0)
Nov 18 19:19:38.051262 sonic INFO teamd#supervisord 2020-11-18 19:17:43,820 WARN stopped: supervisor-proc-exit-listener (terminated by SIGTERM)
Nov 18 19:19:38.051390 sonic INFO teamd#supervisord 2020-11-18 19:19:36,635 INFO exited: start (exit status 0; expected)