Conversation
|
This pull request introduces 15 alerts when merging 590ee50a9547bf9d9fd88b478aeefdca37e0f571 into b152f2a - view on LGTM.com new alerts:
|
|
Could you please add comments for each new function/class? |
|
Could you please fix the LGTM issues? |
|
Could please re-format files? I see many format issues, for example: should be |
|
Could you please remove unused comments? |
There was a problem hiding this comment.
could you add comment to explain the purpose here?
There was a problem hiding this comment.
Please use ConfigDBConnector in swsscommon.
There was a problem hiding this comment.
Why not use sonic_py_common.logger.Logger?
There was a problem hiding this comment.
Is there a special reason to use this class instead of directly using dict?
There was a problem hiding this comment.
When will the file being closed?
There was a problem hiding this comment.
Removed lock as it is not need here.
There was a problem hiding this comment.
what is the purpose of this sleep?
There was a problem hiding this comment.
maybe you need stop the event loop?
There was a problem hiding this comment.
when the event loop runs, as ProcessTaskBase in src/sonic-py-common/sonic_py_common/task_base.py has task_stop which actually kills the subprocess if not joined(exited) on time.
def task_stop(self):
# Signal the process to stop
self.task_stopping_event.set()
# Wait for the process to exit
self._task_process.join(self._stop_timeout_secs)
# If the process didn't exit, attempt to kill it
if self._task_process.is_alive():
os.kill(self._task_process.pid, signal.SIGKILL)
if self._task_process.is_alive():
return False
There was a problem hiding this comment.
what is the lock for?
There was a problem hiding this comment.
Removed lock as it is not needed here.
There was a problem hiding this comment.
Please use constant value instead of magic number
src/system-health/scripts/healthd
Outdated
There was a problem hiding this comment.
sysmon.system_service is a infinite loop, the rest of the code in this function will never run.
|
could you please add unit test? |
|
This pull request introduces 4 alerts when merging 9388306a2644f61c903aa5b918f7ba5564af4af0 into 5cd6bc4 - view on LGTM.com new alerts:
|
There was a problem hiding this comment.
Could you elaborate the reason to keep it as a list? Query/remove an element from set is faster than list.
There was a problem hiding this comment.
Not sure I understand.
There was a problem hiding this comment.
spl_srv_list not used.
There was a problem hiding this comment.
Maybe use output.splitlines()
There was a problem hiding this comment.
Also, used glob in place of listdir and modified dnsrvs_name from list to set as per suggestion.
There was a problem hiding this comment.
could you consider using hmset?
|
This pull request introduces 1 alert when merging ee0818c6f8ddc19ae926e7a128016717bd515f84 into 3fc3259 - view on LGTM.com new alerts:
|
e6f68dd to
001b7ed
Compare
|
This pull request introduces 1 alert when merging 001b7ed92156e6ff9099f8a254a0c711ddee122c into e0f5333 - view on LGTM.com new alerts:
|
95e5686 to
9782dbe
Compare
Taken care of all the format issues |
Added comments for all functions and classes |
Have added unit test coverage for the system ready code in test_system_health.py |
|
/azpw run Azure.sonic-buildimage |
1 similar comment
|
/azpw run Azure.sonic-buildimage |
|
/AzurePipelines run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/AzurePipelines run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azpw run Azure.sonic-buildimage |
|
/AzurePipelines run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azpw run Azure.sonic-buildimage |
|
/AzurePipelines run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/AzurePipelines run Azure.sonic-buildimage |
|
Commenter does not have sufficient privileges for PR 10479 in repo Azure/sonic-buildimage |
|
/AzurePipelines run Azure.sonic-buildimage |
|
Commenter does not have sufficient privileges for PR 10479 in repo Azure/sonic-buildimage |
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
| description "This configuration controls the system ready tool to check | ||
| the app ready/up status"; | ||
| type boolean; | ||
| default false; |
There was a problem hiding this comment.
Could you add yang model unit test?
We recently fixed a bug #10986
…anch Related work items: #52, #71, #73, #75, #77, sonic-net#1306, sonic-net#1588, sonic-net#1991, sonic-net#2031, sonic-net#2040, sonic-net#2053, sonic-net#2066, sonic-net#2069, sonic-net#2087, sonic-net#2107, sonic-net#2110, sonic-net#2112, sonic-net#2113, sonic-net#2117, sonic-net#2124, sonic-net#2125, sonic-net#2126, sonic-net#2128, sonic-net#2130, sonic-net#2131, sonic-net#2132, sonic-net#2133, sonic-net#2134, sonic-net#2135, sonic-net#2136, sonic-net#2137, sonic-net#2138, sonic-net#2139, sonic-net#2140, sonic-net#2143, sonic-net#2158, sonic-net#2161, sonic-net#2233, sonic-net#2243, sonic-net#2250, sonic-net#2254, sonic-net#2260, sonic-net#2261, sonic-net#2267, sonic-net#2278, sonic-net#2282, sonic-net#2285, sonic-net#2288, sonic-net#2289, sonic-net#2292, sonic-net#2294, sonic-net#8887, sonic-net#9279, sonic-net#9390, sonic-net#9511, sonic-net#9700, sonic-net#10025, sonic-net#10322, sonic-net#10479, sonic-net#10484, sonic-net#10493, sonic-net#10500, sonic-net#10580, sonic-net#10595, sonic-net#10628, sonic-net#10634, sonic-net#10635, sonic-net#10644, sonic-net#10670, sonic-net#10691, sonic-net#10716, sonic-net#10731, sonic-net#10750, sonic-net#10751, sonic-net#10752, sonic-net#10761, sonic-net#10769, sonic-net#10775, sonic-net#10776, sonic-net#10779, sonic-net#10786, sonic-net#10792, sonic-net#10793, sonic-net#10800, sonic-net#10806, sonic-net#10826, sonic-net#10839, sonic-net#10840, sonic-net#10842, sonic-net#10844, sonic-net#10847, sonic-net#10849, sonic-net#10852, sonic-net#10865, sonic-net#10872, sonic-net#10877, sonic-net#10886, sonic-net#10889, sonic-net#10903, sonic-net#10904, sonic-net#10905, sonic-net#10913, sonic-net#10914, sonic-net#10916, sonic-net#10919, sonic-net#10925, sonic-net#10926, sonic-net#10929, sonic-net#10933, sonic-net#10934, sonic-net#10937, sonic-net#10941, sonic-net#10947, sonic-net#10952, sonic-net#10953, sonic-net#10957, sonic-net#10959, sonic-net#10971, sonic-net#10972, sonic-net#10980
|
Hi @sg893052 , the PR introduces some sub-processes to system-health daemon. We observed that each of the daemon uses many memory than it needs: Do you remember why did you decide to use multi-processing instead of multi-threading when you implement this feature? |
|
@sg893052 could you check above issue? |
|
I think, I chose multiprocessing over multithreading for this implementation because, during initial testing with threads, I observed that one of the monitoring threads (particularly the D-Bus listener) could occasionally become unresponsive — either due to blocking I/O, event loop stalls, or external system pressure. When this happened, it not only failed to push events into the queue, but in some cases also interfered with timely queue consumption, leading to missed or delayed system state updates. |
Why I did it
At present, there is no mechanism in an event driven model to know that the system is up with all the essential sonic services and also, all the docker apps are ready along with port ready status to start the network traffic. With the asynchronous architecture of SONiC, we will not be able to verify if the config has been applied all the way down to the HW. But we can get the closest up status of each app and arrive at the system readiness.
How I did it
A new python based system monitor tool is introduced under system-health framework to monitor all the essential system host services including docker wrapper services on an event based model and declare the system is ready. This framework gives provision for docker apps to notify its closest up status. CLIs are provided to fetch the current system status and also service running status and its app ready status along with failure reason if any.
How to verify it
"show system-health sysready-status" click CLI
Syslogs for system ready
Which release branch to backport (provide reason below if selected)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)