[dhcp_relay] add retry to checking process dhcprelayd#25470
yxieca merged 1 commit into sonic-net:master
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Xichen Lin <lukelin0907@gmail.com>
89abea4 to e8961b2
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Pull request overview
This PR updates dhcprelayd’s process-checking logic to reduce failures caused by transient psutil exceptions when reading process metadata, aiming to avoid unnecessary dhcp_relay container restarts.
Changes:
- Add a retry loop around psutil process inspection (proc.name()/ppid()/cmdline()) in _check_dhcp_relay_processes.
- Re-raise the last encountered exception after retrying up to 5 times.
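The two changes summarized above can be sketched roughly as follows. This is a minimal illustration, not the code merged in dhcprelayd.py: `call_with_retry` and `FlakyRead` are hypothetical names, and the flaky callable stands in for a psutil read such as proc.name() so the sketch runs without psutil installed.

```python
def call_with_retry(read_fn, max_attempts=5):
    """Call read_fn(), retrying on any exception.

    If every attempt fails, the last exception seen is re-raised.
    (Hypothetical helper; the real change lives in _check_dhcp_relay_processes.)
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_exc = None
    for _ in range(max_attempts):
        try:
            return read_fn()
        except Exception as exc:  # broad on purpose, matching the summary above
            last_exc = exc
    raise last_exc


class FlakyRead:
    """Stand-in for a process-metadata read that fails a few times first."""

    def __init__(self, failures, value):
        self.failures = failures  # number of calls that raise before success
        self.value = value        # value returned once the read succeeds
        self.calls = 0            # total calls made so far

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise RuntimeError("transient read failure")
        return self.value
```

In the daemon, the callable would presumably wrap the psutil calls for one process, for example `lambda: (proc.name(), proc.ppid(), proc.cmdline())`.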
|
@yxieca Hi Ying, please help review and merge this pr |
|
AI agent on behalf of Ying: Reviewed the change in dhcprelayd.py. The retry loop looks reasonable, but consider adding a small backoff sleep between retries to avoid tight looping when psutil hits transient errors. Also consider narrowing the retried exception types (for example AccessDenied vs unexpected exceptions) so we do not mask real bugs. Otherwise looks fine. |
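The two review suggestions (a backoff sleep between retries, and retrying only selected exception types) could look something like the sketch below. It is an assumption about one possible shape, not what was merged: `call_with_backoff`, `base_delay`, and the injectable `sleep` parameter are hypothetical. With psutil, `retry_on` might be a tuple such as `(psutil.AccessDenied,)`, following the reviewer's example.

```python
import time


def call_with_backoff(read_fn, retry_on, max_attempts=5, base_delay=0.1,
                      sleep=time.sleep):
    """Call read_fn(), retrying only on the exception types in retry_on.

    Sleeps base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    Any exception not listed in retry_on propagates immediately, so
    unexpected bugs are not masked by the retry loop.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return read_fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter is only a testing convenience; a caller that does not need it can rely on the `time.sleep` default.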
|
What is the motivation for this PR?
psutil occasionally throws errors when reading process names; retrying fixes the intermittent failure.
How did you do it?
Added a retry around the dhcprelayd process name check.
How did you verify/test it?
Applied on a production machine and confirmed the issue was fixed.
Signed-off-by: Xichen Lin <lukelin0907@gmail.com>
Signed-off-by: arlakshm <arlakshm@microsoft.com>
|
Cherry-pick PR to 202511: #26768 |
Why I did it
Occasionally, the psutil library throws an error when trying to read a process name. Retrying fixes the problem.
How I did it
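Retry the psutil process inspection in _check_dhcp_relay_processes up to 5 times, re-raising the last exception if every attempt fails.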
How to verify it
Applied the change on a production machine and confirmed that the issue was fixed.