[generic_config_updater] Add diagnostic logging for vlanintf_validato…#4105
Merged
qiluo-msft merged 4 commits intoNov 17, 2025
Merged
Conversation
…r failures Add enhanced error logging to diagnose vlanintf_validator failures when flushing neighbor entries for deleted VLAN interface IPs. These minimal logging additions only activate on errors, providing zero overhead during normal operation while enabling root cause analysis when the validator fails.
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
volodymyrsamotiy
approved these changes
Nov 3, 2025
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
4ee08fe to
44b049d
Compare
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
qiluo-msft
approved these changes
Nov 17, 2025
Contributor
There was a problem hiding this comment.
Pull Request Overview
This PR adds diagnostic logging to vlanintf_validator to help diagnose failures when flushing neighbor entries for deleted VLAN interface IPs. The logging only activates on errors, maintaining zero overhead during normal operations.
Key Changes
- Enhanced
command_wrapper()to capture stdout/stderr and log detailed error information when commands fail - Added error logging in
vlanintf_validator()to identify which interface and IP address failed - Updated test infrastructure to support stdout/stderr in mocked subprocess calls and added failure test cases
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| generic_config_updater/services_validator.py | Added diagnostic error logging to command_wrapper and vlanintf_validator functions |
| tests/generic_config_updater/service_validator_test.py | Updated mock infrastructure and added test cases for vlanintf_validator failure scenarios |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
gordon-nexthop
pushed a commit
to nexthop-ai/sonic-utilities
that referenced
this pull request
Nov 25, 2025
…r failures (sonic-net#4105) Add enhanced error logging to diagnose vlanintf_validator failures when flushing neighbor entries for deleted VLAN interface IPs. These minimal logging additions only activate on errors, providing zero overhead during normal operation while enabling root cause analysis when the validator fails. This could help in the investigatation of sonic-net/sonic-buildimage#23680 What I did Added diagnostic logging to vlanintf_validator to identify the root cause when neighbor flush operations fail during VLAN interface IP removal. Previously, the validator would fail silently with only a generic error message, making it impossible to diagnose the actual failure reason. How I did it Enhanced command_wrapper() function to: Capture stdout and stderr from subprocess calls by setting capture_output=True Log the exact command, return code, stdout, and stderr when commands fail Only log when errors occur (returncode != 0), maintaining zero overhead during normal operation Added error logging in vlanintf_validator() to: Log which specific interface and IP address failed during neighbor flush Include the return code for correlation with command_wrapper logs Updated unit tests in service_validator_test.py to reflect the original command format How to verify it Normal operation (no errors): Apply a VLAN interface configuration change Verify no additional log messages appear when operation succeeds Error scenario: Trigger a vlanintf_validator failure (e.g., by manipulating interface state during config update) Verify detailed error logs appear showing: The exact command that failed The return code stdout/stderr output from the command Previous command output (if the output of a command-line utility has changed) 2025 Oct 25 18:04:21.266397 r-sn4700-72 ERR GenericConfigUpdater: Change Applier: service invoked: generic_config_updater.services_validator.vlanintf_validator failed with ret=False New command output (if the output of a command-line utility has changed) 2025 Oct 25 18:04:21.266397 r-sn4700-72 ERR GenericConfigUpdater: Service Validator: Command failed: 'ip neigh flush dev Vlan1000 192.168.0.1/21', returncode: 2 2025 Oct 25 18:04:21.266398 r-sn4700-72 ERR GenericConfigUpdater: Service Validator: stderr: Cannot find device "Vlan1000" 2025 Oct 25 18:04:21.266399 r-sn4700-72 ERR GenericConfigUpdater: Service Validator: vlanintf_validator: Failed to flush neighbors for Vlan1000 192.168.0.1/21, returncode=2 2025 Oct 25 18:04:21.266400 r-sn4700-72 ERR GenericConfigUpdater: Change Applier: service invoked: generic_config_updater.services_validator.vlanintf_validator failed with ret=False
saiarcot895
added a commit
to saiarcot895/sonic-mgmt
that referenced
this pull request
Dec 1, 2025
sonic-net/sonic-utilities#4105 added error logging for when a GCU command exits with a non-zero exit code. There is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs then need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
12 tasks
yxieca
pushed a commit
to sonic-net/sonic-mgmt
that referenced
this pull request
Dec 5, 2025
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
opcoder0
pushed a commit
to opcoder0/sonic-mgmt
that referenced
this pull request
Dec 8, 2025
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
nissampa
pushed a commit
to nissampa/sonic-mgmt_dpu_test
that referenced
this pull request
Dec 9, 2025
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> Signed-off-by: Nishanth Sampath Kumar <nissampa@cisco.com>
selldinesh
pushed a commit
to selldinesh/sonic-mgmt
that referenced
this pull request
Dec 11, 2025
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> Signed-off-by: selldinesh <dinesh.sellappan@keysight.com>
saravanan-nexthop
pushed a commit
to saravanan-nexthop/sonic-mgmt
that referenced
this pull request
Dec 15, 2025
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> Signed-off-by: Saravanan <saravanan@nexthop.ai>
saiarcot895
added a commit
to saiarcot895/sonic-mgmt
that referenced
this pull request
Dec 19, 2025
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
12 tasks
gshemesh2
pushed a commit
to gshemesh2/sonic-mgmt
that referenced
this pull request
Dec 21, 2025
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
gshemesh2
pushed a commit
to gshemesh2/sonic-mgmt
that referenced
this pull request
Dec 21, 2025
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
vmittal-msft
pushed a commit
to sonic-net/sonic-mgmt
that referenced
this pull request
Jan 7, 2026
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
YairRaviv
pushed a commit
to YairRaviv/sonic-utilities
that referenced
this pull request
Jan 12, 2026
…r failures (sonic-net#4105) Add enhanced error logging to diagnose vlanintf_validator failures when flushing neighbor entries for deleted VLAN interface IPs. These minimal logging additions only activate on errors, providing zero overhead during normal operation while enabling root cause analysis when the validator fails. This could help in the investigatation of sonic-net/sonic-buildimage#23680 What I did Added diagnostic logging to vlanintf_validator to identify the root cause when neighbor flush operations fail during VLAN interface IP removal. Previously, the validator would fail silently with only a generic error message, making it impossible to diagnose the actual failure reason. How I did it Enhanced command_wrapper() function to: Capture stdout and stderr from subprocess calls by setting capture_output=True Log the exact command, return code, stdout, and stderr when commands fail Only log when errors occur (returncode != 0), maintaining zero overhead during normal operation Added error logging in vlanintf_validator() to: Log which specific interface and IP address failed during neighbor flush Include the return code for correlation with command_wrapper logs Updated unit tests in service_validator_test.py to reflect the original command format How to verify it Normal operation (no errors): Apply a VLAN interface configuration change Verify no additional log messages appear when operation succeeds Error scenario: Trigger a vlanintf_validator failure (e.g., by manipulating interface state during config update) Verify detailed error logs appear showing: The exact command that failed The return code stdout/stderr output from the command Previous command output (if the output of a command-line utility has changed) 2025 Oct 25 18:04:21.266397 r-sn4700-72 ERR GenericConfigUpdater: Change Applier: service invoked: generic_config_updater.services_validator.vlanintf_validator failed with ret=False New command output (if the output of a command-line utility has changed) 2025 Oct 25 18:04:21.266397 r-sn4700-72 ERR GenericConfigUpdater: Service Validator: Command failed: 'ip neigh flush dev Vlan1000 192.168.0.1/21', returncode: 2 2025 Oct 25 18:04:21.266398 r-sn4700-72 ERR GenericConfigUpdater: Service Validator: stderr: Cannot find device "Vlan1000" 2025 Oct 25 18:04:21.266399 r-sn4700-72 ERR GenericConfigUpdater: Service Validator: vlanintf_validator: Failed to flush neighbors for Vlan1000 192.168.0.1/21, returncode=2 2025 Oct 25 18:04:21.266400 r-sn4700-72 ERR GenericConfigUpdater: Change Applier: service invoked: generic_config_updater.services_validator.vlanintf_validator failed with ret=False
venu-nexthop
pushed a commit
to venu-nexthop/sonic-mgmt
that referenced
this pull request
Jan 13, 2026
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
thisptr-sh
pushed a commit
to thisptr-sh/sonic-mgmt
that referenced
this pull request
Jan 21, 2026
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
ytzur1
pushed a commit
to ytzur1/sonic-mgmt
that referenced
this pull request
Jan 29, 2026
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
ytzur1
pushed a commit
to ytzur1/sonic-mgmt
that referenced
this pull request
Feb 2, 2026
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> Signed-off-by: Yael Tzur <ytzur@nvidia.com>
anilal-amd
pushed a commit
to anilal-amd/anilal-forked-sonic-mgmt
that referenced
this pull request
Feb 19, 2026
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> Signed-off-by: Zhuohui Tan <zhuohui.tan@amd.com>
selldinesh
pushed a commit
to selldinesh/sonic-mgmt
that referenced
this pull request
Apr 1, 2026
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> Signed-off-by: selldinesh <dinesh.sellappan@keysight.com>
rraghav-cisco
pushed a commit
to rraghav-cisco/sonic-mgmt
that referenced
this pull request
Apr 20, 2026
What is the motivation for this PR? Update the loganalyzer ignore list to fix a regex issue, and update the GCU test to ignore new error syslogs. The regex for ignoring systemd-networkd.socket failing to start included [1], which has a special meaning when interpreted as regex (match one character that has 1 in it; in other words, match a 1). Escape the brackets so that they get treated literally. sonic-net/sonic-utilities#4105 started logging errors when a GCU command fails with a non-zero exit status. However, there is one case of a (known) failure where dhcp_relay fails to restart because it hit the start limit. These error logs need to be ignored. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> Signed-off-by: Raghavendran Ramanathan <rraghav@cisco.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
…r failures
Add enhanced error logging to diagnose vlanintf_validator failures when flushing neighbor entries for deleted VLAN interface IPs.
These minimal logging additions only activate on errors, providing zero overhead during normal operation while enabling root cause analysis when the validator fails.
This could help in the investigatation of sonic-net/sonic-buildimage#23680
What I did
Added diagnostic logging to
vlanintf_validatorto identify the root cause when neighbor flush operations fail during VLAN interface IP removal. Previously, the validator would fail silently with only a generic error message, making it impossible to diagnose the actual failure reason.How I did it
Enhanced
command_wrapper()function to:capture_output=TrueAdded error logging in
vlanintf_validator()to:Updated unit tests in
service_validator_test.pyto reflect the original command formatHow to verify it
Normal operation (no errors):
Error scenario:
Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)