Replace per port SFP presence check with global command in check_interface_status #21074
mihirpat1 wants to merge 2 commits into sonic-net:master
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Pull Request Overview
This PR optimizes the SFP transceiver presence checking in test_restart_swss to fix intermittent test failures on HWSKUs with many ports. The previous implementation checked each port individually, causing cumulative delays that exceeded the 300-second timeout on systems with 512+ logical ports.
Key changes:
- Replaced per-port `show interface transceiver presence <port>` commands with a single global command
- Eliminated the ~1 second delay per port that was causing timeout issues on large port configurations
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
c501111 to 4793b75
4793b75 to 8b1ab35
/azp run
Commenter does not have sufficient privileges for PR 21074 in repo sonic-net/sonic-mgmt
/azpw run
/AzurePipelines run
Azure Pipelines successfully started running 1 pipeline(s).
…rface_status Signed-off-by: Mihir Patel <[email protected]>
8b1ab35 to 429a38f
@mihirpat1 can you run this test again? It seems this fixed the issue: https://github.com/sonic-net/sonic-mgmt/pull/21055/files
A similar fix has been merged via #21055.
Description of PR
`test_restart_swss` is intermittently failing with the error "Not all interface information are detected within 300 seconds" on some HWSKUs.
This PR optimizes the SFP (transceiver) presence checking logic in `interface_utils.py` by replacing individual per-port commands with a single global command that retrieves presence status for all interfaces at once.
Summary:
Fixes # (issue)
Type of change
Back port request
Approach
What is the motivation for this PR?
Fix an intermittent failure in the `test_restart_swss` test case.
How did you do it?
The issue seems to stem from a significant delay between capturing the link status of a port and actually checking it. In the `check_interface_status` function (https://github.com/sonic-net/sonic-mgmt/blob/c7d26dbee7a15e97ccb209b8462d06f11c050d5c/tests/common/platform/interface_utils.py#L88C5-L88C27), the process is:
- Run `show interface description` to get the current status.
- Run `show interface transceiver presence EthernetXX` for each port. This takes ~1s per port.
On HWSKUs with many ports (e.g., 512 logical ports), the oper status for the last few ports can appear down simply because too much time has passed since the initial status capture. Unfortunately, by the time the loop reaches those ports, the 300s timeout (sonic-mgmt/tests/platform_tests/test_sequential_restart.py, line 79 in c7d26db) has already been exceeded.
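To put numbers on the problem: at roughly 1s per port, 512 logical ports need about 512s for the presence loop alone, which is well past the 300s budget. The sketch below shows one way a single global presence listing could be parsed and checked against the expected ports in one pass. It is illustrative only and not the code from this PR: the function names `parse_presence` and `ports_without_transceiver` are hypothetical, and the column layout in `SAMPLE_OUTPUT` is an assumption rather than output captured from a device.

```python
import re

# Assumed layout of the global presence listing; illustrative only,
# not captured from a real DUT.
SAMPLE_OUTPUT = """\
Port        Presence
----------  -----------
Ethernet0   Present
Ethernet4   Present
Ethernet8   Not present
"""


def parse_presence(output):
    """Map each port name to True/False from one global presence listing.

    Hypothetical helper: header and separator lines are skipped because
    they do not start with an EthernetN port name.
    """
    presence = {}
    for line in output.splitlines():
        match = re.match(r"^(Ethernet\d+)\s+(.+?)\s*$", line)
        if match:
            presence[match.group(1)] = match.group(2).lower() == "present"
    return presence


def ports_without_transceiver(output, expected_ports):
    """Return expected ports that are reported absent or not listed at all."""
    presence = parse_presence(output)
    return [port for port in expected_ports if not presence.get(port, False)]


if __name__ == "__main__":
    expected = ["Ethernet0", "Ethernet4", "Ethernet8", "Ethernet12"]
    print(ports_without_transceiver(SAMPLE_OUTPUT, expected))
```

Because the listing is fetched once, the cost of the presence check no longer grows with a per-port command round trip; only the in-memory lookup scales with the port count.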
To fix this issue, a single global `show interfaces transceiver presence` command is executed instead of one command per port.
How did you verify/test it?
Ran the failing test case and confirmed that it passes with the fix.
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation
ADO - 34399249