Skip to content

Temporarily disable bulk init requests for PORT counters#3843

Merged
yejianquan merged 3 commits intosonic-net:202505from
rajkumar1-arista:temp-disable-bulk-req-for-port-cntrs
Sep 3, 2025
Merged

Temporarily disable bulk init requests for PORT counters#3843
yejianquan merged 3 commits intosonic-net:202505from
rajkumar1-arista:temp-disable-bulk-req-for-port-cntrs

Conversation

@rajkumar1-arista
Copy link
Copy Markdown

@rajkumar1-arista rajkumar1-arista commented Aug 22, 2025

Add temporary fix for https://github.com/aristanetworks/sonic-qual.msft/issues/655

This forces each port to be processed individually, avoiding capability mismatch between different ports in bulk requests

What I did
Temporarily disable bulk init requests for PORT counters.

Why I did it
When swss requests bulk initialization of PORT counters, corresponding component in sonic-sairedis assumes all the requested ports support same attributes, which is not the case for SFP/mgmt ports of Arista switches and was causing these ports to be completely skipped. This is supposed to be fixed by Azure/sonic-sairedis.msft#73 but it needs a re-work as its breaking non-Broadcom platform.

So, we're temporarily disabling this flow.

How I verified it
Verified countersDB is now having all the supported counters for SFP ports.

Details if related
relevant threads: #558, #629, 655

Add temporary fix for aristanetworks/sonic-qual.msft#655

This forces each port to be processed individually, avoiding capability mismatches between SFP ports and regular ports in bulk requests
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@rajkumar1-arista rajkumar1-arista marked this pull request as draft August 22, 2025 13:48
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@rajkumar1-arista rajkumar1-arista marked this pull request as ready for review August 25, 2025 08:31
@rajkumar1-arista
Copy link
Copy Markdown
Author

Some unrelated mock tests are failing, those failures can be seen without the change as well, ref dummy PR

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@rajkumar1-arista
Copy link
Copy Markdown
Author

mock test MuxRollbackTest.StandbyToActiveExceptionRollbackToStandby is failing irrespective of change, ref #3844

@lolyu
Copy link
Copy Markdown
Contributor

lolyu commented Aug 29, 2025

Hi @rajkumar1-arista, could you please rebase your working branch to see if the PR test passes?

@rajkumar1-arista
Copy link
Copy Markdown
Author

Hi @lolyu, working branch is already up to date with 202505

@StormLiangMS StormLiangMS reopened this Sep 1, 2025
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@StormLiangMS
Copy link
Copy Markdown
Contributor

hi @rajkumar1-arista could you check the build failure?

@rajkumar1-arista
Copy link
Copy Markdown
Author

@StormLiangMS mock test MuxRollbackTest.StandbyToActiveExceptionRollbackToStandby is failing irrespective of change, ref #3844

@lipxu lipxu self-requested a review September 2, 2025 00:33
Copy link
Copy Markdown

@lipxu lipxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR.
1: Based on the test result, it looks likes the issue was introduced around 21-Apr-2025, do you know what might have triggered it? thanks
2: From the PR description, it seems this PR is a workaround, not the finial fix? if so, could you please share the plan or the timeline for final fix. thanks

@rajkumar1-arista
Copy link
Copy Markdown
Author

@lipxu

1: Based on the test result, it looks likes the issue was introduced around 21-Apr-2025, do you know what might have triggered it? thanks

This was introduced by sonic-net/sonic-sairedis#1527, but might have been masked and further improvements might have triggered it around the mentioned timeline.

From the PR description, it seems this PR is a workaround, not the finial fix? if so, could you please share the plan or the timeline for final fix. thanks

Yes this is a workaround, the final fix by @andywongarista Azure/sonic-sairedis.msft#73 was reverted due to some breakages on non-broadcom platform. @andywongarista can you please share the plan to modify your fix?

@rajkumar1-arista
Copy link
Copy Markdown
Author

Waiting fo #3855 to get merged in 202505, post that I'll rebase my change.

@andywongarista
Copy link
Copy Markdown
Contributor

@lipxu

1: Based on the test result, it looks likes the issue was introduced around 21-Apr-2025, do you know what might have triggered it? thanks

This was introduced by sonic-net/sonic-sairedis#1527, but might have been masked and further improvements might have triggered it around the mentioned timeline.

From the PR description, it seems this PR is a workaround, not the finial fix? if so, could you please share the plan or the timeline for final fix. thanks

Yes this is a workaround, the final fix by @andywongarista Azure/sonic-sairedis.msft#73 was reverted due to some breakages on non-broadcom platform. @andywongarista can you please share the plan to modify your fix?

This fix was not by me but by @justin-wong-ce, @justin-wong-ce can you comment?

@theasianpianist
Copy link
Copy Markdown
Contributor

@rajkumar1-arista can you temporarily add the following change to the failing unit test? Having orchagent logs will help debug the test failure:

diff --git a/tests/mock_tests/mux_rollback_ut.cpp b/tests/mock_tests/mux_rollback_ut.cpp
index 06349fed..e329865e 100644
--- a/tests/mock_tests/mux_rollback_ut.cpp
+++ b/tests/mock_tests/mux_rollback_ut.cpp
@@ -279,8 +279,11 @@ namespace mux_rollback_test
     {
         EXPECT_CALL(*mock_sai_next_hop_api, create_next_hops)
             .WillOnce(Throw(exception()));
+        swss::Logger::swssOutputNotify(std::string(), "STDOUT");
+        swss::Logger::setMinPrio(swss::Logger::SWSS_INFO);
         SetMuxStateFromAppDb(ACTIVE_STATE);
         EXPECT_EQ(STANDBY_STATE, m_MuxCable->getState());
+        swss::Logger::swssOutputNotify(std::string(), "SYSLOG");
     }

     TEST_F(MuxRollbackTest, ActiveToStandbyExceptionRollbackToActive)

@yejianquan
Copy link
Copy Markdown

The 202505 cherry-pick of #3788 passes all the PR build and test,
#3855
Should we merge it to fix the 202505 build and test issue? @theasianpianist

@StormLiangMS StormLiangMS reopened this Sep 3, 2025
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@yejianquan
Copy link
Copy Markdown

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown
Contributor

@StormLiangMS StormLiangMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yejianquan yejianquan merged commit c0d4be0 into sonic-net:202505 Sep 3, 2025
21 checks passed
@justin-wong-ce
Copy link
Copy Markdown

justin-wong-ce commented Sep 3, 2025

@lipxu

1: Based on the test result, it looks likes the issue was introduced around 21-Apr-2025, do you know what might have triggered it? thanks

This was introduced by sonic-net/sonic-sairedis#1527, but might have been masked and further improvements might have triggered it around the mentioned timeline.

From the PR description, it seems this PR is a workaround, not the finial fix? if so, could you please share the plan or the timeline for final fix. thanks

Yes this is a workaround, the final fix by @andywongarista Azure/sonic-sairedis.msft#73 was reverted due to some breakages on non-broadcom platform. @andywongarista can you please share the plan to modify your fix?

The final fix has some set back as the issue now involves other vendors and various trade offs need to be weighed. The fix will probably need a HLD review and the community to discuss in order to be able to decide on an approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants