Skip to content

Fix #14711 cherry pick from #15850#15946

Merged
yejianquan merged 2 commits intosonic-net:202405from
auspham:austinpham/14711-cherry-pick
Dec 9, 2024
Merged

Fix #14711 cherry pick from #15850#15946
yejianquan merged 2 commits intosonic-net:202405from
auspham:austinpham/14711-cherry-pick

Conversation

@auspham
Copy link
Contributor

@auspham auspham commented Dec 9, 2024

Description of PR

Summary:
Fixes # (issue) #15850

Cherry-pick #14711 to 202405

Failure is introduced from https://github.com/sonic-net/sonic-mgmt/pull/14580/files

Currently blocking #15841

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?
#11169

How did you do it?

Add a new test case to cover "pfcwd show stats" cmd

Trigger PFC storm with action drop or forward

send_tx_egress and send_rx_ingress
execute CMD 'show pfcwd stat' and ensure the output is expected
How did you verify/test it?
https://elastictest.org/scheduler/testplan/66f268ae4216a91fd43d97b6
https://elastictest.org/scheduler/testplan/66f268880369ccd340b3ea60

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

lipxu and others added 2 commits December 9, 2024 11:35
What is the motivation for this PR?
sonic-net#11169

How did you do it?
Add a new test case to cover "pfcwd show stats" cmd

Trigger PFC storm with action drop or forward
send_tx_egress and send_rx_ingress
execute CMD 'show pfcwd stat' and ensure the output is expected
How did you verify/test it?
https://elastictest.org/scheduler/testplan/66f268ae4216a91fd43d97b6
https://elastictest.org/scheduler/testplan/66f268880369ccd340b3ea60
Signed-off-by: Austin Pham <austinpham@microsoft.com>
@yejianquan yejianquan merged commit e8ad4a8 into sonic-net:202405 Dec 9, 2024
@bingwang-ms
Copy link
Collaborator

@auspham, @yejianquan May I know why this PR changed the code for the Mellanox part? The pfcwd_function test has been not stable on Mellanox platform since last week. I'm still doing some investigation.

@bingwang-ms
Copy link
Collaborator

After some investigation, I think the change on the Mellanox part caused some issue.
I raised a PR to stabilize the test on Mellanox #14467
The code below is to ensure there is PFCWD triggered on new queues before checking syslog.

pytest_assert(wait_until(PFC_STORM_TIMEOUT, 2, 0,
lambda: check_pfc_storm_state(dut, port, self.storm_hndle.pfc_queue_idx) != pfcwd_stats_before_test), # noqa: E501, E128
"PFC storm state did not change as expected") # noqa: E127

I will raise a PR to revert the Mellanox related change. Please comment if you have any concern

@auspham
Copy link
Contributor Author

auspham commented Dec 17, 2024

@auspham, @yejianquan May I know why this PR changed the code for the Mellanox part? The pfcwd_function test has been not stable on Mellanox platform since last week. I'm still doing some investigation.

Hi @bingwang-ms, this is to cherry-pick #14711 since the branch has some conflict that cannot auto-resolve.

The purpose is to merge in #15841.

@bingwang-ms
Copy link
Collaborator

@auspham , @yejianquan Can you please help review PR? Please help check the change doesn't cause regression on chassis. I didn't get a testbed to verify.
For master branch #16118
For 202405 branch #16117

yejianquan pushed a commit that referenced this pull request Dec 18, 2024
Description of PR
Summary:
This PR is to fix test issue caused by PR #15946

The code below is to ensure there is PFCWD triggered on new queues before checking syslog.

sonic-mgmt/tests/pfcwd/test_pfcwd_function.py

Lines 739 to 741 in 15bf903

 pytest_assert(wait_until(PFC_STORM_TIMEOUT, 2, 0, 
                         lambda: check_pfc_storm_state(dut, port, self.storm_hndle.pfc_queue_idx) != pfcwd_stats_before_test),  # noqa: E501, E128 
                         "PFC storm state did not change as expected")  # noqa: E127 
The test is flaky after change in #15946 because the wait time is not enough for PFCWD to be triggered on new queues.

Approach
What is the motivation for this PR?
This PR is to fix test issue caused by PR #15946

How did you do it?
Change the logic back while maintain compatibility with chassis.

How did you verify/test it?
The change is verified on a Mellanox platform.

collected 5 items                                                                                                                                                                                     

pfcwd/test_pfcwd_function.py::TestPfcwdFunc::test_pfcwd_actions[str2-msn4600c-acs-04]  ^H ^H ^H ^HPASSED                                                                                                    [ 20%]
pfcwd/test_pfcwd_function.py::TestPfcwdFunc::test_pfcwd_multi_port[str2-msn4600c-acs-04]  ^H ^H ^HPASSED                                                                                                 [ 40%]
pfcwd/test_pfcwd_function.py::TestPfcwdFunc::test_pfcwd_mmu_change[str2-msn4600c-acs-04]  ^H ^HPASSED                                                                                                 [ 60%]
pfcwd/test_pfcwd_function.py::TestPfcwdFunc::test_pfcwd_port_toggle[str2-msn4600c-acs-04]  ^H ^H ^HPASSED                                                                                                [ 80%]
pfcwd/test_pfcwd_function.py::TestPfcwdFunc::test_pfcwd_no_traffic[str2-msn4600c-acs-04] SKIPPED (This test is applicable only for cisco-8000 / Pfcwd tests skipped on m0/mx testbed.)          [100%]
Any platform specific information?
Mellanox platform specific.

co-authorized by: jianquanye@microsoft.com
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants