Update all port pfcwd storm test to make it more stable #15870

Merged
bingwang-ms merged 1 commit into sonic-net:master from echuawu:update_pfcwd_all_port_storm
Jun 19, 2025

@echuawu
Contributor

@echuawu echuawu commented Dec 4, 2024

Description of PR

Update all port pfcwd storm test to make it more stable

  1. Optimize background traffic
  2. Fix ARP resolution issue
  3. Remove ptf leftovers added in pfcwd test

Summary:
Fixes # (issue)
The all-port pfcwd storm test is not stable enough.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

The all-port pfcwd storm test is not stable enough.

How did you do it?

Optimize background traffic and ARP resolution.

How did you verify/test it?

Run it in internal regression.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@liat-grozovik
Collaborator

@yxieca @bingwang-ms could you please review or suggest someone to review?

@bingwang-ms
Collaborator

@kperumalbfn Can you help review if you get a chance?

@yxieca yxieca requested a review from lipxu December 18, 2024 18:46
@yxieca
Collaborator

yxieca commented Dec 18, 2024

@lipxu please help review.

@echuawu echuawu force-pushed the update_pfcwd_all_port_storm branch from 091eb73 to ec9237d Compare December 24, 2024 03:50
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@lipxu lipxu left a comment


@echuawu echuawu force-pushed the update_pfcwd_all_port_storm branch from ec9237d to eefaf76 Compare December 24, 2024 11:13
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@echuawu
Contributor Author

echuawu commented Dec 25, 2024

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@echuawu echuawu force-pushed the update_pfcwd_all_port_storm branch from eefaf76 to 304bb7c Compare December 25, 2024 06:11
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@echuawu
Contributor Author

echuawu commented Jan 2, 2025

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@echuawu
Contributor Author

echuawu commented Jan 3, 2025

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@echuawu echuawu force-pushed the update_pfcwd_all_port_storm branch from 304bb7c to 7e4a958 Compare January 3, 2025 02:24
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@echuawu
Contributor Author

echuawu commented Jan 3, 2025

Hi @bingwang-ms, the 2 check failures were due to testbed-related issues. I have retriggered them many times, but they still do not pass. Do you know who I could ask for help?

1. DUTHOST_UNREACHABLE
2025-01-03T04:53:27.8357425Z Error type: DUTHOST_UNREACHABLE
2025-01-03T04:53:27.8358232Z Error message: DUT host unreachable for vms-kvm-dual-t0 on running test_pretest.py|||vms-kvm-dual-t0_377521
2025-01-03T04:53:27.8359427Z Operation failed with exception: Exception('Test plan id: 677764fe98ec838ee83d611c, status: FAILED, result: FAILED, Elapsed 602 seconds. Check https://elastictest.org/scheduler/testplan/677764fe98ec838ee83d611c for test plan status')

2. UPGRADE_IMAGE_FAILED
2025-01-03T04:43:14.9498342Z Error type: UPGRADE_IMAGE_FAILED
2025-01-03T04:43:14.9499389Z Error message: Prepare testbed failed: testbed_q_sonic-elastictest-prod-vmss-E8s-v3_261321 for UPGRADE_IMAGE_FAILED, testbed_id is: 67776755e396f7257aacb936, cmd is: Download image
2025-01-03T04:43:14.9500664Z Operation failed with exception: Exception('Test plan id: 677764fbb9fbfafb6a51c2e7, status: FAILED, result: FAILED, Elapsed 782 seconds. Check https://elastictest.org/scheduler/testplan/677764fbb9fbfafb6a51c2e7 for test plan status')

@echuawu
Contributor Author

echuawu commented Jun 5, 2025

@echuawu Can you please address the comment? Feel free to let me know if an offline discussion is needed.

Hi @bingwang-ms, your key concern is that the number of background packets was changed from 100000 to 500. The background traffic is sent from the ports one by one. If the packet count is too large, the full loop cannot finish within 1 second, and pfcwd then fails to detect the storm on some ports. Setting the packet count to a small value ensures that pfcwd can detect the storm on all the ports.

Hi @echuawu, my concern is still not addressed. If the packet number is set to 500, ptf can finish sending the packets in a very short time. It's possible that the PFC pause has not been sent out from the leaf fanout by then. How can you guarantee this?

The background traffic needs to be received within a detect interval; it does not need to be received at the same moment the PFC pause frames arrive. 500 packets is an appropriate amount of background traffic because it ensures that all the ports receive the background traffic (the background traffic is sent from the ptf ports one by one). With a large value such as 100000, sending takes much longer than with 500 packets, so some ports may not see the background traffic within a detect interval. That is the failure we hit before, and it is why the number was changed from 100000 to 500.
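
The timing argument above can be sketched with a little arithmetic. This is only an illustration of the reasoning, not code from this PR: the helper names, the port count, and the send rate are hypothetical, and the 1-second window is taken from the "within 1 second" remark above rather than from any pfcwd configuration shown in this thread.

```python
def full_loop_seconds(num_ports: int, pkts_per_port: int, send_rate_pps: float) -> float:
    """Time for one sequential pass that sends pkts_per_port packets from each port in turn."""
    return num_ports * pkts_per_port / send_rate_pps


def fits_in_interval(num_ports: int, pkts_per_port: int, send_rate_pps: float,
                     interval_s: float = 1.0) -> bool:
    """True if every port has been served before the window (assumed 1 s) closes."""
    return full_loop_seconds(num_ports, pkts_per_port, send_rate_pps) <= interval_s


if __name__ == "__main__":
    NUM_PORTS = 32            # hypothetical testbed size
    SEND_RATE_PPS = 50_000.0  # hypothetical ptf send rate, packets per second

    # Compare the two packet counts discussed in this thread.
    for pkts in (500, 100_000):
        duration = full_loop_seconds(NUM_PORTS, pkts, SEND_RATE_PPS)
        ok = fits_in_interval(NUM_PORTS, pkts, SEND_RATE_PPS)
        print(f"{pkts:>6} pkts/port: loop takes {duration:.2f} s, fits in 1 s: {ok}")
```

Under these assumed numbers, a loop with 500 packets per port completes well inside the window, while 100000 packets per port takes far longer than the window, so the last ports in the loop would not see background traffic in time. The real margin depends on the actual port count and ptf send rate of the testbed.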

@lolyu
Collaborator

lolyu commented Jun 17, 2025

Hi @bingwang-ms, could you please help check this PR?

@bingwang-ms bingwang-ms merged commit 4a151f6 into sonic-net:master Jun 19, 2025
19 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Jun 19, 2025

1. Optimize background traffic
2. Optimize pfcwd polling interval
3. Fix arp resolve issue
4. Remove ptf leftovers added in pfcwd test

Change-Id: I84ee67f56b4c65c682a175515737d3a9125cfdc7
@mssonicbld
Collaborator

Cherry-pick PR to 202411: #19088

mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Jun 20, 2025

1. Optimize background traffic
2. Optimize pfcwd polling interval
3. Fix arp resolve issue
4. Remove ptf leftovers added in pfcwd test

Change-Id: I84ee67f56b4c65c682a175515737d3a9125cfdc7
@mssonicbld
Collaborator

Cherry-pick PR to 202505: #19113

mssonicbld pushed a commit that referenced this pull request Jun 20, 2025
1. Optimize background traffic
2. Optimize pfcwd polling interval
3. Fix arp resolve issue
4. Remove ptf leftovers added in pfcwd test

Change-Id: I84ee67f56b4c65c682a175515737d3a9125cfdc7
nissampa pushed a commit to nissampa/sonic-mgmt_dpu_test that referenced this pull request Aug 7, 2025

1. Optimize background traffic
2. Optimize pfcwd polling interval
3. Fix arp resolve issue
4. Remove ptf leftovers added in pfcwd test

Change-Id: I84ee67f56b4c65c682a175515737d3a9125cfdc7
@congh-nvidia
Contributor

Hi @bingwang-ms we also need this in 202411, could you please help cherry-pick?
@lolyu this will fix the key error issue.

mssonicbld pushed a commit that referenced this pull request Sep 3, 2025
1. Optimize background traffic
2. Optimize pfcwd polling interval
3. Fix arp resolve issue
4. Remove ptf leftovers added in pfcwd test

Change-Id: I84ee67f56b4c65c682a175515737d3a9125cfdc7
opcoder0 pushed a commit to opcoder0/sonic-mgmt that referenced this pull request Dec 8, 2025

1. Optimize background traffic
2. Optimize pfcwd polling interval
3. Fix arp resolve issue
4. Remove ptf leftovers added in pfcwd test

Change-Id: I84ee67f56b4c65c682a175515737d3a9125cfdc7

Signed-off-by: opcoder0 <110003254+opcoder0@users.noreply.github.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 16, 2025

1. Optimize background traffic
2. Optimize pfcwd polling interval
3. Fix arp resolve issue
4. Remove ptf leftovers added in pfcwd test

Change-Id: I84ee67f56b4c65c682a175515737d3a9125cfdc7
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
AharonMalkin pushed a commit to AharonMalkin/sonic-mgmt that referenced this pull request Dec 16, 2025

1. Optimize background traffic
2. Optimize pfcwd polling interval
3. Fix arp resolve issue
4. Remove ptf leftovers added in pfcwd test

Change-Id: I84ee67f56b4c65c682a175515737d3a9125cfdc7
Signed-off-by: Aharon Malkin <amalkin@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 21, 2025

1. Optimize background traffic
2. Optimize pfcwd polling interval
3. Fix arp resolve issue
4. Remove ptf leftovers added in pfcwd test

Change-Id: I84ee67f56b4c65c682a175515737d3a9125cfdc7
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Jan 13, 2026

1. Optimize background traffic
2. Optimize pfcwd polling interval
3. Fix arp resolve issue
4. Remove ptf leftovers added in pfcwd test

Change-Id: I84ee67f56b4c65c682a175515737d3a9125cfdc7
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Jan 26, 2026

1. Optimize background traffic
2. Optimize pfcwd polling interval
3. Fix arp resolve issue
4. Remove ptf leftovers added in pfcwd test

Change-Id: I84ee67f56b4c65c682a175515737d3a9125cfdc7
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
ytzur1 pushed a commit to ytzur1/sonic-mgmt that referenced this pull request Feb 2, 2026

1. Optimize background traffic
2. Optimize pfcwd polling interval
3. Fix arp resolve issue
4. Remove ptf leftovers added in pfcwd test

Change-Id: I84ee67f56b4c65c682a175515737d3a9125cfdc7
Signed-off-by: Yael Tzur <ytzur@nvidia.com>
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Mar 27, 2026

1. Optimize background traffic
2. Optimize pfcwd polling interval
3. Fix arp resolve issue
4. Remove ptf leftovers added in pfcwd test

Change-Id: I84ee67f56b4c65c682a175515737d3a9125cfdc7

10 participants