Skip to content

[Qos]HeadroomPoolSize test with dynamic_threshold based buffer allocation #16885

Merged
yxieca merged 10 commits intosonic-net:masterfrom
ansrajpu-git:HdrmPool_dyBuffer
Apr 22, 2025
Merged

[Qos]HeadroomPoolSize test with dynamic_threshold based buffer allocation #16885
yxieca merged 10 commits intosonic-net:masterfrom
ansrajpu-git:HdrmPool_dyBuffer

Conversation

@ansrajpu-git
Copy link
Contributor

@ansrajpu-git ansrajpu-git commented Feb 10, 2025

Description of PR

The existing HeadroomPool size test is based on traditional way of buffer allocation to calculate the headroompoolsize.
For Broadcom chassis, this new test calculates the buffer allocation using dynamic threshold to fill shared pool & headroom and verify the headroom size per vsq.

Skip test :-testQosSaiHeadroomPoolWatermark - SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES is not supported on broadcom-dnx
Summary:
For Broadcom chassis, since the headroom per vsq is huge, 4 src port are used to fill the headroom.

Fixes # (issue)
Issue : #13503

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405
  • 202411

Approach

What is the motivation for this PR?

Verifying HeadroomPoolsize test for broadcom-dnx chassis

How did you do it?

How did you verify/test it?

Executed qos test for headroomPool size and verify the results
Resource allocation
image
max headroom per vsq
image

image

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ansrajpu-git
Copy link
Contributor Author

@vmittal-msft , please review.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Pull request contains merge conflicts.

Comment on lines -454 to +457
pkts_num_trig_pfc: 657523
pkts_num_hdrm_full: 622850
pkts_num_hdrm_partial: 622750
margin: 300
pkts_num_trig_pfc: 542448
pkts_num_hdrm_full: 713397
pkts_num_hdrm_partial: 713250
margin: 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you share details on how you calculated these values so we can do a similar calculation for qos_params.j2.yaml?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you share details on how you calculated these values so we can do a similar calculation for qos_params.j2.yaml?

@ansrajpu-git if you might help reply? thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you share details on how you calculated these values so we can do a similar calculation for qos_params.j2.yaml?

@arista-nwolfe , the calculation is based on the Pool_size and the alpha value (dynamic threshold) used.
The pkts_num_trig_pfc is calculated as = 2** (alpha value) x R , where R is the remaining pool_ size
The pkts_num_headroom_full is the same value as buffer allocation per vsq.
Above value calculated has to be divided by the packet size used in test, to place in qos_yaml file.

@arista-nwolfe
Copy link
Contributor

Also note that on a T2 full topology with all J2c+ LCs I saw the tests skipped due to Insufficient number of ports to fill the headroom. Is this test meant for a different topology with more ports?

qos/test_qos_sai.py::TestQosSai::testQosSaiHeadroomPoolSize[single_asic] SKIPPED (Insufficient number of ports to fill the headroom)                                       [ 20%]
qos/test_qos_sai.py::TestQosSai::testQosSaiHeadroomPoolSize[single_dut_multi_asic] SKIPPED (Insufficient number of ports to fill the headroom)                             [ 40%]
qos/test_qos_sai.py::TestQosSai::testQosSaiHeadroomPoolSize[multi_dut_longlink_to_shortlink] SKIPPED (Insufficient number of ports to fill the headroom)                   [ 60%]
qos/test_qos_sai.py::TestQosSai::testQosSaiHeadroomPoolSize[multi_dut_shortlink_to_shortlink] SKIPPED (Insufficient number of ports to fill the headroom)                  [ 80%]
qos/test_qos_sai.py::TestQosSai::testQosSaiHeadroomPoolSize[multi_dut_shortlink_to_longlink] SKIPPED (Insufficient number of ports to fill the headroom)                   [100%]

Copy link
Contributor

@vmittal-msft vmittal-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ansrajpu-git It looks good to me. Please handle minor comments as discussed on call. Also, please re-base.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Pull request contains merge conflicts.

@ansrajpu-git
Copy link
Contributor Author

Also note that on a T2 full topology with all J2c+ LCs I saw the tests skipped due to Insufficient number of ports to fill the headroom. Is this test meant for a different topology with more ports?

qos/test_qos_sai.py::TestQosSai::testQosSaiHeadroomPoolSize[single_asic] SKIPPED (Insufficient number of ports to fill the headroom)                                       [ 20%]
qos/test_qos_sai.py::TestQosSai::testQosSaiHeadroomPoolSize[single_dut_multi_asic] SKIPPED (Insufficient number of ports to fill the headroom)                             [ 40%]
qos/test_qos_sai.py::TestQosSai::testQosSaiHeadroomPoolSize[multi_dut_longlink_to_shortlink] SKIPPED (Insufficient number of ports to fill the headroom)                   [ 60%]
qos/test_qos_sai.py::TestQosSai::testQosSaiHeadroomPoolSize[multi_dut_shortlink_to_shortlink] SKIPPED (Insufficient number of ports to fill the headroom)                  [ 80%]
qos/test_qos_sai.py::TestQosSai::testQosSaiHeadroomPoolSize[multi_dut_shortlink_to_longlink] SKIPPED (Insufficient number of ports to fill the headroom)                   [100%]

@arista-nwolfe , For broadcom-dnx, T2 topology ,buffer profile except "400000_120000m" has a smaller headroom per vsq , which is not sufficient to fill the Nominal headroom with available ports.For ex., 400000_2000m has a nominal headroom of ~791MB but the headroom per vsq is ~1.44MB. Upon calculating, 273 ports are needed approximately to fill the headroom.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ansrajpu-git
Copy link
Contributor Author

  • Required

@vmittal-msft , comments update and rebase done.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ansrajpu-git
Copy link
Contributor Author

Cherry-pick -PR-Azure/sonic-mgmt.msft#191 raised for msft-202405 branch

@XuChen-MSFT
Copy link
Contributor

@ansrajpu-git
original "HdrmPoolSizeTest" is also support device which is dynamically allocate buffer. and I think dynamic buffer allocation should be default behaviors.
why need add new case ? I just quicky go throught "HdrmPoolSizeTest_withDynamicBufferCacl", I don't found some big difference with original one "HdrmPoolSizeTest" .
The original case is to automatically select the ports and PGs required for the hdrm test in qos_param_generator.py, calculate how much buffer each PG needs to fill the shared buffer according to the dynamic threshold, and then start the consumption test of the headroom pool.

@ansrajpu-git
Copy link
Contributor Author

ansrajpu-git commented Apr 16, 2025

@ansrajpu-git original "HdrmPoolSizeTest" is also support device which is dynamically allocate buffer. and I think dynamic buffer allocation should be default behaviors. why need add new case ? I just quicky go throught "HdrmPoolSizeTest_withDynamicBufferCacl", I don't found some big difference with original one "HdrmPoolSizeTest" . The original case is to automatically select the ports and PGs required for the hdrm test in qos_param_generator.py, calculate how much buffer each PG needs to fill the shared buffer according to the dynamic threshold, and then start the consumption test of the headroom pool.

@XuChen-MSFT , The new test has similar test steps to previous test 'HdrmPoolSizeTest' but it was working with multiple check condition for broadcom-dnx platform. With the dynamic buffer allocation, the test was intermittently failing, and we see that there is some adjustment need between the shared pool almost filled and trigger pfc frames ,precisely with few more packets.
As suggested by Vineet, we came up with new test specifically to broadcom-dnx platform for a clean look. We will be raising another PR for removing the broadcom-dnx check from the older test, once this PR is merged.
@vmittal-msft,please comment

@vmittal-msft
Copy link
Contributor

@yxieca @rlhui please help merge.

@vmittal-msft
Copy link
Contributor

@XuChen-MSFT we can discuss this if you have any concern. please ping.

@yxieca yxieca merged commit 0247297 into sonic-net:master Apr 22, 2025
20 checks passed
@ansrajpu-git
Copy link
Contributor Author

Cherry-pick for msft-202503 -Azure/sonic-mgmt.msft#236

auspham pushed a commit to auspham/sonic-mgmt that referenced this pull request May 30, 2025
…tion (sonic-net#16885)

What is the motivation for this PR?
Verifying HeadroomPoolsize test for broadcom-dnx chassis

How did you verify/test it?
Executed qos test for headroomPool size and verify the results
auspham pushed a commit to auspham/sonic-mgmt that referenced this pull request May 30, 2025
…hold based buffer allocation (sonic-net#236)

Cherry-pick  (sonic-net#16885)

What is the motivation for this PR?
Verifying HeadroomPoolsize test for broadcom-dnx chassis

How did you verify/test it?
Executed qos test for headroomPool size and verify the results
opcoder0 pushed a commit to opcoder0/sonic-mgmt that referenced this pull request Dec 8, 2025
…tion (sonic-net#16885)

What is the motivation for this PR?
Verifying HeadroomPoolsize test for broadcom-dnx chassis

How did you verify/test it?
Executed qos test for headroomPool size and verify the results

Signed-off-by: opcoder0 <110003254+opcoder0@users.noreply.github.com>
AharonMalkin pushed a commit to AharonMalkin/sonic-mgmt that referenced this pull request Dec 16, 2025
…tion (sonic-net#16885)

What is the motivation for this PR?
Verifying HeadroomPoolsize test for broadcom-dnx chassis

How did you verify/test it?
Executed qos test for headroomPool size and verify the results

Signed-off-by: Aharon Malkin <amalkin@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 21, 2025
…tion (sonic-net#16885)

What is the motivation for this PR?
Verifying HeadroomPoolsize test for broadcom-dnx chassis

How did you verify/test it?
Executed qos test for headroomPool size and verify the results

Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Jan 26, 2026
…tion (sonic-net#16885)

What is the motivation for this PR?
Verifying HeadroomPoolsize test for broadcom-dnx chassis

How did you verify/test it?
Executed qos test for headroomPool size and verify the results

Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

9 participants