[Nvidia] Fix qos sai test for supporting LAG port by JibinBao · Pull Request #9587 · sonic-net/sonic-mgmt

JibinBao · 2023-08-22T07:32:09Z

Description of PR

Previously, on Nvidia devices, the tests created port congestion by disabling the tx port. If a LAG port was used as the tx port, the LAG went down within 90s because the LACP PDUs were blocked as well, and the tests failed. For that reason the QoS tests were skipped on LAG ports. With this change, the tests block the data plane queue instead of disabling the port, so the QoS tests can also run on LAG ports. The change works for all topologies on Nvidia devices.

Summary:
Fixes # (issue)

Type of change

Bug fix
Testbed and Framework(new/improvement)
Test case(new/improvement)

Back port request

201911
202012
202205

Approach

What is the motivation for this PR?

Make the QoS SAI test support LAG ports.

How did you do it?

We block the data plane queue instead of disabling the port, so the control plane still works normally.
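The difference between the two congestion methods can be sketched with a toy model. This is not the SAI or sonic-mgmt API: `PortModel`, both `congest_by_*` helpers, and the queue layout (LACP PDUs on queue 7, data traffic on queues 0-6) are assumptions made only for illustration.

```python
from dataclasses import dataclass, field

# Assumption for illustration only: LACP PDUs egress on queue 7 and data
# traffic uses queues 0-6. The PR text does not state the queue layout.
CONTROL_QUEUE = 7
DATA_QUEUES = range(0, 7)


@dataclass
class PortModel:
    """Toy model of an egress port; hypothetical, not a real API."""
    admin_up: bool = True
    blocked_queues: set = field(default_factory=set)

    def can_transmit(self, queue: int) -> bool:
        # A packet leaves the port only if the port is up and its queue is open.
        return self.admin_up and queue not in self.blocked_queues


def congest_by_port_disable(port: PortModel) -> None:
    """Old approach: shut the whole port, which also stops LACP PDUs."""
    port.admin_up = False


def congest_by_queue_block(port: PortModel) -> None:
    """New approach: block only data plane queues; control traffic still flows."""
    port.blocked_queues.update(DATA_QUEUES)


old, new = PortModel(), PortModel()
congest_by_port_disable(old)
congest_by_queue_block(new)
print(old.can_transmit(CONTROL_QUEUE))  # control traffic on the disabled port
print(new.can_transmit(CONTROL_QUEUE))  # control traffic with queues blocked
print(new.can_transmit(3))              # data traffic with queues blocked
```

In this model only the queue-blocking variant still lets control traffic out, which is why a LAG would stay up while the data queues back up.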

How did you verify/test it?

Ran the QoS SAI test on the t1-lag-64 topology.

Any platform specific information?

Any

Supported testbed topology if it's a new test case?

Any

Documentation

mssonicbld · 2023-08-22T07:33:13Z

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/qos/qos_sai_base.py:1922:43: F541 f-string is missing placeholders

flake8...............................................(no files to check)Skipped
check conditional mark sort..........................(no files to check)Skipped

To run the pre-commit checks locally, you can follow below steps:

Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run
the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt
docker container.
Ensure that the pre-commit package is installed:

sudo pip install pre-commit

Go to repository root folder
Install the pre-commit hooks:

pre-commit install

Use pre-commit to check staged file:

pre-commit

Alternatively, you can check committed files using:

pre-commit run --from-ref <commit_id> --to-ref <commit_id>
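For context on the F541 finding reported above: flake8 raises it when a string carries the `f` prefix but contains no `{...}` placeholder. The offending line in tests/qos/qos_sai_base.py is not shown in this thread, so the strings and the `port` value below are hypothetical and only illustrate the pattern and two common fixes.

```python
port = "Ethernet0"  # hypothetical value, for illustration only

# Flagged by F541: the f prefix is present but there is no {placeholder}.
flagged = f"Disabling port"

# Fix 1: drop the f prefix when nothing is interpolated.
plain = "Disabling port"

# Fix 2: keep the f prefix and actually interpolate a value.
interpolated = f"Disabling port {port}"

print(plain)
print(interpolated)
```

Both fixes leave runtime behavior unchanged in the first case and make the intended interpolation explicit in the second.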

bingwang-ms · 2023-08-22T16:48:17Z

Thanks for the improvement! The code looks good to me.
Just one question, the LACP PDU is egressed from Queue 7 if I recall correctly. Is there a chance we blocked Queue 7 in the test?

tests/saitests/py3/sai_base_test.py

bingwang-ms · 2023-08-23T20:24:51Z

@stephenxs Can you resolve the conflict?

mssonicbld · 2023-08-24T02:01:34Z

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/saitests/py3/sai_base_test.py:46:1: E302 expected 2 blank lines, found 1

flake8...............................................(no files to check)Skipped
check conditional mark sort..........................(no files to check)Skipped
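E302 means a top-level `def` or `class` is preceded by fewer than two blank lines. The function below is a rough, hypothetical approximation of that rule (it ignores decorators, comments, and other pycodestyle special cases) and is meant only to show what the check looks for, not how flake8 implements it.

```python
def top_level_defs_missing_blank_lines(source: str) -> list:
    """Return 1-based line numbers of top-level def/class statements that are
    preceded by fewer than two blank lines. Rough approximation of E302."""
    lines = source.splitlines()
    flagged = []
    for index, line in enumerate(lines):
        if not line.startswith(("def ", "class ")):
            continue
        # Count the blank lines directly above the definition.
        blanks = 0
        cursor = index - 1
        while cursor >= 0 and lines[cursor].strip() == "":
            blanks += 1
            cursor -= 1
        # cursor < 0 means the definition is the first code in the file.
        if cursor >= 0 and blanks < 2:
            flagged.append(index + 1)
    return flagged


bad = "import os\n\ndef helper():\n    pass\n"
good = "import os\n\n\ndef helper():\n    pass\n"
print(top_level_defs_missing_blank_lines(bad))
print(top_level_defs_missing_blank_lines(good))
```

The fix for the reported line is simply to add the second blank line before the definition.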


JibinBao · 2023-08-24T02:20:04Z

/azpw run Azure.sonic-mgmt

mssonicbld · 2023-08-24T02:20:06Z

/AzurePipelines run Azure.sonic-mgmt

azure-pipelines · 2023-08-24T02:20:14Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2023-08-24T02:21:10Z

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/saitests/py3/sai_base_test.py:46:1: E302 expected 2 blank lines, found 1

flake8...............................................(no files to check)Skipped
check conditional mark sort..........................(no files to check)Skipped


JibinBao · 2023-08-24T06:10:02Z

Hi @bingwang-ms, the conflict has been resolved. Can you help merge it?

1. Because blocking the port was replaced by blocking the queue, the first packet leaks on the SN5600 platform, so the test case is updated accordingly.
2. Because blocking the port was replaced by blocking the queue, on dualtor devices the data plane is blocked but the control plane still works, so the old implementation counted control plane packets as leaked packets. The actual number of leaked packets is 0.
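One way to picture the second point is to count leaked packets per queue rather than per port. The sketch below is an assumption, not the code from this PR: `leaked_data_packets` is a hypothetical helper, per-queue tx counter deltas are assumed to be available, and queue 7 is treated as the control plane queue purely for illustration.

```python
def leaked_data_packets(tx_delta_by_queue: dict, control_queues=frozenset({7})) -> int:
    """Sum tx counter deltas over data plane queues only.

    tx_delta_by_queue maps queue index -> packets sent while the data plane
    was blocked. Control queues are excluded so that control plane packets
    are not reported as leaks.
    """
    return sum(
        count
        for queue, count in tx_delta_by_queue.items()
        if queue not in control_queues
    )


# Dualtor-like case: only control plane packets left the port.
dualtor_delta = {3: 0, 7: 4}
print(sum(dualtor_delta.values()), leaked_data_packets(dualtor_delta))

# SN5600-like case: the first data packet leaks before the queue is blocked.
sn5600_delta = {3: 1, 7: 2}
print(leaked_data_packets(sn5600_delta))
```

A port-level counter would report 4 packets in the first case even though no data packet escaped, which matches the miscount described in the commit message.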

JibinBao · 2023-08-29T01:34:16Z

/azpw run Azure.sonic-mgmt

mssonicbld · 2023-08-29T01:34:17Z

/AzurePipelines run Azure.sonic-mgmt

mssonicbld · 2023-09-01T17:04:57Z

@JibinBao PR conflicts with 202205 branch

bingwang-ms · 2023-09-01T17:05:48Z

@JibinBao Can you please fix the conflict for 202205 branch? Thanks

JibinBao · 2023-09-04T01:19:01Z

@JibinBao Can you please fix the conflict for 202205 branch? Thanks

Ok, will do

mssonicbld · 2023-09-15T03:16:54Z

Cherry-pick PR to 202305: #9996

bingwang-ms · 2023-09-20T06:12:55Z

@JibinBao Can you please file a new PR for 202205 branch? I saw the test cases are still skipped in 202205 branch. Thanks!

JibinBao · 2023-09-20T06:15:27Z

@JibinBao Can you please file a new PR for 202205 branch? I saw the test cases are still skipped in 202205 branch. Thanks!

@bingwang-ms, after testing passes on 202205, I will open a new PR for 202205.

Description of PR Previously, on Nvidia devices, to make port congestion, we disable the port. So, if a LAG port is used as the tx port in the test, the LAG port will go down in 90s because the lacp pdus are also blocked, which will fail the tests. Therefore, we skip the test on the LAG port for QoS tests. Currently, to make the LAG port also support QoS test, we block the data plane queue instead of disabling the port. This change will work for all topo on Nvidia devices. Change-Id: I6580398b6038e6a850915c57dc6112cdb628ed99

JibinBao · 2023-09-26T01:52:59Z

@JibinBao Can you please file a new PR for 202205 branch? I saw the test cases are still skipped in 202205 branch. Thanks!

PR #10121 for 202205 is ready. Can you help review it?

XuChen-MSFT · 2023-09-29T15:01:25Z

tests/qos/qos_sai_base.py


        # LAG ports in T1 TOPO need to be removed in Mellanox devices
-        if topo in self.SUPPORTED_T0_TOPOS or isMellanoxDevice(src_dut):
+        if topo in self.SUPPORTED_T0_TOPOS or (topo in self.SUPPORTED_PTF_TOPOS and isMellanoxDevice(src_dut)):


In the past, the t0-120 topo was processed correctly because it is a Mellanox device.
Now, since t0-120 is not in the PTF topos, t0-120 needs to be added to SUPPORTED_T0_TOPOS.
Please help review PR #10200.
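The effect of the changed condition on t0-120 can be reproduced with a small stand-alone sketch. The set contents below are placeholders, not the real values from tests/qos/qos_sai_base.py, and `old_condition` / `new_condition` are hypothetical wrappers around the two expressions from the diff.

```python
# Placeholder values for illustration; the real lists live in
# tests/qos/qos_sai_base.py and may differ.
SUPPORTED_T0_TOPOS = {"t0", "t0-64"}
SUPPORTED_PTF_TOPOS = {"ptf32", "ptf64"}


def old_condition(topo: str, is_mellanox: bool) -> bool:
    # Before the PR: every Mellanox device took this branch.
    return topo in SUPPORTED_T0_TOPOS or is_mellanox


def new_condition(topo: str, is_mellanox: bool) -> bool:
    # After the PR: Mellanox devices take this branch only on PTF topologies.
    return topo in SUPPORTED_T0_TOPOS or (
        topo in SUPPORTED_PTF_TOPOS and is_mellanox
    )


print(old_condition("t0-120", True))  # branch taken only because of the vendor
print(new_condition("t0-120", True))  # branch no longer taken

# Adding t0-120 to the T0 list restores the previous behavior.
SUPPORTED_T0_TOPOS.add("t0-120")
print(new_condition("t0-120", True))
```

Under these placeholder values, a Mellanox t0-120 testbed stops taking the branch after the change until t0-120 is listed explicitly, which is the gap the follow-up PR is described as closing.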

JibinBao requested review from XuChen-MSFT and wsycqyz as code owners August 22, 2023 07:32

JibinBao force-pushed the fix_qos_lag_issue branch from 7eacd16 to 3e7335f Compare August 22, 2023 07:45

stephenxs approved these changes Aug 22, 2023

stephenxs requested a review from bingwang-ms August 22, 2023 08:45

JibinBao mentioned this pull request Aug 22, 2023

[202205] Enable QoS sai test on T1-Lag topology #8792

Draft

6 tasks

bingwang-ms previously approved these changes Aug 22, 2023

bingwang-ms added Request for 202205 branch Request for 202305 branch labels Aug 22, 2023

bingwang-ms reviewed Aug 22, 2023

tests/saitests/py3/sai_base_test.py (outdated)

JibinBao dismissed bingwang-ms’s stale review via 1041870 August 24, 2023 02:00

JibinBao force-pushed the fix_qos_lag_issue branch from 3e7335f to 1041870 Compare August 24, 2023 02:00

bingwang-ms previously approved these changes Aug 24, 2023

Fix qos lag issue

f35ef26

JibinBao dismissed bingwang-ms’s stale review via f35ef26 August 24, 2023 03:13

JibinBao force-pushed the fix_qos_lag_issue branch from 1041870 to f35ef26 Compare August 24, 2023 03:13

JibinBao added 3 commits August 24, 2023 17:25

fix: support ptf topo for mellanox device

79b69d0

Fix test_tunnel_qos_remap issue, add dut_usename and dut_password

ca34b22

yxieca added the Approved for 202205 branch label Sep 1, 2023

mssonicbld added the Cherry Pick Conflict_202205 label Sep 1, 2023

wangxin added the Approved for 202305 branch label Sep 15, 2023

mssonicbld added the Created PR to 202305 branch label Sep 15, 2023

mssonicbld mentioned this pull request Sep 15, 2023

[action] [PR:9587] [Nvidia] Fix qos sai test for supporting LAG port #9996

Merged

6 tasks

mssonicbld added Included in 202305 branch and removed Created PR to 202305 branch labels Sep 15, 2023

JibinBao mentioned this pull request Sep 25, 2023

[202205 | Nvidia] Fix qos sai test for supporting LAG port (#9587) #10121

Merged

6 tasks

yxieca added the Included in 202205 branch label Sep 27, 2023

XuChen-MSFT mentioned this pull request Sep 29, 2023

[qos] support t0-120 qos sai test #10200

Merged

7 tasks

XuChen-MSFT reviewed Sep 29, 2023

bingwang-ms mentioned this pull request Oct 11, 2023

[Mellanox] testQosSaiDwrr failed on Mellanox platform #10299

Closed

This was referenced Oct 18, 2023

[action] [PR:10200] [qos] support t0-120 qos sai test #10367

Merged

[action] [PR:10200] [qos] support t0-120 qos sai test #10368

Merged

congh-nvidia mentioned this pull request Oct 24, 2023

[Nvidia] Fix dscp remapping cases #10430

Merged

7 tasks

This was referenced Feb 1, 2024

[action] [PR:10430] [Nvidia] Fix dscp remapping cases #11492

Merged

[action] [PR:10430] [Nvidia] Fix dscp remapping cases #11493

Merged


7 participants