
[Nvidia] Fix qos sai test for supporting LAG port #9587

Merged
bingwang-ms merged 4 commits into sonic-net:master from JibinBao:fix_qos_lag_issue
Aug 30, 2023

Conversation

@JibinBao
Contributor

Description of PR

Previously, on Nvidia devices, we created port congestion by disabling the port. If a LAG port was used as the tx port in the test, the LAG port went down within 90s because the LACP PDUs were blocked as well, which failed the tests. We therefore skipped the QoS tests on LAG ports. To let the QoS tests run on LAG ports too, we now block the data plane queue instead of disabling the port. This change works for all topologies on Nvidia devices.

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 201911
  • 202012
  • 202205

Approach

What is the motivation for this PR?

Make the QoS SAI test support LAG ports.

How did you do it?

We block the data plane queue instead of disabling the port, so the control plane still works normally.
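
The difference between the two congestion methods can be sketched with a toy model. This is only an illustration, not code from the PR: the function names are hypothetical, and the 90-second value is taken from the description above.

```python
# Toy model only: names are hypothetical and do not come from sonic-mgmt.
LACP_TIMEOUT_S = 90  # per the PR description, the LAG goes down in 90s without LACP PDUs


def disable_port():
    """Old method: nothing egresses the port, including LACP PDUs."""
    return {"data_egress": False, "lacp_egress": False}


def block_data_queues():
    """New method: only the data plane queue is blocked; control traffic still egresses."""
    return {"data_egress": False, "lacp_egress": True}


def lag_stays_up(lacp_pdus_can_egress, congestion_duration_s):
    """Return True if the LAG survives the congestion window."""
    if lacp_pdus_can_egress:
        return True
    # Without LACP PDUs, the LAG only survives a window shorter than the timeout.
    return congestion_duration_s < LACP_TIMEOUT_S


for method in (disable_port, block_data_queues):
    state = method()
    print(method.__name__, lag_stays_up(state["lacp_egress"], 120))
```

In both cases data traffic is held back, which is what the test needs; only the second method keeps LACP alive for a congestion window longer than the timeout.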

How did you verify/test it?

Ran the QoS SAI test on the t1-lag-64 topology.

Any platform specific information?

Any

Supported testbed topology if it's a new test case?

Any

Documentation

@mssonicbld
Collaborator

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/qos/qos_sai_base.py:1922:43: F541 f-string is missing placeholders

flake8...............................................(no files to check)Skipped
check conditional mark sort..........................(no files to check)Skipped

To run the pre-commit checks locally, follow the steps below:

  1. Ensure that the default python is python3. In the sonic-mgmt docker container, the default python is python2. You can run
    the check by activating the python3 virtual environment in the sonic-mgmt docker container or outside of the sonic-mgmt
    docker container.
  2. Ensure that the pre-commit package is installed:
    sudo pip install pre-commit
  3. Go to the repository root folder.
  4. Install the pre-commit hooks:
    pre-commit install
  5. Use pre-commit to check staged files:
    pre-commit
  6. Alternatively, you can check committed files using:
    pre-commit run --from-ref <commit_id> --to-ref <commit_id>
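
For reference, flake8's F541 flags a string that carries the `f` prefix but contains no `{placeholder}`. The snippet below is an illustrative sketch of the usual fix; the variable names and message text are hypothetical and are not the actual line 1922 of tests/qos/qos_sai_base.py.

```python
port = "Ethernet0"  # hypothetical value, for illustration only

# Flagged by F541: the f prefix is present but nothing is interpolated.
#   message = f"Blocking data plane queue"

# Fix 1: drop the prefix when no interpolation is needed.
message = "Blocking data plane queue"

# Fix 2: keep the prefix and actually interpolate a value.
detail = f"Blocking data plane queue on {port}"

print(message)
print(detail)
```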

@bingwang-ms
Collaborator

Thanks for the improvement! The code looks good to me.
Just one question: the LACP PDU egresses from Queue 7, if I recall correctly. Is there a chance that we block Queue 7 in the test?

bingwang-ms previously approved these changes Aug 22, 2023
@bingwang-ms
Collaborator

@stephenxs Can you resolve the conflict?

@mssonicbld
Collaborator

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/saitests/py3/sai_base_test.py:46:1: E302 expected 2 blank lines, found 1

flake8...............................................(no files to check)Skipped
check conditional mark sort..........................(no files to check)Skipped

To run the pre-commit checks locally, follow the steps below:

  1. Ensure that the default python is python3. In the sonic-mgmt docker container, the default python is python2. You can run
    the check by activating the python3 virtual environment in the sonic-mgmt docker container or outside of the sonic-mgmt
    docker container.
  2. Ensure that the pre-commit package is installed:
    sudo pip install pre-commit
  3. Go to the repository root folder.
  4. Install the pre-commit hooks:
    pre-commit install
  5. Use pre-commit to check staged files:
    pre-commit
  6. Alternatively, you can check committed files using:
    pre-commit run --from-ref <commit_id> --to-ref <commit_id>

bingwang-ms previously approved these changes Aug 24, 2023
@JibinBao
Contributor Author

/azpw run Azure.sonic-mgmt

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-mgmt

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@JibinBao
Contributor Author

Hi @bingwang-ms , the conflict has been resolved. Can you help merge it?

1. Because blocking the port is replaced by blocking the queue, the first packet leaks on the SN5600 platform, so the test case is updated accordingly.
2. Because blocking the port is replaced by blocking the queue, on dualtor devices the data plane is blocked but the control plane still works, so the old implementation counted the control plane packets as leaked packets. The actual number of leaked packets is 0.
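
One way to picture the second point is to count leaked packets per queue and ignore control plane queues. The sketch below is an assumption-laden illustration, not the implementation in this PR: the function name, the counter layout, and the use of queue 7 for control traffic are all hypothetical.

```python
def leaked_data_packets(tx_before, tx_after, control_queues):
    """Count packets that egressed while the data plane was blocked.

    tx_before / tx_after map a queue index to its tx packet counter
    (hypothetical layout). Queues listed in control_queues are ignored,
    since control plane traffic is expected to keep flowing.
    """
    return sum(
        tx_after[queue] - tx_before[queue]
        for queue in tx_before
        if queue not in control_queues
    )


# Hypothetical counters: data queue 3 is blocked, queue 7 carries control traffic.
before = {3: 100, 7: 10}
after = {3: 100, 7: 14}

print(leaked_data_packets(before, after, control_queues={7}))    # data plane only
print(leaked_data_packets(before, after, control_queues=set()))  # port-wide view
```

With these made-up counters, the port-wide view reports 4 "leaked" packets even though no data packet left the blocked queue, which is the miscount described above.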
@JibinBao
Contributor Author

/azpw run Azure.sonic-mgmt

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-mgmt

@mssonicbld
Collaborator

@JibinBao PR conflicts with 202205 branch

@bingwang-ms
Collaborator

@JibinBao Can you please fix the conflict for 202205 branch? Thanks

@JibinBao
Contributor Author

JibinBao commented Sep 4, 2023

@JibinBao Can you please fix the conflict for 202205 branch? Thanks

Ok, will do

mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Sep 15, 2023
@mssonicbld
Collaborator

Cherry-pick PR to 202305: #9996

mssonicbld pushed a commit that referenced this pull request Sep 15, 2023
@bingwang-ms
Collaborator

@JibinBao Can you please file a new PR for 202205 branch? I saw the test cases are still skipped in 202205 branch. Thanks!

@JibinBao
Contributor Author

@JibinBao Can you please file a new PR for 202205 branch? I saw the test cases are still skipped in 202205 branch. Thanks!

@bingwang-ms, after testing passes on 202205, I will open a new PR for 202205.

JibinBao added a commit to JibinBao/sonic-mgmt that referenced this pull request Sep 25, 2023
Change-Id: I6580398b6038e6a850915c57dc6112cdb628ed99
@JibinBao
Contributor Author

@JibinBao Can you please file a new PR for 202205 branch? I saw the test cases are still skipped in 202205 branch. Thanks!

PR #10121 for 202205 is ready. Can you help review it?

yxieca pushed a commit that referenced this pull request Sep 27, 2023
Change-Id: I6580398b6038e6a850915c57dc6112cdb628ed99

  # LAG ports in T1 TOPO need to be removed in Mellanox devices
- if topo in self.SUPPORTED_T0_TOPOS or isMellanoxDevice(src_dut):
+ if topo in self.SUPPORTED_T0_TOPOS or (topo in self.SUPPORTED_PTF_TOPOS and isMellanoxDevice(src_dut)):
Contributor

Previously, the t0-120 topology was processed correctly because the device is a Mellanox device.
Now that t0-120 is not in the PTF topologies (SUPPORTED_PTF_TOPOS), t0-120 needs to be added to SUPPORTED_T0_TOPOS.
Please help review PR #10200.
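
The effect of the changed condition on t0-120 can be checked with a small standalone sketch. The two boolean expressions mirror the old and new lines of the diff; the list contents are hypothetical placeholders, since the real values of SUPPORTED_T0_TOPOS and SUPPORTED_PTF_TOPOS are not shown in this conversation.

```python
# Hypothetical list contents, for illustration only.
SUPPORTED_T0_TOPOS = ["t0", "t0-64"]
SUPPORTED_PTF_TOPOS = ["ptf32", "ptf64"]


def old_condition(topo, is_mellanox, t0_topos=SUPPORTED_T0_TOPOS):
    # Old line: any Mellanox device matches, regardless of topology.
    return topo in t0_topos or is_mellanox


def new_condition(topo, is_mellanox, t0_topos=SUPPORTED_T0_TOPOS):
    # New line: a Mellanox device matches only on a PTF topology.
    return topo in t0_topos or (topo in SUPPORTED_PTF_TOPOS and is_mellanox)


print(old_condition("t0-120", True))                                  # True
print(new_condition("t0-120", True))                                  # False
print(new_condition("t0-120", True, SUPPORTED_T0_TOPOS + ["t0-120"]))  # True
```

Under these assumed lists, a Mellanox device on t0-120 matched the old condition, no longer matches the new one, and matches again once t0-120 is listed among the T0 topologies.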

AharonMalkin pushed a commit to AharonMalkin/sonic-mgmt that referenced this pull request Jan 25, 2024