Skip to content

Leverage the fanout's ASIC to send PFC storm#13242

Merged
bingwang-ms merged 1 commit intosonic-net:masterfrom
nhe-NV:sneding_pfc_with_asic
Jul 30, 2024
Merged

Leverage the fanout's ASIC to send PFC storm#13242
bingwang-ms merged 1 commit intosonic-net:masterfrom
nhe-NV:sneding_pfc_with_asic

Conversation

@nhe-NV
Copy link
Copy Markdown
Contributor

@nhe-NV nhe-NV commented Jun 12, 2024

Currently, the PFC storm is sent using packet socket on the host CPU, which causes some drawbacks

  • There is strict timing requirement, which can not be guaranteed
  • The more ports under PFC storm, the higher probability that the timing requirement is not satisfied

We will leverage the fanout's ASIC to send PFC storm

  • Use SDK API to enable lossless traffic, set xoff to 0, and setup mappings for the PGs and the corresponding IEEE priorities

After the change, the pfc related test case which is using onxy as fanout will send the pfc frame with ASIC instead of using the CPU.
New docker storm docker need to be build with the Docker file provided in this PR.

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 201911
  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

To make the pfc related test case much more stable.

How did you do it?

Change sending pfc frame with CPU to ASIC on the onxy fanout.

How did you verify/test it?

Run the pfc related test case on the setup which has onxy fanout, test case could pass stablly.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@nhe-NV nhe-NV force-pushed the sneding_pfc_with_asic branch from 899de5a to 7537e39 Compare June 12, 2024 02:53
@mssonicbld
Copy link
Copy Markdown
Collaborator

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:13:1: F403 'from python_sdk_api.sx_api import *' used; unable to detect undefined names
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:18:1: E302 expected 2 blank lines, found 1
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:21:1: E302 expected 2 blank lines, found 1
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:32:1: E302 expected 2 blank lines, found 1
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:33:10: F405 'sx_api_port_pfc_enable_set' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:34:14: F405 'SX_STATUS_SUCCESS' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:38:17: F405 'new_sx_cos_priority_t_arr' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:39:15: F405 'new_sx_cos_ieee_prio_t_arr' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:40:18: F405 'new_sx_cos_port_prio_buff_t_p' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:42:1: E302 expected 2 blank lines, found 1
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:43:5: F405 'sx_cos_priority_t_arr_setitem' may be undefined, or defined from star imports: python_sdk_api.sx_api
...
[truncated extra lines, please run pre-commit locally to view full check results]

To run the pre-commit checks locally, you can follow below steps:

  1. Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run
    the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt
    docker container.
  2. Ensure that the pre-commit package is installed:
sudo pip install pre-commit
  1. Go to repository root folder
  2. Install the pre-commit hooks:
pre-commit install
  1. Use pre-commit to check staged file:
pre-commit
  1. Alternatively, you can check committed files using:
pre-commit run --from-ref <commit_id> --to-ref <commit_id>

@wsycqyz
Copy link
Copy Markdown
Contributor

wsycqyz commented Jun 13, 2024

The pre-commit check is mandatory. Can you help fix it?

@wsycqyz
Copy link
Copy Markdown
Contributor

wsycqyz commented Jun 13, 2024

@bingwang-ms This seems to be a big PR. FYI.

@nhe-NV
Copy link
Copy Markdown
Contributor Author

nhe-NV commented Jun 15, 2024

/azpw run

@mssonicbld
Copy link
Copy Markdown
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@bingwang-ms
Copy link
Copy Markdown
Collaborator

@kperumalbfn Can you please help review this PR?

@bingwang-ms
Copy link
Copy Markdown
Collaborator

bingwang-ms commented Jul 22, 2024

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS.
BTW, I didn't see code for building and importing the new container. Is that done manually?

@bingwang-ms
Copy link
Copy Markdown
Collaborator

The new PFCWD logic requires queue not empty. sonic-net/sonic-swss#3036
How is that condition satisfied?

@nhe-NV
Copy link
Copy Markdown
Contributor Author

nhe-NV commented Jul 23, 2024

The new PFCWD logic requires queue not empty. sonic-net/sonic-swss#3036 How is that condition satisfied?

Hi @bingwang-ms the logic is implemented in #12733.
If run the test case with the PR #12733 only, and not include this PR, you will have the "queue not empty" requirement met, but due to the pfc frame is sent by the CPU, the traffic is not stable, and espicially for test_pfcwd_timer_accuracy, the test could pass with very low possiblity.

@bingwang-ms
Copy link
Copy Markdown
Collaborator

The new PFCWD logic requires queue not empty. sonic-net/sonic-swss#3036 How is that condition satisfied?

Hi @bingwang-ms the logic is implemented in #12733. If run the test case with the PR #12733 only, and not include this PR, you will have the "queue not empty" requirement met, but due to the pfc frame is sent by the CPU, the traffic is not stable, and espicially for test_pfcwd_timer_accuracy, the test could pass with very low possiblity.

I see. Thanks for the clarification. Then it makes sense to me. I missed #12733 as there is no cherry-pick request. I just added the cherry-pick label for it.

@bingwang-ms
Copy link
Copy Markdown
Collaborator

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

@bingwang-ms
Copy link
Copy Markdown
Collaborator

Related PR for Non-Onyx leaf-fanout #13768

@nhe-NV
Copy link
Copy Markdown
Contributor Author

nhe-NV commented Jul 23, 2024

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

I see you found the answer for nononxy one: "Related PR for Non-Onyx leaf-fanout #13768"

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?
I see you have found the answer that "Related PR for Non-Onyx leaf-fanout #13768"

What do you mean this" I didn't see code for building and importing the new container. Is that done manually?"
In the Code, we do have the code to build the container.

@nhe-NV
Copy link
Copy Markdown
Contributor Author

nhe-NV commented Jul 23, 2024

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

I see you found the answer for nononxy one: "Related PR for Non-Onyx leaf-fanout #13768"

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?
I see you have found the answer that "Related PR for Non-Onyx leaf-fanout #13768"

What do you mean this" I didn't see code for building and importing the new container. Is that done manually?" In the Code, we do have the code to build the container.

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

I see you found the answer for nononxy one: "Related PR for Non-Onyx leaf-fanout #13768"

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?
I see you have found the answer that "Related PR for Non-Onyx leaf-fanout #13768"

What do you mean this" I didn't see code for building and importing the new container. Is that done manually?" In the Code, we do have the code to build the container. The container has the same name as prev one. so nonthing changed when using it.

@bingwang-ms
Copy link
Copy Markdown
Collaborator

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

I see you found the answer for nononxy one: "Related PR for Non-Onyx leaf-fanout #13768"

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?
I see you have found the answer that "Related PR for Non-Onyx leaf-fanout #13768"

What do you mean this" I didn't see code for building and importing the new container. Is that done manually?" In the Code, we do have the code to build the container.

I see. So the code for building the docker is not changed right? Can you please point me the code for building the docker?

@nhe-NV
Copy link
Copy Markdown
Contributor Author

nhe-NV commented Jul 24, 2024

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

I see you found the answer for nononxy one: "Related PR for Non-Onyx leaf-fanout #13768"

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?
I see you have found the answer that "Related PR for Non-Onyx leaf-fanout #13768"

What do you mean this" I didn't see code for building and importing the new container. Is that done manually?" In the Code, we do have the code to build the container.

I see. So the code for building the docker is not changed right? Can you please point me the code for building the docker?

Hi @bingwang-ms In this PR, you can find ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/Makefile, this is a new file for build the new docker. but the docker name that build is same as prev

@bingwang-ms
Copy link
Copy Markdown
Collaborator

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

I see you found the answer for nononxy one: "Related PR for Non-Onyx leaf-fanout #13768"

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?
I see you have found the answer that "Related PR for Non-Onyx leaf-fanout #13768"

What do you mean this" I didn't see code for building and importing the new container. Is that done manually?" In the Code, we do have the code to build the container.

I see. So the code for building the docker is not changed right? Can you please point me the code for building the docker?

Hi @bingwang-ms In this PR, you can find ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/Makefile, this is a new file for build the new docker. but the docker name that build is same as prev

Yes, I saw the makefile. But can you point me the code for building the new docker image? I didn't see that code in this PR.

@nhe-NV
Copy link
Copy Markdown
Contributor Author

nhe-NV commented Jul 24, 2024

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?

I see you found the answer for nononxy one: "Related PR for Non-Onyx leaf-fanout #13768"

@nhe-NV Does this change require the leaf-fanout switch running Onyx? If that's the case, the PR may not help us because we are not using Onyx as leaf-fanout OS. BTW, I didn't see code for building and importing the new container. Is that done manually?

Can you please check this question as well?
I see you have found the answer that "Related PR for Non-Onyx leaf-fanout #13768"

What do you mean this" I didn't see code for building and importing the new container. Is that done manually?" In the Code, we do have the code to build the container.

I see. So the code for building the docker is not changed right? Can you please point me the code for building the docker?

Hi @bingwang-ms In this PR, you can find ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/Makefile, this is a new file for build the new docker. but the docker name that build is same as prev

Yes, I saw the makefile. But can you point me the code for building the new docker image? I didn't see that code in this PR.

Is this that you are looking for?
image

@bingwang-ms
Copy link
Copy Markdown
Collaborator

Hi @nhe-NV , I understand we have Makefile and Dockerfile. I'm looking for the code to call the make command or the docker build command as I didn't see that code in this change. Can you point it to me?

@nhe-NV
Copy link
Copy Markdown
Contributor Author

nhe-NV commented Jul 29, 2024

Hi @nhe-NV , I understand we have Makefile and Dockerfile. I'm looking for the code to call the make command or the docker build command as I didn't see that code in this change. Can you point it to me?

Hi @bingwang-ms , same as prev pfc_wd docker, it dose not have the build command in the script, user will build their own docker and save it locally on demand. In the relevent test case, it will use it directly

@bingwang-ms
Copy link
Copy Markdown
Collaborator

I see. Thanks for the clarification. Please address the conflict.

@nhe-NV nhe-NV force-pushed the sneding_pfc_with_asic branch from 30c303c to 5debb87 Compare July 30, 2024 00:14
@mssonicbld
Copy link
Copy Markdown
Collaborator

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:119:18: F405 'new_uint32_t_p' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:120:5: F405 'uint32_t_p_assign' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:121:28: F405 'new_sx_port_attributes_t_arr' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:122:10: F405 'sx_api_port_device_get' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:123:14: F405 'SX_STATUS_SUCCESS' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:126:16: F405 'uint32_t_p_value' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:129:28: F405 'new_sx_port_attributes_t_arr' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:130:10: F405 'sx_api_port_device_get' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:131:15: F405 'SX_STATUS_SUCCESS' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:141:27: F405 'sx_port_attributes_t_arr_getitem' may be undefined, or defined from star imports: python_sdk_api.sx_api
ansible/roles/test/files/mlnx/docker-tests-pfcgen-asic/pfc_gen.py:156:27: F405 'sx_port_attributes_t_arr_getitem' may be undefined, or defined from star imports: python_sdk_api.sx_api
...
[truncated extra lines, please run pre-commit locally to view full check results]

To run the pre-commit checks locally, you can follow below steps:

  1. Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run
    the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt
    docker container.
  2. Ensure that the pre-commit package is installed:
sudo pip install pre-commit
  1. Go to repository root folder
  2. Install the pre-commit hooks:
pre-commit install
  1. Use pre-commit to check staged file:
pre-commit
  1. Alternatively, you can check committed files using:
pre-commit run --from-ref <commit_id> --to-ref <commit_id>

@nhe-NV nhe-NV force-pushed the sneding_pfc_with_asic branch 3 times, most recently from f4fa608 to 05bb1e8 Compare July 30, 2024 00:24
Currently, the PFC storm is sent using packet socket on the host CPU, which causes some drawbacks

- There is strict timing requirement, which can not be guaranteed
- The more ports under PFC storm, the higher probability that the timing requirement is not satisfied

We will leverage the fanout's ASIC to send PFC storm

- Use SDK API to enable lossless traffic, set xoff to 0, and setup mappings for the PGs and the corresponding IEEE priorities

Limitation

Still keep sending pfc frame with cpu in  the storm docker, but rename it to pfc_gen_cpu.py in the docker.

If use the ASIC to send the pfc frame, ASIC could not send specified count of the pfc frame. need to still use the CPU to send the pfc frame.such as the test_pfc_counter test

Signed-off-by: Stephen Sun <[email protected]>
@nhe-NV nhe-NV force-pushed the sneding_pfc_with_asic branch from 05bb1e8 to 5547d5d Compare July 30, 2024 00:27
@bingwang-ms bingwang-ms merged commit 17a20ab into sonic-net:master Jul 30, 2024
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Jul 30, 2024
Currently, the PFC storm is sent using packet socket on the host CPU, which causes some drawbacks

- There is strict timing requirement, which can not be guaranteed
- The more ports under PFC storm, the higher probability that the timing requirement is not satisfied

We will leverage the fanout's ASIC to send PFC storm

- Use SDK API to enable lossless traffic, set xoff to 0, and setup mappings for the PGs and the corresponding IEEE priorities

Limitation

Still keep sending pfc frame with cpu in  the storm docker, but rename it to pfc_gen_cpu.py in the docker.

If use the ASIC to send the pfc frame, ASIC could not send specified count of the pfc frame. need to still use the CPU to send the pfc frame.such as the test_pfc_counter test

Signed-off-by: Stephen Sun <[email protected]>
Co-authored-by: Stephen Sun <[email protected]>
@mssonicbld
Copy link
Copy Markdown
Collaborator

Cherry-pick PR to 202405: #13885

mssonicbld pushed a commit that referenced this pull request Jul 30, 2024
Currently, the PFC storm is sent using packet socket on the host CPU, which causes some drawbacks

- There is strict timing requirement, which can not be guaranteed
- The more ports under PFC storm, the higher probability that the timing requirement is not satisfied

We will leverage the fanout's ASIC to send PFC storm

- Use SDK API to enable lossless traffic, set xoff to 0, and setup mappings for the PGs and the corresponding IEEE priorities

Limitation

Still keep sending pfc frame with cpu in  the storm docker, but rename it to pfc_gen_cpu.py in the docker.

If use the ASIC to send the pfc frame, ASIC could not send specified count of the pfc frame. need to still use the CPU to send the pfc frame.such as the test_pfc_counter test

Signed-off-by: Stephen Sun <[email protected]>
Co-authored-by: Stephen Sun <[email protected]>
arista-hpandya pushed a commit to arista-hpandya/sonic-mgmt that referenced this pull request Oct 2, 2024
Currently, the PFC storm is sent using packet socket on the host CPU, which causes some drawbacks

- There is strict timing requirement, which can not be guaranteed
- The more ports under PFC storm, the higher probability that the timing requirement is not satisfied

We will leverage the fanout's ASIC to send PFC storm

- Use SDK API to enable lossless traffic, set xoff to 0, and setup mappings for the PGs and the corresponding IEEE priorities

Limitation

Still keep sending pfc frame with cpu in  the storm docker, but rename it to pfc_gen_cpu.py in the docker.

If use the ASIC to send the pfc frame, ASIC could not send specified count of the pfc frame. need to still use the CPU to send the pfc frame.such as the test_pfc_counter test

Signed-off-by: Stephen Sun <[email protected]>
Co-authored-by: Stephen Sun <[email protected]>
vikshaw-Nokia pushed a commit to vikshaw-Nokia/sonic-mgmt that referenced this pull request Oct 23, 2024
Currently, the PFC storm is sent using packet socket on the host CPU, which causes some drawbacks

- There is strict timing requirement, which can not be guaranteed
- The more ports under PFC storm, the higher probability that the timing requirement is not satisfied

We will leverage the fanout's ASIC to send PFC storm

- Use SDK API to enable lossless traffic, set xoff to 0, and setup mappings for the PGs and the corresponding IEEE priorities

Limitation

Still keep sending pfc frame with cpu in  the storm docker, but rename it to pfc_gen_cpu.py in the docker.

If use the ASIC to send the pfc frame, ASIC could not send specified count of the pfc frame. need to still use the CPU to send the pfc frame.such as the test_pfc_counter test

Signed-off-by: Stephen Sun <[email protected]>
Co-authored-by: Stephen Sun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants