cisco-8000: set 512bit time to a longer duration for cisco-8000 pfcwd tests using pfc_gen.py #16159

Merged
yejianquan merged 6 commits into sonic-net:master from rraghav-cisco:pfc-set-timer-pfcwd
Jan 13, 2025

Conversation

@rraghav-cisco
Contributor

Description of PR

Summary:
Fixes flakiness of pfc_gen in the pfcwd scripts for cisco-8000. A new debug CLI script forces the DUT to wait longer when PFC frames from the fanout are missed because of the pfc_gen script. Even if pfc_gen or the fanout misses a couple of PFC frames to the DUT, the DUT still does not send out data packets.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405
  • 202411

Approach

What is the motivation for this PR?

pfc-gen is flaky, particularly on 400G links.

How did you do it?

We added a new dshell-based script that forces the DUT to wait before transmitting data when PFC pause frames are missed.

How did you verify/test it?

Ran on our DUTs with 100G and 400G links.

Any platform specific information?

The fix is specific to cisco-8000.
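
A note on why 400G may be more sensitive, inferred from the standard PFC definition rather than stated in this PR: a pause quantum is 512 bit times, so the same quanta value expires four times sooner at 400G than at 100G, which leaves pfc_gen less slack between frames. A minimal arithmetic sketch (the function and constant names are illustrative, not part of the change):

```python
# One PFC pause quantum is 512 bit times; the pause field is 16 bits wide.
PAUSE_QUANTUM_BITS = 512
MAX_PAUSE_QUANTA = 0xFFFF


def max_pause_us(link_speed_gbps: float, quanta: int = MAX_PAUSE_QUANTA) -> float:
    """Return how long a single PFC frame can pause a queue, in microseconds."""
    bit_time_ns = 1.0 / link_speed_gbps  # at 1 Gbps one bit takes 1 ns
    return quanta * PAUSE_QUANTUM_BITS * bit_time_ns / 1000.0


for speed in (100, 400):
    print(f"{speed}G: {max_pause_us(speed):.1f} us")
# 100G: 335.5 us
# 400G: 83.9 us
```

With the default quantum, a generator has to refresh the pause roughly every 84 us at 400G, so a few late frames are enough for the DUT to resume transmission. Lengthening the time that one quantum represents, as the title describes, widens that window.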

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@sdszhang
Contributor

@abdosi can you help to take a look ?


@pytest.fixture(scope="function", autouse=False)
def set_pfc_time_cisco_8000(

pls limit this change to Cisco T2 chassis only for now.

port)


def set_pfc_timer_cisco_8000(duthost, asic_id, script, port):

pls limit this change to Cisco T2 chassis only for now.
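
The restriction requested above could be expressed as a small predicate that the fixture checks before touching the timer. This is only a sketch: the helper name, and the assumption that the ASIC type is reported as "cisco-8000" and the topology type as "t2", are illustrative and not taken from the PR diff.

```python
def should_set_pfc_timer(asic_type: str, topo_type: str) -> bool:
    """Return True only for Cisco 8000 DUTs in a T2 (chassis) topology.

    Both string values are assumptions made for this sketch.
    """
    return asic_type == "cisco-8000" and topo_type == "t2"


# Hypothetical inputs: a Cisco 8000 T2 chassis, then a non-chassis topology.
print(should_set_pfc_timer("cisco-8000", "t2"))
print(should_set_pfc_timer("cisco-8000", "t1"))
```

A fixture could return early when the predicate is false, leaving every other platform and topology unchanged.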

@sdszhang
Contributor

sdszhang commented Jan 8, 2025

@rraghav-cisco can you upload the test result.

@yejianquan
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@yejianquan yejianquan left a comment

LGTM

@yejianquan yejianquan merged commit ab3d76b into sonic-net:master Jan 13, 2025
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Jan 13, 2025
co-authorized by: jianquanye@microsoft.com
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Jan 13, 2025
@mssonicbld
Collaborator

Cherry-pick PR to 202411: #16468

@mssonicbld
Collaborator

Cherry-pick PR to 202405: #16469

mssonicbld pushed a commit that referenced this pull request Jan 13, 2025
mssonicbld pushed a commit that referenced this pull request Jan 15, 2025
asic_arg = ""
if asic_id:
    asic_arg = f"-n asic{asic_id}"
duthost.shell(f"show platform npu script {asic_arg} -s {script_name}")

@rraghav-cisco i ran into this problem:

>           raise RunAnsibleModuleFail("run module {} failed".format(self.module_name), res)
E           tests.common.errors.RunAnsibleModuleFail: run module shell failed, Ansible Results =>
E           failed = True
E           changed = True
E           rc = 2
E           cmd = show platform npu script  -s set_pfc_time.py
E           start = 2025-01-23 04:05:01.590011
E           end = 2025-01-23 04:05:03.007243
E           delta = 0:00:01.417232
E           msg = non-zero return code
E           invocation = {'module_args': {'_raw_params': 'show platform npu script  -s set_pfc_time.py', '_uses_shell': True, 'warn': False, 'stdin_add_newline': True, 'strip_empty_ends': True, 'argv': None, 'chdir': None, 'executable': None, 'creates': None, 'removes': None, 'stdin': None}}
E           _ansible_no_log = None
E           stdout =
E           stderr =
E           Usage: show platform npu script [OPTIONS]
E           Try "show platform npu script -h" for help.
E           
E           Error: Missing option "-n".  Choose from:
E               asic0,
E               asic1,
E               asic2.

cc @sdszhang

@auspham , @sdszhang : I have raised #17858 for this issue.
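
For readers hitting the same failure: the command in the traceback carries no -n option, which is what the snippet under review produces whenever asic_id is falsy. One possible cause, an assumption not confirmed in this thread, is that asic_id is 0, which `if asic_id:` treats as false. A hedged sketch of a builder that distinguishes None from 0 (the helper name is hypothetical, and this is not necessarily what #17858 does):

```python
from typing import Optional


def build_npu_script_cmd(script_name: str, asic_id: Optional[int] = None) -> str:
    """Build the 'show platform npu script' command line.

    The -n option is added whenever an ASIC index is given, including index 0.
    """
    parts = ["show", "platform", "npu", "script"]
    if asic_id is not None:  # 0 is a valid ASIC index, so do not test truthiness
        parts += ["-n", f"asic{asic_id}"]
    parts += ["-s", script_name]
    return " ".join(parts)


print(build_npu_script_cmd("set_pfc_time.py", asic_id=0))
print(build_npu_script_cmd("set_pfc_time.py"))
```

If asic_id really is None on a multi-ASIC DUT, the caller would still have to choose an ASIC, since the CLI rejects the command without -n there.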

@rraghav-cisco rraghav-cisco changed the title from "Add pfc-timer-set to 500mS for pfcwd tests." to "cisco-8000: set 512bit time to a longer duration for cisco-8000 pfcwd tests using pfc_gen.py" on Jan 29, 2025
nnelluri-cisco pushed a commit to nnelluri-cisco/sonic-mgmt that referenced this pull request Mar 15, 2025