
Porting #15718 to 202405: Add a fixture that sets the scheduler to slower speeds and reverts it #16199

Merged
yejianquan merged 5 commits into sonic-net:202405 from rraghav-cisco:15718_to_202405
Jan 13, 2025


@rraghav-cisco
Contributor

Description of PR

Summary:
Fixes the flakiness of the DWRR test case. The PR adds a new fixture that slows down the scheduler without changing the underlying algorithm, which allows the DWRR test to pass consistently.
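As a rough illustration of the set-and-revert pattern described in the summary, here is a minimal Python sketch. The `HypotheticalScheduler` class, the `slow_scheduler` helper, and the speed values are assumptions made up for illustration; they are not the fixture this PR adds or any sonic-mgmt API. The sketch uses `contextlib.contextmanager` so it runs without pytest; in a pytest suite the same generator body would typically be registered with `@pytest.fixture`, where the code after `yield` runs as teardown.

```python
import contextlib


class HypotheticalScheduler:
    """Hypothetical stand-in for a DUT scheduler setting (not a sonic-mgmt API)."""

    def __init__(self, speed):
        self.speed = speed


@contextlib.contextmanager
def slow_scheduler(scheduler, slow_speed):
    """Temporarily lower the scheduler speed, then restore the original value.

    Only the speed is changed; the scheduling algorithm itself is untouched.
    """
    original_speed = scheduler.speed  # remember the value to restore later
    scheduler.speed = slow_speed      # apply the slower setting for the test
    try:
        yield scheduler               # the test body runs here
    finally:
        scheduler.speed = original_speed  # revert even if the test body raises


sched = HypotheticalScheduler(speed=100)
with slow_scheduler(sched, slow_speed=10):
    during = sched.speed
after = sched.speed
print(during, after)  # prints "10 100"
```

The `try/finally` is the important design choice: the revert step runs even when the test body fails, so a failing run does not leave the device in the slowed-down state for later tests.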

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@yejianquan
Collaborator

yejianquan commented Dec 24, 2024

Hi @rraghav-cisco, could you please paste the test result with the code that resolved the conflicts? Thanks!

@rraghav-cisco
Contributor Author

Hi @rraghav-cisco, could you please paste the test result with the code that resolved the conflicts? Thanks!

Hi @yejianquan, I am seeing passes in Dwrr but failures in DwrrWeightChange:

___________________________________________________________________________________________ TestQosSai.testQosSaiDwrr[single_asic] ___________________________________________________________________________________________
______________________________________________________________________________________ TestQosSai.testQosSaiDwrr[single_dut_multi_asic] ______________________________________________________________________________________
_________________________________________________________________________________ TestQosSai.testQosSaiDwrr[multi_dut_longlink_to_shortlink] _________________________________________________________________________________
_________________________________________________________________________________ TestQosSai.testQosSaiDwrr[multi_dut_shortlink_to_longlink] _________________________________________________________________________________
-------------------------------------------------------- generated xml file: /run_logs/19351-dwrr-only/2024-12-20-16-14-07/2/qos/test_qos_sai_2024-12-20-17-03-53.xml --------------------------------------------------------
INFO:root:Can not get Allure report URL. Please check logs
--------------------------------------------------------------------------------------------------- live log sessionfinish ---------------------------------------------------------------------------------------------------
17:53:46 __init__.pytest_terminal_summary         L0067 INFO   | Can not get Allure report URL. Please check logs
================================================================================================== short test summary info ===================================================================================================
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_asic]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[single_dut_multi_asic]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[multi_dut_longlink_to_shortlink]
PASSED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrr[multi_dut_shortlink_to_longlink]
SKIPPED [1] qos/test_qos_sai.py:1541: Don't have 2 shortlink frontend nodes - so can't run multi_dut_shortlink_to_shortlinktests
SKIPPED [1] qos/test_qos_sai.py:2091: Don't have 2 shortlink frontend nodes - so can't run multi_dut_shortlink_to_shortlinktests
FAILED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_asic] - tests.common.errors.RunAnsibleModuleFail: run module shell failed, Ansible Results =>
FAILED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[single_dut_multi_asic] - tests.common.errors.RunAnsibleModuleFail: run module shell failed, Ansible Results =>
FAILED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[multi_dut_longlink_to_shortlink] - tests.common.errors.RunAnsibleModuleFail: run module shell failed, Ansible Results =>
FAILED qos/test_qos_sai.py::TestQosSai::testQosSaiDwrrWeightChange[multi_dut_shortlink_to_longlink] - tests.common.errors.RunAnsibleModuleFail: run module shell failed, Ansible Results =>
======================================================================= 4 failed, 4 passed, 2 skipped, 235 deselected, 1 warning in 2991.24s (0:49:51) =======================================================================
sonic@202405-qos-sonic-mgmt-prod:/data/tests$ 

@yejianquan
Collaborator

@rraghav-cisco, is the DwrrWeightChange failure caused by a testbed issue or by the script?

sdszhang self-requested a review on January 2, 2025 04:29
Contributor

sdszhang left a comment

LGTM

@sdszhang
Contributor

sdszhang commented Jan 3, 2025

Hi @rraghav-cisco, this new test failed on the T0/T1 setup (#16314). Could you please fix it ASAP?

yejianquan marked this pull request as draft on January 3, 2025 01:01
@yejianquan
Collaborator

Converted to draft because it fails on T0/T1 Cisco devices (#16314).

Contributor

sdszhang left a comment

Please fix the error on T0/T1 as well.

@sdszhang
Contributor

sdszhang commented Jan 3, 2025

@rraghav-cisco, can you include the fix from #16315 in this PR?

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@rraghav-cisco
Contributor Author

@rraghav-cisco, can you include the fix from #16315 in this PR?

@sdszhang, done.

@rraghav-cisco
Contributor Author

@rraghav-cisco, is the DwrrWeightChange failure caused by a testbed issue or by the script?

@yejianquan: The issue was in my workspace. After fixing it, all of these tests passed.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

rraghav-cisco marked this pull request as ready for review on January 9, 2025 04:23
@sdszhang
Contributor

sdszhang commented Jan 9, 2025

@kevinskwang @XuChen-MSFT for visibility. This is the manual cherry-pick PR that combines #15718 and #16315 into 202405.

@mssonicbld
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

cyw233 mentioned this pull request on Jan 10, 2025
12 tasks
yejianquan pushed a commit that referenced this pull request Jan 13, 2025
Description of PR
Optimize the qos/test_qos_sai.py test to reduce the running time.

Summary:
Fixes # (issue) Microsoft ADO 30056122

Type of change
 Bug fix
 Testbed and Framework(new/improvement)
 New Test case
 Skipped for non-supported platforms
 Add ownership here (Microsoft required only)
 Test case improvement

Approach
What is the motivation for this PR?
The running time of the qos/test_qos_sai.py test is too long (~9h) on the T2 chassis, so we wanted to reduce it. With this implementation, the running time is reduced to ~7.5h.

How did you do it?
How did you verify/test it?
I ran the updated code and can confirm it works well on the T2 chassis: Elastictest link. The 2 DWRR failures are expected and will be fixed once #16199 is merged.

I also ran a T1 regression test to confirm: Elastictest link

co-authored by: jianquanye@microsoft.com
cyw233 added a commit to cyw233/sonic-mgmt that referenced this pull request Jan 13, 2025
cyw233 added a commit to cyw233/sonic-mgmt that referenced this pull request Jan 13, 2025
@sdszhang
Contributor

Test result
T0/T1 PASSED: T0/T1 result
T2 PASSED: T2 result

yejianquan merged commit 942b849 into sonic-net:202405 on Jan 13, 2025
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Jan 13, 2025
mssonicbld pushed a commit that referenced this pull request Jan 13, 2025
yejianquan pushed a commit that referenced this pull request Jan 14, 2025
nnelluri-cisco pushed a commit to nnelluri-cisco/sonic-mgmt that referenced this pull request Mar 15, 2025

4 participants