Extend LACP time multiplier for advanced-reboot tests with cEOS peers#17964
StormLiangMS merged 4 commits into sonic-net:master
Thanks @justin-wong-ce. I picked this diff into our internal branch and will try it over the next few days. Meanwhile, could you please share your test results and confirm whether the case passes 100% of the time with this fix? Thanks a lot.
I have tested with I will be testing the sad cases from |
lipxu
left a comment
I picked the diff and ran it on our internal pipeline for the case platform_tests/test_advanced_reboot.py. One run passed, but the other failed (the second retry appears to have timed out; I will try again today):
https://elastictest.org/scheduler/testplan/67fdfe9e40a6f1f300f540f0
https://elastictest.org/scheduler/testplan/67fdfe7b1a569391805d5be0
This change also passes all variants of |
Thanks for checking. |
Thanks @justin-wong-ce, I sent the logs to you. Please help check them. Thank you very much.
lipxu
left a comment
@saiarcot895 Could you please review this PR? Thanks a lot.
Did the second try pass? I will also be running the tests again on an Arista-7060CX-32S-D48C8.
The tests pass on my end: Let me know if you would like me to send you the logs |
Thanks, @justin-wong-ce. It's OK with me; let's wait for @saiarcot895's review. Thanks a lot.
…sonic-net#17964)
What is the motivation for this PR?
Transitioning from using vSonic peers to cEOS peers for some tests. Requested by Microsoft.
How did you do it?
Adding CLI commands for each cEOS peer at the start and end of the PTF test advanced-reboot.py.
How did you verify/test it?
Tested with test_upgrade_path.py::test_upgrade_path on an Arista-7050CX3-32S-C32.
|
Cherry-pick PR to 202411: #18113 |
|
@justin-wong-ce, I think this change will extend the LACP time multiplier for all advanced-reboot (or upgrade path) cases. That will not fail the test, but it is excessive: we only need to extend the LACP retry count for the platforms where we do NOT trust the boot-up process to be fast enough. Additionally, the test should verify the behavior of the TOR devices in production. So if a platform (e.g., 7260) upgrades without the retry count extended, it should be tested against that criterion during the qualification stage.
Hmm the |
|
@vaibhavhd seems like there is no easy way to determine if the test is doing I think we have to introduce a new test param so the PTF test knows if it is doing a
|
@justin-wong-ce the issue here is that previously we used cEOS neighbors with 7260 (LACP multiplier 3) and SONiC neighbors (LACP multiplier 5) for 7060 and Mellanox 2700. This gave us the tight LACP boundaries we needed for each platform. With this change, cEOS can be used for 7060 and 2700 with the same LACP multiplier of 5 that SONiC had, but the side effect is that the timing boundary for 7260 has increased from 3 to 5, which we don't want. This is not really a decision of the type of test ( Previously, we had to make a decision to use SONiC or cEOS neighbors based on platform + OS and we did that by passing in the That way, when the test caller picks the testbed/DUT and to/from versions, they can also set the neighbor type to cEOS and set the LACP multiplier too.
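To make the "timing boundary" in the comment above concrete, here is a minimal sketch of the arithmetic. It assumes the peer expires its partner information after multiplier × periodic interval, and uses the standard LACP periodic intervals (1 s fast, 30 s slow). Which rate the testbeds in this thread use is not stated, so both are shown; the function name is illustrative only.

```python
# Standard LACP periodic transmission intervals, in seconds.
FAST_PERIODIC_S = 1
SLOW_PERIODIC_S = 30


def lacp_timeout_s(multiplier: int, periodic_interval_s: int) -> int:
    """Seconds a peer waits without LACPDUs before expiring the partner.

    Assumption: timeout = multiplier * periodic interval.
    """
    return multiplier * periodic_interval_s


if __name__ == "__main__":
    for multiplier in (3, 5):
        print(
            f"multiplier={multiplier}: "
            f"fast={lacp_timeout_s(multiplier, FAST_PERIODIC_S)}s, "
            f"slow={lacp_timeout_s(multiplier, SLOW_PERIODIC_S)}s"
        )
```

Under these assumptions, moving from 3 to 5 widens the window the DUT has to resume sending LACPDUs from 90 s to 150 s at the slow rate, which is why applying it to every platform loosens the check for platforms that were previously held to 3.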
I see, sounds good. |
|
@justin-wong-ce instead of hardcoding the 7060 and 2700 hwskus in the test scripts, can we have a CLI passed in from the top-level? (i.e. when triggering the sonic-mgmt test) |
As in a new param defining lacp multiplier passed in from Also, should |
Add an argument for the advanced_reboot PTF test to set the LACP multiplier for ceos neighbors. Requested here: sonic-net#17964 (comment)
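As a rough illustration of the follow-up described above, the sketch below reads an optional multiplier from a dictionary of test parameters and falls back to the default of 3 when the caller does not set it. The parameter name `ceos_neighbor_lacp_multiplier` and the helper are hypothetical; the thread does not show the actual argument name or how the PTF test consumes it.

```python
DEFAULT_LACP_MULTIPLIER = 3  # default value named in the PR summary

# Hypothetical parameter name, chosen for illustration only.
PARAM_NAME = "ceos_neighbor_lacp_multiplier"


def get_ceos_lacp_multiplier(test_params: dict) -> int:
    """Return the LACP multiplier requested by the test caller.

    Falls back to the default when the parameter is absent, so platforms
    that should keep the tighter boundary are unaffected.
    """
    raw = test_params.get(PARAM_NAME, DEFAULT_LACP_MULTIPLIER)
    value = int(raw)  # params may arrive as strings from the command line
    if value < 1:
        raise ValueError(f"{PARAM_NAME} must be a positive integer, got {raw!r}")
    return value
```

With this shape, the caller that picks the testbed and image versions decides the multiplier, and no hwsku list needs to be hardcoded in the test script.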
|
PR created @Ryangwaite @saiarcot895 : |
Code sync sonic-net/sonic-mgmt:202411 => 202412
```
* | c6a94a0 (pub_upstream/202411) Revert "[dualtor_io] Allow duplications for link down downstream I/O (sonic-net#17909)" (sonic-net#18192)
* | de454d5 [testARPCompleted] Cleanup ptf ip after test failure (sonic-net#18170)
* | 4a3d1d9 [dualtor] Refine `fdb_mac_learning_test.py` (sonic-net#18092)
* | 5964a78 [dualtor_io] Fix the start marker not found issue (sonic-net#18096)
* | ce40816 Extend LACP time multiplier for advanced-reboot tests with cEOS peers (sonic-net#17964)
* | 0e70ba3 adjust port selection in case testQosSaiXonHysteresis for Cisco-8101 (sonic-net#18130)
* | 8bb7203 [202411] Restore disable packet aging fixture 202411 (sonic-net#18103)
* | 8f6d1a3 Filter out Not Applicable values in command line (sonic-net#18006)
* | 9d5de5c Backport t0-118 test configs to 202411 (sonic-net#17983)
```
Code sync sonic-net/sonic-mgmt:202411 => 202503
```
* 6b59eaa (HEAD -> sync/202503, origin/sync/202503) Merge remote-tracking branch 'pub_upstream/202411' into sync/202503
|\
| * c6a94a0 (pub_upstream/202411) Revert "[dualtor_io] Allow duplications for link down downstream I/O (sonic-net#17909)" (sonic-net#18192)
| * de454d5 [testARPCompleted] Cleanup ptf ip after test failure (sonic-net#18170)
| * 4a3d1d9 [dualtor] Refine `fdb_mac_learning_test.py` (sonic-net#18092)
| * 5964a78 [dualtor_io] Fix the start marker not found issue (sonic-net#18096)
| * ce40816 Extend LACP time multiplier for advanced-reboot tests with cEOS peers (sonic-net#17964)
| * 0e70ba3 adjust port selection in case testQosSaiXonHysteresis for Cisco-8101 (sonic-net#18130)
| * 8bb7203 [202411] Restore disable packet aging fixture 202411 (sonic-net#18103)
| * 8f6d1a3 Filter out Not Applicable values in command line (sonic-net#18006)
| * 9d5de5c Backport t0-118 test configs to 202411 (sonic-net#17983)
| * e758401 mark xfail on generic hash test for isolated topo (sonic-net#18071)
| * c65ceab [202411][dualtor] Skip pfcwd warm reboot on dualtor (sonic-net#18072)
| * c509006 Improve disabling packet aging to support swap_syncd (sonic-net#17728) (sonic-net#17739)
| * 9dc2244 [202411][dualtor-aa] Fix test_arp_dualtor on active-active dualtor (sonic-net#18073)
| * cf12a33 fixed tacacs duplicate user issue (sonic-net#18068)
| * 330a893 Fix telemetry/test_events.py for dualtor (sonic-net#18025)
| * dc6fee8 Remove admin down ports in BUFFER PG check logic (sonic-net#17505)
| * 805d538 Update generic hash test to support dualtor active active topology (sonic-net#16217)
| * 7c31e46 [dualtor_io] Allow duplications for link down downstream I/O (sonic-net#17909)
| * a7f50c6 Fix vlan vs router mac issue with test_qos_dscp_mapping.py (sonic-net#17846) (sonic-net#18003)
| * 9ab1e7a Skip test_incremental_qos on Mellanox dualtor (sonic-net#17406) (sonic-net#18048)
| * f42afd0 Force eos default creds to be string (sonic-net#18026)
| * be542b0 Restore config after vxlan_crm from vxlan_ecmp. (sonic-net#17767)
| * f0718b9 [Fix for Issue sonic-net#17413] Modified the Tx Rx port id list selection for all to all scenario (sonic-net#17919)
| * 3eb4ed4 [dualtor_io] Collect syslog to debug (sonic-net#17722)
| * d5bd995 Disable PFC-WD during PCBB and some wmk test improvements (sonic-net#17889)
| * 2f512aa Update outer UDP sport range to exclude port 53 (sonic-net#17570) (sonic-net#17798)
| * 980b373 skip test_bgp_slb advanced reboot for isolated topo (sonic-net#17470)
| * 408bf9e Default the inner dscp to outer dscp map to be 1-1. (sonic-net#17860)
| * 37495a1 Add dualtor fixtures to no_traffic test. (sonic-net#17916)
| * a13b599 Only print the matched syslog in loganalzyer teardown check, no traceback info printed (sonic-net#17926)
| * 6127f29 Revert "Skip test_vnet_decap on Cisco-8000 with 202411 (sonic-net#17776)" (sonic-net#17941) (sonic-net#17942)
| * 60274db Increase timeout to 5 in verify_packet_any_port for background traffic (sonic-net#17904)
```
PR sonic-net#17964 merged, and the case passed:
https://elastictest.org/scheduler/testplan/682d4c15dbe6294207827572?leftSideViewMode=detail&testcase=platform_tests%2Ftest_advanced_reboot.py&type=console
----
#### AI description (iteration 1)
#### PR Classification
Configuration update
#### PR Summary
This pull request updates the configuration to exclude the warm-reboot test for the 7060 platform.
- Changes in `/.azure-pipelines/elastictest/arista/7060cx.t0.202411.yml`: Excluded `platform_tests/test_advanced_reboot.py` from the test scripts.

Reverts !11104
…sonic-net#17964) What is the motivation for this PR? Transitioning from using vSonic peers to cEOS peers for some tests. Requested by Microsoft. How did you do it? Adding cli commands for each cEOS peers at the start and end of the PTF test advanced-reboot.py. How did you verify/test it? Tested with test_upgrade_path.py::test_upgrade_path on a Arista-7050CX3-32S-C32. Signed-off-by: opcoder0 <[email protected]>
…sonic-net#17964) What is the motivation for this PR? Transitioning from using vSonic peers to cEOS peers for some tests. Requested by Microsoft. How did you do it? Adding cli commands for each cEOS peers at the start and end of the PTF test advanced-reboot.py. How did you verify/test it? Tested with test_upgrade_path.py::test_upgrade_path on a Arista-7050CX3-32S-C32. Signed-off-by: Aharon Malkin <[email protected]>
…sonic-net#17964) What is the motivation for this PR? Transitioning from using vSonic peers to cEOS peers for some tests. Requested by Microsoft. How did you do it? Adding cli commands for each cEOS peers at the start and end of the PTF test advanced-reboot.py. How did you verify/test it? Tested with test_upgrade_path.py::test_upgrade_path on a Arista-7050CX3-32S-C32. Signed-off-by: Guy Shemesh <[email protected]>
Description of PR
Summary:
Extend LACP timeout for cEOS peers by setting the LACP timer multiplier to `5` during the advanced-reboot PTF test, and restore the value to the default `3` after the test completes.
Fixes # (issue)
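The set-then-restore pattern in the summary can be sketched as a context manager, so the default is restored even if the reboot test raises. This is an illustration under assumptions, not the PR's implementation: the `run_cmds` callable, the peer-to-interface mapping, and the exact CLI text (including `lacp timer multiplier`) are placeholders that have not been checked against the actual change or against EOS.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterable, List

DEFAULT_MULTIPLIER = 3   # default named in the summary
EXTENDED_MULTIPLIER = 5  # value used while the test runs


def multiplier_cmds(interfaces: Iterable[str], multiplier: int) -> List[str]:
    """Build CLI lines for one peer.

    The command text is a placeholder and is not a verified EOS command.
    """
    cmds = ["configure"]
    for intf in interfaces:
        cmds += [f"interface {intf}", f"lacp timer multiplier {multiplier}"]
    cmds.append("end")
    return cmds


@contextmanager
def extended_lacp_multiplier(
    peers: Dict[str, List[str]],
    run_cmds: Callable[[str, List[str]], None],
):
    """Raise the multiplier on every peer, then always restore the default.

    peers maps a peer name to its LAG interfaces; run_cmds(peer, cmds) is a
    hypothetical helper that sends CLI lines to that peer.
    """
    try:
        for peer, intfs in peers.items():
            run_cmds(peer, multiplier_cmds(intfs, EXTENDED_MULTIPLIER))
        yield
    finally:
        # Runs on success, on test failure, and on a partial setup failure.
        for peer, intfs in peers.items():
            run_cmds(peer, multiplier_cmds(intfs, DEFAULT_MULTIPLIER))
```

Wrapping the test body in `with extended_lacp_multiplier(peers, run_cmds):` keeps the peers from being left at the extended value after a failed run.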
Type of change
Back port request
Approach
What is the motivation for this PR?
Transitioning from using vSonic peers to cEOS peers for some tests. Requested by Microsoft.
How did you do it?
Adding CLI commands for each cEOS peer at the start and end of the PTF test `advanced-reboot.py`.
How did you verify/test it?
Tested with `test_upgrade_path.py::test_upgrade_path` on an `Arista-7050CX3-32S-C32`.
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation