Skip to content

[advanced-reboot] IO path verification fixes for dualtor#6968

Merged
vaibhavhd merged 5 commits intosonic-net:masterfrom
vaibhavhd:dualtor-wb-fixes
Dec 17, 2022
Merged

[advanced-reboot] IO path verification fixes for dualtor#6968
vaibhavhd merged 5 commits intosonic-net:masterfrom
vaibhavhd:dualtor-wb-fixes

Conversation

@vaibhavhd
Copy link
Copy Markdown
Contributor

@vaibhavhd vaibhavhd commented Dec 6, 2022

Description of PR

Summary: Update checks for vlan and dut MAC in the bidirectional IO.
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 201911
  • 202012
  • 202205

Approach

What is the motivation for this PR?

Three improvements:

  1. Present test_advanced_reboot crashes in examine_flow. This is due to packets are incorrectly matched in the code.
  2. Additionally, added icmp_responder in the test, so that mux state is healthy during the duration of test.
  3. Change mux config on both TORs to manual during the duration of test.

Fixes:

  1. Identifying received packet in dualtor case -- t1->server rcvd pkt will have src MAC as vlan_mac, and server->t1 rcvd pkt will have src MAC as dut_mac
  2. Identifying sent packet -- t1->server sent pkt will have dst MAC as dut_mac, and server->t1 sent pkt will have dst MAC as vlan_mac

How did you do it?

Added checks for vlan_mac
execute config mux mode manuall all during test, and revert to auto after the test.

How did you verify/test it?

Tested on physical testbed. The test does not crash anymore.
However, there are still some packet drops which points towards an image issue.

2022-12-01 23:32:22 : **************** Packet received summary: ********************
2022-12-01 23:32:22 : *********** Sent packets captured - 45000
2022-12-01 23:32:22 : *********** received packets captured - t1-to-vlan - 35919
2022-12-01 23:32:22 : *********** received packets captured - vlan-to-t1 - 8284
2022-12-01 23:32:22 : *********** Missed received packets - t1-to-vlan - 81
2022-12-01 23:32:22 : *********** Missed received packets - vlan-to-t1 - 716
2022-12-01 23:32:22 : **************************************************************
2022-12-01 23:32:22 : Disruptions happen between 0:00:08.393471 and 0:03:51.360718 after the reboot.
2022-12-01 23:32:22 : Total incoming packets captured 44203

2022-12-01 23:33:21 : The longest disruption lasted 0.045 seconds. 8 packet(s) lost.
2022-12-01 23:33:21 : Total disruptions count is 756. All disruptions lasted 4.434 seconds. Total 797 packet(s) lost

2022-12-01 23:33:37 : --------------------------------------------------
2022-12-01 23:33:37 : Fails:
2022-12-01 23:33:37 : --------------------------------------------------
2022-12-01 23:33:37 : FAILED:dut:Total downtime period must be less then 0:00:00 seconds. It was 4.43433070183

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@azure-pipelines
Copy link
Copy Markdown

The pre-commit check detected issues in the files touched by this pull request.
The detected issues may be old or new. For new issues, please try to fix them.

For old issues, it is not mandatory to fix them because they were not caused by this change. It is unfair to blame
author of this pull request. But if you can take extra effort to fix the old issues as well, that would be great!

Detailed pre-commit check results:
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook

Fixing tests/common/fixtures/advanced_reboot.py

fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing tests/platform_tests/test_advanced_reboot.py

check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Failed
- hook id: check-ast
- exit code: 1
...
[truncated extra lines, please run pre-commit locally to view full check results]

To run the pre-commit checks locally, you can follow below steps:

  1. Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run
    the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt
    docker container.
  2. Ensure that the pre-commit package is installed:
sudo pip install pre-commit
  1. Go to repository root folder
  2. Install the pre-commit hooks:
pre-commit install
  1. Use pre-commit to check staged file:
pre-commit
  1. Alternatively, you can check committed files using:
pre-commit run --from-ref <commit_id> --to-ref <commit_id>

kellyyeh
kellyyeh previously approved these changes Dec 6, 2022
@vaibhavhd
Copy link
Copy Markdown
Contributor Author

/Azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

yxieca
yxieca previously approved these changes Dec 6, 2022
@yxieca
Copy link
Copy Markdown
Collaborator

yxieca commented Dec 6, 2022

@vaibhavhd can you update the PR comments, it should be dst mac instead of src mac in your 'fixes' section.

@vaibhavhd vaibhavhd dismissed stale reviews from yxieca and kellyyeh via a838c70 December 8, 2022 00:40
yxieca
yxieca previously approved these changes Dec 8, 2022
@vaibhavhd
Copy link
Copy Markdown
Contributor Author

/Azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@StormLiangMS
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

kellyyeh
kellyyeh previously approved these changes Dec 14, 2022
@vaibhavhd vaibhavhd dismissed stale reviews from kellyyeh and yxieca via 99ac7e1 December 16, 2022 01:35
@vaibhavhd vaibhavhd requested a review from kellyyeh December 16, 2022 19:19
@vaibhavhd vaibhavhd merged commit 7bd0b8b into sonic-net:master Dec 17, 2022
@vaibhavhd vaibhavhd deleted the dualtor-wb-fixes branch December 17, 2022 04:58
@wangxin
Copy link
Copy Markdown
Collaborator

wangxin commented Dec 19, 2022

@vaibhavhd Cherry-picking this PR to 202205 branch hit similar issue.

wangxin pushed a commit that referenced this pull request Dec 19, 2022
Three improvements:

Present test_advanced_reboot crashes in examine_flow. This is due to packets are incorrectly matched in the code.
Additionally, added icmp_responder in the test, so that mux state is healthy during the duration of test.
Change mux config on both TORs to manual during the duration of test.
Fixes:

Identifying received packet in dualtor case -- t1->server rcvd pkt will have src MAC as vlan_mac, and server->t1 rcvd pkt will have src MAC as dut_mac
Identifying sent packet -- t1->server sent pkt will have dst MAC as dut_mac, and server->t1 sent pkt will have dst MAC as vlan_mac
How did you do it?
Added checks for vlan_mac
execute config mux mode manuall all during test, and revert to auto after the test.

How did you verify/test it?
Tested on physical testbed. The test does not crash anymore.
However, there are still some packet drops which points towards an image issue.
vaibhavhd added a commit that referenced this pull request Jan 6, 2023
…-responder on single TOR (#7188)

This PR fixes two recently introduced issues:

Fix advanced reboot tests fail on non-dualtor testbeds #7146 caused by [advanced-reboot] IO path verification fixes for dualtor #6968
After merge of PR Fix pre-commit check errors in platform_tests/conftest.py #7049 , all testcases using get_advanced_reboot fixture are failing due to fixture 'get_advanced_reboot' not found
How did you do it?
To fix 7146, do not run GARP service and ICMP responder for non-dualtor advanced-reboot cases. I have not added a more simple skip for non-dualtor as these services are still needed for MOCK dualtor cases which run on single TOR testbeds.

To fix 2, added a new import in the test scripts directly.
neethajohn added a commit that referenced this pull request Jan 20, 2023
Signed-off-by: Neetha John <nejo@microsoft.com>

What is the motivation for this PR?
The following commit done as part of cherry-picking to 202012 appears bad (ab3d07d)
This was trying to bring in changes from #6968 and #6994

How did you do it?
Cleaned up the inadvertent changes

How did you verify/test it?
Ran the crm testcase "test_crm_nexthop" with the changes and it passes
ppikh added a commit to ppikh/sonic-mgmt that referenced this pull request Jan 24, 2023
Fixed "TypeError: __init__() takes exactly 8 arguments (7 given)" in test platform_tests/test_cont_warm_reboot.py which happen after merge community code.
Issue introduced in PR: sonic-net#6968 - [advanced-reboot] IO path verification fixes for dualtor

Signed-off-by: Petro Pikh <petrop@nvidia.com>
liat-grozovik pushed a commit that referenced this pull request Jan 29, 2023
)

Fixed "TypeError: __init__() takes exactly 8 arguments (7 given)" in test platform_tests/test_cont_warm_reboot.py which happen after merge community code.
Issue introduced in PR: #6968 - [advanced-reboot] IO path verification fixes for dualtor

- What is the motivation for this PR?
Fix issue "TypeError: init() takes exactly 8 arguments (7 given)"

- How did you do it?
Added argument in method call

- How did you verify/test it?
Executed test which has been modified

Signed-off-by: Petro Pikh <petrop@nvidia.com>
wangxin pushed a commit that referenced this pull request Feb 15, 2023
)

Fixed "TypeError: __init__() takes exactly 8 arguments (7 given)" in test platform_tests/test_cont_warm_reboot.py which happen after merge community code.
Issue introduced in PR: #6968 - [advanced-reboot] IO path verification fixes for dualtor

- What is the motivation for this PR?
Fix issue "TypeError: init() takes exactly 8 arguments (7 given)"

- How did you do it?
Added argument in method call

- How did you verify/test it?
Executed test which has been modified

Signed-off-by: Petro Pikh <petrop@nvidia.com>
wangxin pushed a commit that referenced this pull request Feb 15, 2023
)

Fixed "TypeError: __init__() takes exactly 8 arguments (7 given)" in test platform_tests/test_cont_warm_reboot.py which happen after merge community code.
Issue introduced in PR: #6968 - [advanced-reboot] IO path verification fixes for dualtor

- What is the motivation for this PR?
Fix issue "TypeError: init() takes exactly 8 arguments (7 given)"

- How did you do it?
Added argument in method call

- How did you verify/test it?
Executed test which has been modified

Signed-off-by: Petro Pikh <petrop@nvidia.com>
kellyyeh pushed a commit to kellyyeh/sonic-mgmt that referenced this pull request Mar 31, 2023
…nic-net#7322)

Fixed "TypeError: __init__() takes exactly 8 arguments (7 given)" in test platform_tests/test_cont_warm_reboot.py which happen after merge community code.
Issue introduced in PR: sonic-net#6968 - [advanced-reboot] IO path verification fixes for dualtor

- What is the motivation for this PR?
Fix issue "TypeError: init() takes exactly 8 arguments (7 given)"

- How did you do it?
Added argument in method call

- How did you verify/test it?
Executed test which has been modified

Signed-off-by: Petro Pikh <petrop@nvidia.com>
wangxin pushed a commit that referenced this pull request Apr 20, 2023
…-responder on single TOR (#7188)

This PR fixes two recently introduced issues:

Fix advanced reboot tests fail on non-dualtor testbeds #7146 caused by [advanced-reboot] IO path verification fixes for dualtor #6968
After merge of PR Fix pre-commit check errors in platform_tests/conftest.py #7049 , all testcases using get_advanced_reboot fixture are failing due to fixture 'get_advanced_reboot' not found
How did you do it?
To fix 7146, do not run GARP service and ICMP responder for non-dualtor advanced-reboot cases. I have not added a more simple skip for non-dualtor as these services are still needed for MOCK dualtor cases which run on single TOR testbeds.

To fix 2, added a new import in the test scripts directly.
vaibhavhd added a commit that referenced this pull request Aug 23, 2023
…unt for sent and rcvd packets (#9599)

Summary: Currently dual tor warmboot fails w/ control plane not ready for warmboot. This is because the test expects 2 responses for every ping packet sent to Loopback. This expectation is not correct after PR #9089 is merged.

Update the testcase to now only expect 1 icmp response for every icmp packet sent to Loopback0.
Updated the test to remove the earlier expectation of removing the packets that were sent by icmp_responder.

This essentially reverts part of older PR: #6968

How did you verify/test it?
Tested on a physical testbed and this test issue is not seen.
The test is now able to see the responses of ping packets
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Aug 29, 2023
…unt for sent and rcvd packets (sonic-net#9599)

Summary: Currently dual tor warmboot fails w/ control plane not ready for warmboot. This is because the test expects 2 responses for every ping packet sent to Loopback. This expectation is not correct after PR sonic-net#9089 is merged.

Update the testcase to now only expect 1 icmp response for every icmp packet sent to Loopback0.
Updated the test to remove the earlier expectation of removing the packets that were sent by icmp_responder.

This essentially reverts part of older PR: sonic-net#6968

How did you verify/test it?
Tested on a physical testbed and this test issue is not seen.
The test is now able to see the responses of ping packets
mssonicbld pushed a commit that referenced this pull request Aug 29, 2023
…unt for sent and rcvd packets (#9599)

Summary: Currently dual tor warmboot fails w/ control plane not ready for warmboot. This is because the test expects 2 responses for every ping packet sent to Loopback. This expectation is not correct after PR #9089 is merged.

Update the testcase to now only expect 1 icmp response for every icmp packet sent to Loopback0.
Updated the test to remove the earlier expectation of removing the packets that were sent by icmp_responder.

This essentially reverts part of older PR: #6968

How did you verify/test it?
Tested on a physical testbed and this test issue is not seen.
The test is now able to see the responses of ping packets
zjswhhh pushed a commit to zjswhhh/sonic-mgmt that referenced this pull request Sep 20, 2023
…unt for sent and rcvd packets (sonic-net#9599)

Summary: Currently dual tor warmboot fails w/ control plane not ready for warmboot. This is because the test expects 2 responses for every ping packet sent to Loopback. This expectation is not correct after PR sonic-net#9089 is merged.

Update the testcase to now only expect 1 icmp response for every icmp packet sent to Loopback0.
Updated the test to remove the earlier expectation of removing the packets that were sent by icmp_responder.

This essentially reverts part of older PR: sonic-net#6968

How did you verify/test it?
Tested on a physical testbed and this test issue is not seen.
The test is now able to see the responses of ping packets
wangxin pushed a commit that referenced this pull request Sep 28, 2023
…unt for sent and rcvd packets (#9599) (#10068)

Raising PR for cherry-pick conflict:
225e257 [advanced-reboot][dualtor] Update Loopback0 ping test: expect same count for sent and rcvd packets (#9599)

Summary: Currently dual tor warmboot fails w/ control plane not ready for warmboot. This is because the test expects 2 responses for every ping packet sent to Loopback. This expectation is not correct after PR #9089 is merged.

Update the testcase to now only expect 1 icmp response for every icmp packet sent to Loopback0.
Updated the test to remove the earlier expectation of removing the packets that were sent by icmp_responder.

This essentially reverts part of older PR: #6968

How did you verify/test it?
Tested on a physical testbed and this test issue is not seen.
The test is now able to see the responses of ping packets

Co-authored-by: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com>
AharonMalkin pushed a commit to AharonMalkin/sonic-mgmt that referenced this pull request Jan 25, 2024
…unt for sent and rcvd packets (sonic-net#9599)

Summary: Currently dual tor warmboot fails w/ control plane not ready for warmboot. This is because the test expects 2 responses for every ping packet sent to Loopback. This expectation is not correct after PR sonic-net#9089 is merged.

Update the testcase to now only expect 1 icmp response for every icmp packet sent to Loopback0.
Updated the test to remove the earlier expectation of removing the packets that were sent by icmp_responder.

This essentially reverts part of older PR: sonic-net#6968

How did you verify/test it?
Tested on a physical testbed and this test issue is not seen.
The test is now able to see the responses of ping packets
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants