Skip to content

[warm-reboot test] Update warm-reboot test to support kvm image#2392

Merged
shi-su merged 23 commits intosonic-net:masterfrom
shi-su:adv-reboot-kvm
Nov 5, 2020
Merged

[warm-reboot test] Update warm-reboot test to support kvm image#2392
shi-su merged 23 commits intosonic-net:masterfrom
shi-su:adv-reboot-kvm

Conversation

@shi-su
Copy link
Contributor

@shi-su shi-su commented Oct 22, 2020

Description of PR

Summary: Update warm-reboot test to support kvm image
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

What is the motivation for this PR?

The motivation of this PR is to enable basic checks of the warm-reboot functionality using kvm images.

How did you do it?

As the data plane will go down in kvm image during warm-reboot. I changed the test logic to mainly focusing on the control plane behavior and relax data plane constraints.

A mark is added in the testcase to indicate it support kvm image and to filter out testcases not compatible for kvm test.

Example command to run the test
pytest platform_tests/test_advanced_reboot.py --testbed=vms-kvm-t0 --inventory veos_vtb --host-pattern all --user admin -vvv --show-capture stdout --testbed_file vtestbed.csv --disable_loganalyzer --skip_sanity
Test output

======================================================================= test session starts =======================================================================
platform linux2 -- Python 2.7.12, pytest-4.6.5, py-1.9.0, pluggy-0.13.1 -- /usr/bin/python
cachedir: .pytest_cache
metadata: {'Python': '2.7.12', 'Platform': 'Linux-4.15.0-39-generic-x86_64-with-Ubuntu-16.04-xenial', 'Packages': {'py': '1.9.0', 'pytest': '4.6.5', 'pluggy': '0.13.1'}, 'Plugins': {u'repeat': u'0.8.0', u'html': u'1.22.1', u'xdist': u'1.28.0', u'ansible': u'2.2.2', u'forked': u'1.3.0', u'metadata': u'1.10.0'}}
ansible: 2.8.12
rootdir: /data/sonic-mgmt/tests, inifile: pytest.ini
plugins: ansible-2.2.2, metadata-1.10.0, repeat-0.8.0, xdist-1.28.0, forked-1.3.0, html-1.22.1
collected 9 items

platform_tests/test_advanced_reboot.py::test_fast_reboot SKIPPED                                                                                            [ 11%]
platform_tests/test_advanced_reboot.py::test_warm_reboot PASSED                                                                                             [ 22%]
platform_tests/test_advanced_reboot.py::test_warm_reboot_sad SKIPPED                                                                                        [ 33%]
platform_tests/test_advanced_reboot.py::test_warm_reboot_multi_sad SKIPPED                                                                                  [ 44%]
platform_tests/test_advanced_reboot.py::test_warm_reboot_multi_sad_inboot SKIPPED                                                                           [ 55%]
platform_tests/test_advanced_reboot.py::test_warm_reboot_sad_bgp SKIPPED                                                                                    [ 66%]
platform_tests/test_advanced_reboot.py::test_warm_reboot_sad_lag_member SKIPPED                                                                             [ 77%]
platform_tests/test_advanced_reboot.py::test_warm_reboot_sad_lag SKIPPED                                                                                    [ 88%]
platform_tests/test_advanced_reboot.py::test_warm_reboot_sad_vlan_port SKIPPED                                                                              [100%]

============================================================== 1 passed, 8 skipped in 345.63 seconds ==============================================================

Ptftest summary

2020-10-22 22:57:41 : --------------------------------------------------
2020-10-22 22:57:41 : Summary:
2020-10-22 22:57:41 : --------------------------------------------------
2020-10-22 22:57:41 : Longest downtime period was 0:02:10.667073
2020-10-22 22:57:41 : Control plane downtime period was 0:01:05.868460
2020-10-22 22:57:41 : Reboot time was 0:02:13.807981
2020-10-22 22:57:41 : Expected downtime is less then 0:02:30
2020-10-22 22:57:41 : --------------------------------------------------
2020-10-22 22:57:41 : Additional info:
2020-10-22 22:57:41 : --------------------------------------------------
2020-10-22 22:57:41 : INFO:10.250.0.54:PortChannel interface state changed 1 times
2020-10-22 22:57:41 : INFO:10.250.0.51:PortChannel interface state changed 1 times
2020-10-22 22:57:41 : INFO:10.250.0.52:PortChannel interface state changed 1 times
2020-10-22 22:57:41 : INFO:10.250.0.53:PortChannel interface state changed 1 times
2020-10-22 22:57:41 : --------------------------------------------------

How did you verify/test it?

Verified by running the testcase locally on kvm

Any platform specific information?

This change is to enable extra support for kvm platform in warm-reboot test. It should not affect advanced reboot tests for other platforms

Supported testbed topology if it's a new test case?

Documentation

@lgtm-com
Copy link

lgtm-com bot commented Oct 22, 2020

This pull request introduces 1 alert when merging 82304d6a32d93578b5065474b84bbfed8272f8b3 into 1b5539d - view on LGTM.com

new alerts:

  • 1 for Testing equality to None

@lguohan
Copy link
Contributor

lguohan commented Oct 22, 2020

can you provide the command to run this test and also the test results?

@lguohan lguohan requested review from tahmed-dev and yxieca October 22, 2020 21:44
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: spaces around = do not match lines below. It is preferred to stick to module style.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly the first half of these check_params do not have the spaces while the second half of them have. I originally put it as the last one and matched the style there. For some reason, I moved it to the first one.

Anyway, I updated all of them so that the style matches. Thanks for catching this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

health checks are separate and so why checking for kvm_test here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added handle_advanced_reboot_health_check_kvm to separate tests for kvm and real switches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you have separate health check for kvm and so suggest to clean up the code a bit as there is not need to check for kvm_test inside handle_fast_reboot_health_check and handle_warm_reboot_health_check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned up as suggested.

@lguohan
Copy link
Contributor

lguohan commented Oct 23, 2020

@shi-su, can you add your test in tests/kvmtest.sh, it will be automatically tested in the testbed now as part of the pr check.

tests/kvmtest.sh Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the right position? should we add to as the first entry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved up the advanced reboot tests. Since pc/test_po_cleanup.py has a config reload the test, I moved it to the last one to execute.
Also, the run_tests.sh script had a logic to sort the tests according to the alphabetical order. This would mess up the order we want it to run tests. I think there is limited benefit to run tests in alphabetical order. Therefore, I updated run_tests.sh so that the tests are run in the order of command line input

@lguohan
Copy link
Contributor

lguohan commented Oct 24, 2020

can you check the test failure?

@shi-su
Copy link
Contributor Author

shi-su commented Oct 24, 2020

can you check the test failure?

Yes, working on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this reboot limit? why do we have different reboot limit for kvm v.s. physical devices?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reboot limit is basically the limit for data plane down time. Since kvm is expected to experience a longer data plane down time, I set different default values for kvm and physical devices.

@shi-su shi-su marked this pull request as draft October 26, 2020 17:45
@shi-su shi-su marked this pull request as ready for review October 27, 2020 06:45
@lguohan
Copy link
Contributor

lguohan commented Oct 27, 2020

what is the reason all sad path tests are skipped?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change is good. can you pull this into a new pr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to a new PR #2419

@shi-su shi-su requested a review from qiluo-msft October 27, 2020 23:35
Copy link
Contributor

@qiluo-msft qiluo-msft Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duthost.facts['asic_type'] == 'vs': [](start = 11, length = 35)

Better to check x86_64-kvm_x86_64-r0 in platform #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking x86_64-kvm_x86_64-r0 in platform instead.

Copy link
Contributor

@qiluo-msft qiluo-msft Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kvmCompatible [](start = 56, length = 13)

Why we need this? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this and checking pytest mark instead.

Copy link
Contributor

@vaibhavhd vaibhavhd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments.

Copy link
Contributor

@qiluo-msft qiluo-msft Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle_advanced_reboot_health_check_kvm [](start = 8, length = 39)

The function body is similar to handle_fast_reboot_health_check. Reuse code? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic in kvm test is different from handle_fast_reboot_health_check. For example, fast reboot test expects the data plane to stay up for a while after the control plane goes down, this is not expected in kvm. Reusing the code requires a lot of checks for kvm in the function, which is a kind of distraction. Also, separating it into another function would make kvm tests and real switch tests independence. It would be beneficial if we want to add kvm or physical device specific checks in the future, since we only need to change the corresponding functions.

Copy link
Contributor

@qiluo-msft qiluo-msft Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

device_type [](start = 13, length = 11)

filter by platfrom kvm? #WontFix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The device_type marker is currently used in other tests to mark kvm-compatible tests [ref: #1860]. I wanted to use the same marker to make it consistent with other test cases.

Copy link
Contributor

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As comments

@shi-su shi-su requested a review from qiluo-msft October 31, 2020 19:51
@shi-su
Copy link
Contributor Author

shi-su commented Nov 4, 2020

retest vsimage please

1 similar comment
@yxieca
Copy link
Collaborator

yxieca commented Nov 5, 2020

retest vsimage please

@shi-su shi-su merged commit 3a4f547 into sonic-net:master Nov 5, 2020
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
Update sonic-swss submodule pointer to include the following:
* 8eea92e [202205][counters] Revert PR sonic-net#2432 for the buffer queue/pg counters improvement ([sonic-net#2462](sonic-net/sonic-swss#2462))
* 5d8636a [202205] Enhance orchagent and buffer manager in error handling (sonic-net#2414) ([sonic-net#2449](sonic-net/sonic-swss#2449))
* aa22237 [Everflow/ERSPAN] Set correct destination port and mac address when the nexthop is updated for ERSPAN mirror destination (sonic-net#2392) ([sonic-net#2455](sonic-net/sonic-swss#2455))
* 04ce7be check state_db for po before sending ARP/ND pkts (sonic-net#2444) ([sonic-net#2450](sonic-net/sonic-swss#2450))
* f0138a2 [portmgr] Fixed the orchagent crash due to late arrival of notif (sonic-net#2431) ([sonic-net#2451](sonic-net/sonic-swss#2451))
* 7cfde48 Change the log messages in addKernelNeigh/Route from ERROR to INFO ([sonic-net#2437](sonic-net/sonic-swss#2437))
* 2c5116e [202205][counters] Improve performance by polling only configured ports buffer queue/pg counters ([sonic-net#2432](sonic-net/sonic-swss#2432))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants