Add module `platform_tests/test_kdump.py` into PR test by yutongzhang-microsoft · Pull Request #12732 · sonic-net/sonic-mgmt

yutongzhang-microsoft · 2024-05-06T08:31:20Z

Description of PR

To adapt to the kvm testbed, this PR addresses two issues in the previous pdu_controller __init__.py script:

conn_graph_facts is None on the kvm testbed, so reading a value with conn_graph_facts["xxx"] raises a KeyError.
inv_mgr has no function called get_host_list.

This PR fixes these issues as follows:

Use the get method to read values from the conn_graph_facts dict to avoid the KeyError, defaulting to {} if the key does not exist.
Remove the code in the branch if hostname not in device_pdu_links or hostname not in device_pdu_info, which is unnecessary. We already use conn_graph_facts to get the PDU links and PDU info first, so if the hostname is not in device_pdu_links or device_pdu_info, the host does not exist in the csv file and there is no need to get its info from the inventory.
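
The lookup pattern described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name get_pdu_facts and the shape of the sample data are assumptions. One caveat is made explicit: calling .get on a literal None would itself fail, so the sketch first normalizes a missing graph to an empty dict.

```python
def get_pdu_facts(conn_graph_facts, hostname):
    """Return (pdu_links, pdu_info) for hostname, or empty dicts if unknown.

    Hypothetical helper for illustration only.
    """
    # Assumption: treat a missing graph (None) as an empty dict, because
    # None has no .get() method.
    facts = conn_graph_facts or {}
    # dict.get with a default avoids KeyError when the key is absent.
    device_pdu_links = facts.get("device_pdu_links", {})
    device_pdu_info = facts.get("device_pdu_info", {})
    return device_pdu_links.get(hostname, {}), device_pdu_info.get(hostname, {})


# Sample data with an assumed, simplified shape.
physical = {
    "device_pdu_links": {"dut-1": {"PSU1": {"peerdevice": "pdu-1"}}},
    "device_pdu_info": {"dut-1": {"pdu-1": {"HwSku": "example-sku"}}},
}

print(get_pdu_facts(physical, "dut-1"))
print(get_pdu_facts({}, "vlab-01"))    # kvm-like case: no PDU data, no KeyError
print(get_pdu_facts(None, "vlab-01"))  # also safe after normalization
```

With this pattern, a host that has no PDU data simply yields empty dicts, and the caller can decide what to do with an empty result.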

Summary:
Fixes # (issue)

Type of change

Bug fix
Testbed and Framework(new/improvement)
Test case(new/improvement)

Back port request

Approach

What is the motivation for this PR?

To adapt to the kvm testbed, this PR addresses two issues in the previous pdu_controller __init__.py script:

conn_graph_facts is None on the kvm testbed, so reading a value with conn_graph_facts["xxx"] raises a KeyError.
inv_mgr has no function called get_host_list.

This PR fixes these issues as follows:

Use the get method to read values from the conn_graph_facts dict to avoid the KeyError, defaulting to {} if the key does not exist.
Remove the code in the branch if hostname not in device_pdu_links or hostname not in device_pdu_info, which is unnecessary. We already use conn_graph_facts to get the PDU links and PDU info first, so if the hostname is not in device_pdu_links or device_pdu_info, the host does not exist in the csv file and there is no need to get its info from the inventory.

How did you do it?

Use the get method to read values from the conn_graph_facts dict to avoid the KeyError, defaulting to {} if the key does not exist.
Remove the code in the branch if hostname not in device_pdu_links or hostname not in device_pdu_info.

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

wangxin · 2024-05-08T02:55:46Z

tests/common/plugins/pdu_controller/__init__.py

+if ANSIBLE_DIR not in sys.path:
+    sys.path.append(ANSIBLE_DIR)
+
+from devutil.inv_helpers import HostManager            # noqa E402


I am a little bit concerned about adding this new dependency. The purpose of HostManager is to find some hosts from the inventory, right?

Is there another way to do this without this new dependency? Can we take advantage of the InventoryManager in duthosts.host.options["inventory_manager"]?

Or enhance the design of conn_graph_facts for VS setup a little bit?

The code in the branch if hostname not in device_pdu_links or hostname not in device_pdu_info: is unnecessary here, because we use conn_graph_facts to get the PDU links and PDU info first. If the hostname is not in device_pdu_links or device_pdu_info, the host does not exist in the csv file, so we don't have to get its info from the inventory. Am I right?

And @Xichen96, do you agree with me?

Also, if we try to get PDU info from the inventory, unfortunately, for most DUTs we will get None, because there is no pdu_host key under most of the hosts. And although we can get the list of PDU hosts of a DUT from the inventory, we cannot get the hwsku and os of a PDU host from it. The inventory doesn't supply such information. So I think the code in this if branch is redundant, and I removed it.
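
The limitation described here can be illustrated with a small sketch. It uses a plain dict as a stand-in for inventory host variables rather than the real inventory API. The helper name pdu_hosts_from_inventory, the sample hosts, and the comma-separated format of pdu_host are all assumptions made for illustration.

```python
def pdu_hosts_from_inventory(inventory_vars, hostname):
    """Return the PDU host names recorded for hostname, or [] if none.

    Hypothetical helper: inventory_vars maps host name -> host variables.
    """
    host_vars = inventory_vars.get(hostname, {})
    pdu_host = host_vars.get("pdu_host")  # None for most hosts
    if not pdu_host:
        return []
    # Assumption: multiple PDU hosts are listed as a comma-separated string.
    return [name.strip() for name in pdu_host.split(",") if name.strip()]


# Stand-in inventory data; real inventories carry many more variables.
inventory_vars = {
    "dut-with-pdu": {"ansible_host": "10.0.0.1", "pdu_host": "pdu-1, pdu-2"},
    "dut-without-pdu": {"ansible_host": "10.0.0.2"},
}

print(pdu_hosts_from_inventory(inventory_vars, "dut-with-pdu"))
print(pdu_hosts_from_inventory(inventory_vars, "dut-without-pdu"))
```

Even when the lookup succeeds, the result is only a list of names. Nothing in the stand-in host variables describes the hwsku or os of the PDU hosts, which matches the point made in the comment.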

wenyiz2021 · 2024-05-08T20:48:04Z

tests/common/plugins/pdu_controller/__init__.py

+    device_pdu_links = conn_graph_facts.get('device_pdu_links', {})
+    device_pdu_info = conn_graph_facts.get('device_pdu_info', {})
+
+    pdu_links = device_pdu_links.get(hostname, {})


If the hostnames are not present in device_pdu_links and device_pdu_info, why are we changing them to empty?

Just saw your previous comments.
Could you add a comment here to explain why you return empty hostnames if they are not found in the csv files?

This change aims to adapt to the kvm testbed. For kvm, conn_graph_facts is None, so we use the get method to avoid the KeyError and give it the default value {} if the key doesn't exist.

I think we don't return empty hostnames here; it is the PDU info that is empty.

Thanks, could you add this to the code as a comment as well? It'll help a lot.

unfortunately, for most DUTs, we will get None because there is no key pdu_host under most of the hosts. And although we can get the pdu hosts list of a DUT from inventory, we can not get the hwsku and os of pdu host from inventory.

Sure, added.

…boot doesn't work (#14011)

* Add module `platform_tests/test_kdump.py` into PR test (#12732)

What is the motivation for this PR?
To adapt to the kvm testbed, there are two issues in the previous pdu_controller __init__.py script:
* conn_graph_facts is None on the kvm testbed, so reading a value with conn_graph_facts["xxx"] raises a KeyError.
* inv_mgr has no function called get_host_list.

This PR fixes these issues as follows:
* Use the get method to read values from the conn_graph_facts dict to avoid the KeyError, defaulting to {} if the key does not exist.
* Remove the code in the branch if hostname not in device_pdu_links or hostname not in device_pdu_info, which is unnecessary. We already use conn_graph_facts to get the PDU links and PDU info first, so if the hostname is not in device_pdu_links or device_pdu_info, the host does not exist in the csv file and there is no need to get its info from the inventory.

How did you do it?
* Use the get method to read values from the conn_graph_facts dict to avoid the KeyError, defaulting to {} if the key does not exist.
* Remove the code in the branch if hostname not in device_pdu_links or hostname not in device_pdu_info.

* [test_ro_disk] Recover DUT to RW state by power-cycle when reboot doesn't work (#13974)

What is the motivation for this PR?
On some platforms, the DUT cannot be recovered from the RO-disk state by reboot (e.g., on Nokia-7215, we saw the reboot blocked by systemd-journald.service). To avoid the DUT getting stuck in the RO-disk state, this PR introduces power-cycle as the final approach to recover the DUT.

How did you do it?
If reboot fails to recover the DUT from the RO-disk state, try power-cycle to recover the DUT.

How did you verify/test it?
Verified on a Nokia-7215 M0 testbed. The test passed with the logs below:

tacacs/test_ro_disk.py::test_ro_disk[dut-7215-4]
-------------------------------------------------------------------------------- live log call --------------------------------------------------------------------------------
10:02:17 test_ro_disk.do_reboot L0089 ERROR | DUT did not go down, exception: run module command failed, Ansible Results => {"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:0/3
10:04:02 test_ro_disk.do_reboot L0089 ERROR | DUT did not go down, exception: run module command failed, Ansible Results => {"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:1/3
10:05:24 test_ro_disk.do_reboot L0089 ERROR | DUT did not go down, exception: run module command failed, Ansible Results => {"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:2/3
10:05:44 test_ro_disk.do_reboot L0095 ERROR | Failed to reboot DUT after 3 retries
10:05:44 test_ro_disk.test_ro_disk L0262 WARNING| Failed to reboot dut-7215-4, try PDU reboot to restore disk RW state
PASSED

---------

Co-authored-by: Yutong Zhang <[email protected]>
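
The recovery flow visible in these logs (retry the reboot a fixed number of times, then fall back to a PDU power-cycle) can be sketched roughly as follows. This is an illustration only: recover_dut and its reboot and power_cycle callables are hypothetical stand-ins, not the functions used in test_ro_disk.py.

```python
import logging

logger = logging.getLogger("ro_disk_recovery_sketch")


def recover_dut(reboot, power_cycle, retries=3):
    """Try reboot up to `retries` times, then fall back to power-cycle.

    `reboot` and `power_cycle` are caller-supplied callables (hypothetical);
    `reboot` is expected to raise an exception when the DUT does not go down.
    Returns the name of the method that was finally used.
    """
    for attempt in range(retries):
        try:
            reboot()
            return "reboot"
        except Exception as exc:
            logger.error("DUT did not go down, exception: %s attempt:%d/%d",
                         exc, attempt, retries)
    logger.warning("Failed to reboot after %d retries, trying PDU power-cycle",
                   retries)
    power_cycle()
    return "power-cycle"
```

Passing the two operations in as callables keeps the retry policy separate from how a reboot or a power-cycle is actually performed on a given testbed.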

yutongzhang-microsoft added 2 commits May 6, 2024 16:29

Modify pdu controller

d746e7c

fix KeyError

0dbab3b

yutongzhang-microsoft changed the title ~~[test] Add module platform_tests/test_kdump.py into PR test~~ Add module platform_tests/test_kdump.py into PR test May 7, 2024

yutongzhang-microsoft requested review from Xichen96, wangxin and wenyiz2021 May 8, 2024 02:12

wangxin reviewed May 8, 2024

yutongzhang-microsoft added 3 commits May 8, 2024 14:25

Remove unnecessary code

1e76b4c

modify

e402887

modify

887854d

wenyiz2021 reviewed May 8, 2024

yutongzhang-microsoft and others added 3 commits May 9, 2024 10:22

Add comment

79c1333

Merge branch 'master' into yutongzhang/add_test_kdump

5a5d149

Add comment

0cb8aed

wenyiz2021 approved these changes May 10, 2024

wangxin approved these changes May 13, 2024

wangxin merged commit 5cbb6f7 into sonic-net:master May 13, 2024

yutongzhang-microsoft deleted the yutongzhang/add_test_kdump branch May 13, 2024 01:23

lizhijianrd mentioned this pull request Aug 7, 2024

[202311][test_ro_disk] Recover DUT to RW state by power-cycle when reboot doesn't work #14011

Merged

8 tasks

wangxin merged 8 commits into sonic-net:master from yutongzhang-microsoft:yutongzhang/add_test_kdump
