Skip to content

Add module platform_tests/test_kdump.py into PR test#12732

Merged
wangxin merged 8 commits intosonic-net:masterfrom
yutongzhang-microsoft:yutongzhang/add_test_kdump
May 13, 2024
Merged

Add module platform_tests/test_kdump.py into PR test#12732
wangxin merged 8 commits intosonic-net:masterfrom
yutongzhang-microsoft:yutongzhang/add_test_kdump

Conversation

@yutongzhang-microsoft
Copy link
Copy Markdown
Contributor

@yutongzhang-microsoft yutongzhang-microsoft commented May 6, 2024

Description of PR

To adapt to kvm testbed, there are two isssues in previous pdu_controller init.py script:

  • conn_graph_facts is None on kvm testbed, and it will generate KeyError when trying to get the value using method conn_graph_facts["xxx"].
  • inv_mgr has no function called get_host_list

So in this PR, I fix these issues

  • Use method get to get the value in dict conn_graph_facts to avoid KeyError and set it {} if the key not exists in the dict.
  • The code in branch if hostname not in device_pdu_links or hostname not in device_pdu_info is unnecessary here. Because, we use conn_graph_facts to get pdu links and pdu info first. if the hostname not in device_pdu_links or hostname not in device_pdu_info here means the host doesn't not exist in the csv file, so we don't have to get info from inventory. So remove the code in this branch in this PR.

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 201911
  • 202012
  • 202205
  • 202305
  • 202311

Approach

What is the motivation for this PR?

To adapt to kvm testbed, there are two isssues in previous pdu_controller init.py script:

  • conn_graph_facts is None on kvm testbed, and it will generate KeyError when trying to get the value using method conn_graph_facts["xxx"].
  • inv_mgr has no function called get_host_list

So in this PR, I fix these issues

  • Use method get to get the value in dict conn_graph_facts to avoid KeyError and set it {} if the key not exists in the dict.
  • The code in branch if hostname not in device_pdu_links or hostname not in device_pdu_info is unnecessary here. Because, we use conn_graph_facts to get pdu links and pdu info first. if the hostname not in device_pdu_links or hostname not in device_pdu_info here means the host doesn't not exist in the csv file, so we don't have to get info from inventory. So remove the code in this branch in this PR.

How did you do it?

  • Use method get to get the value in dict conn_graph_facts to avoid KeyError and set it {} if the key not exists in the dict.
  • Remove the code in branch if hostname not in device_pdu_links or hostname not in device_pdu_info in this PR.

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@yutongzhang-microsoft yutongzhang-microsoft changed the title [test] Add module platform_tests/test_kdump.py into PR test Add module platform_tests/test_kdump.py into PR test May 7, 2024
if ANSIBLE_DIR not in sys.path:
sys.path.append(ANSIBLE_DIR)

from devutil.inv_helpers import HostManager # noqa E402
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little bit concerned with adding this new dependency. Purpose of HostManager is to find some hosts from inventory, right?

Is there other way to do this without this new dependency? Can we take advantage of the InventoryManager in duthosts.host.options["inventory_manager"]?

Or enhance the design of conn_graph_facts for VS setup a little bit?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in branch if hostname not in device_pdu_links or hostname not in device_pdu_info: is unnecessary here. Because, we use conn_graph_facts to get pdu links and pdu info first. if the hostname not in device_pdu_links or hostname not in device_pdu_info here means the host doesn't not exist in the csv file, so we don't have to get info from inventory. Am I right?

And @Xichen96, do you agree with me?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we try to get pdu info form inventory, unfortunately, for most DUTs, we will get None because there is no key pdu_host under most of the hosts. And although we can get the pdu hosts list of a DUT from inventory, we can not get the hwsku and os of pdu host from inventory. Inventory doesn't supply such information. So I think the code in this if branch is redundant and remove them.

device_pdu_links = conn_graph_facts.get('device_pdu_links', {})
device_pdu_info = conn_graph_facts.get('device_pdu_info', {})

pdu_links = device_pdu_links.get(hostname, {})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if hostnames are not there for device_pdu_links and device_pdu_info, why are we changing it them to empty?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just saw your previous comments.
could you add a comment here to explain why you change to return empty hostnames if they are not found in csv files.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change aims to adapt to the kvm testbed. For kvm, conn_graph_facts is None, so we use function get to avoid KeyError and give it default value {} if the key doesn't exist.

I think here we don't return empty hostnames, the pdu info is empty here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, could you add this to code as comment as well? It'll help a lot

unfortunately, for most DUTs, we will get None because there is no key pdu_host under most of the hosts. And although we can get the pdu hosts list of a DUT from inventory, we can not get the hwsku and os of pdu host from inventory.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added.

@wangxin wangxin merged commit 5cbb6f7 into sonic-net:master May 13, 2024
@yutongzhang-microsoft yutongzhang-microsoft deleted the yutongzhang/add_test_kdump branch May 13, 2024 01:23
mrkcmo pushed a commit to Azarack/sonic-mgmt that referenced this pull request Jul 17, 2024
What is the motivation for this PR?
To adapt to kvm testbed, there are two isssues in previous pdu_controller init.py script:
* conn_graph_facts is None on kvm testbed, and it will generate KeyError when trying to get the value using method conn_graph_facts["xxx"].
* inv_mgr has no function called get_host_list

So in this PR, I fix these issues
* Use method get to get the value in dict conn_graph_facts to avoid KeyError and set it {} if the key not exists in the dict.
* The code in branch if hostname not in device_pdu_links or hostname not in device_pdu_info is unnecessary here. Because, we use conn_graph_facts to get pdu links and pdu info first. if the hostname not in device_pdu_links or hostname not in device_pdu_info here means the host doesn't not exist in the csv file, so we don't have to get info from inventory. So remove the code in this branch in this PR.

How did you do it?
* Use method get to get the value in dict conn_graph_facts to avoid KeyError and set it {} if the key not exists in the dict.
* Remove the code in branch if hostname not in device_pdu_links or hostname not in device_pdu_info in this PR.
lizhijianrd pushed a commit to lizhijianrd/sonic-mgmt that referenced this pull request Aug 7, 2024
What is the motivation for this PR?
To adapt to kvm testbed, there are two isssues in previous pdu_controller init.py script:
* conn_graph_facts is None on kvm testbed, and it will generate KeyError when trying to get the value using method conn_graph_facts["xxx"].
* inv_mgr has no function called get_host_list

So in this PR, I fix these issues
* Use method get to get the value in dict conn_graph_facts to avoid KeyError and set it {} if the key not exists in the dict.
* The code in branch if hostname not in device_pdu_links or hostname not in device_pdu_info is unnecessary here. Because, we use conn_graph_facts to get pdu links and pdu info first. if the hostname not in device_pdu_links or hostname not in device_pdu_info here means the host doesn't not exist in the csv file, so we don't have to get info from inventory. So remove the code in this branch in this PR.

How did you do it?
* Use method get to get the value in dict conn_graph_facts to avoid KeyError and set it {} if the key not exists in the dict.
* Remove the code in branch if hostname not in device_pdu_links or hostname not in device_pdu_info in this PR.
StormLiangMS pushed a commit that referenced this pull request Aug 8, 2024
…boot doesn't work (#14011)

* Add module `platform_tests/test_kdump.py` into PR test (#12732)

What is the motivation for this PR?
To adapt to kvm testbed, there are two isssues in previous pdu_controller init.py script:
* conn_graph_facts is None on kvm testbed, and it will generate KeyError when trying to get the value using method conn_graph_facts["xxx"].
* inv_mgr has no function called get_host_list

So in this PR, I fix these issues
* Use method get to get the value in dict conn_graph_facts to avoid KeyError and set it {} if the key not exists in the dict.
* The code in branch if hostname not in device_pdu_links or hostname not in device_pdu_info is unnecessary here. Because, we use conn_graph_facts to get pdu links and pdu info first. if the hostname not in device_pdu_links or hostname not in device_pdu_info here means the host doesn't not exist in the csv file, so we don't have to get info from inventory. So remove the code in this branch in this PR.

How did you do it?
* Use method get to get the value in dict conn_graph_facts to avoid KeyError and set it {} if the key not exists in the dict.
* Remove the code in branch if hostname not in device_pdu_links or hostname not in device_pdu_info in this PR.

* [test_ro_disk] Recover DUT to RW state by power-cycle when reboot doesn't work (#13974)

What is the motivation for this PR?
On some platforms, DUT cannot be recovered from RO-disk state by reboot. (e.g., On Nokia-7215, we saw the reboot is blocked by systemd-journald.service) To avoid DUT stuck at RO disk state, this PR introduce power-cycle as the final approach to recover DUT.

How did you do it?
If reboot failed to recover DUT from RO disk state, try power-cycle to recover the DUT.

How did you verify/test it?
Verified on Nokia-7215 M0 testbed. Get test passed with below logs:

tacacs/test_ro_disk.py::test_ro_disk[dut-7215-4]
-------------------------------------------------------------------------------- live log call --------------------------------------------------------------------------------
10:02:17 test_ro_disk.do_reboot                   L0089 ERROR  | DUT did not go down, exception: run module command failed, Ansible Results =>
{"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:0/3
10:04:02 test_ro_disk.do_reboot                   L0089 ERROR  | DUT did not go down, exception: run module command failed, Ansible Results =>
{"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:1/3
10:05:24 test_ro_disk.do_reboot                   L0089 ERROR  | DUT did not go down, exception: run module command failed, Ansible Results =>
{"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:2/3
10:05:44 test_ro_disk.do_reboot                   L0095 ERROR  | Failed to reboot DUT after 3 retries
10:05:44 test_ro_disk.test_ro_disk                L0262 WARNING| Failed to reboot dut-7215-4, try PDU reboot to restore disk RW state
PASSED

---------

Co-authored-by: Yutong Zhang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants