Add module `platform_tests/test_kdump.py` into PR test (#12732)

wangxin merged 8 commits into sonic-net:master

Conversation
```python
if ANSIBLE_DIR not in sys.path:
    sys.path.append(ANSIBLE_DIR)

from devutil.inv_helpers import HostManager  # noqa E402
```
I am a little bit concerned about adding this new dependency. The purpose of `HostManager` is to find some hosts from the inventory, right?
Is there another way to do this without this new dependency? Can we take advantage of the `InventoryManager` in `duthosts.host.options["inventory_manager"]`?
Or enhance the design of `conn_graph_facts` for the VS setup a little?
The code in the branch `if hostname not in device_pdu_links or hostname not in device_pdu_info:` is unnecessary here, because we use `conn_graph_facts` to get the pdu links and pdu info first. If the hostname is not in `device_pdu_links` or `device_pdu_info`, the host doesn't exist in the csv file, so we don't have to get the info from inventory. Am I right?
And @Xichen96, do you agree with me?
Also, if we try to get pdu info from inventory, unfortunately, for most DUTs we will get None because there is no key `pdu_host` under most of the hosts. And although we can get the pdu host list of a DUT from inventory, we cannot get the hwsku and os of the pdu host from inventory; the inventory doesn't supply such information. So I think the code in this `if` branch is redundant and I removed it.
```python
device_pdu_links = conn_graph_facts.get('device_pdu_links', {})
device_pdu_info = conn_graph_facts.get('device_pdu_info', {})

pdu_links = device_pdu_links.get(hostname, {})
```
If the hostnames are not there in `device_pdu_links` and `device_pdu_info`, why are we changing them to empty?
Just saw your previous comments.
Could you add a comment here to explain why you return empty values when the hostnames are not found in the csv files?
This change aims to adapt to the kvm testbed. For kvm, `conn_graph_facts` is None, so we use the `get` method to avoid a KeyError, falling back to the default value `{}` if the key doesn't exist.
I don't think we return empty hostnames here; it is the pdu info that is empty.
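The `dict.get` fallback described above can be sketched in isolation. This is a hypothetical minimal example, not the actual fixture code; note also that if `conn_graph_facts` is literally `None` rather than an empty dict, an extra `or {}` guard is still needed before calling `get`:

```python
# Simulate graph facts on a kvm testbed, where the pdu entries are missing.
conn_graph_facts = {}

# Direct indexing raises KeyError for a missing key ...
try:
    device_pdu_links = conn_graph_facts["device_pdu_links"]
except KeyError:
    device_pdu_links = None

# ... while dict.get returns the supplied default instead.
device_pdu_links = conn_graph_facts.get("device_pdu_links", {})
device_pdu_info = conn_graph_facts.get("device_pdu_info", {})

print(device_pdu_links, device_pdu_info)  # {} {}
```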
Thanks, could you add this to the code as a comment as well? It'll help a lot.
> unfortunately, for most DUTs, we will get None because there is no key `pdu_host` under most of the hosts. And although we can get the pdu host list of a DUT from inventory, we cannot get the hwsku and os of the pdu host from inventory.
Sure, added.
What is the motivation for this PR?
To adapt to the kvm testbed: there are two issues in the previous pdu_controller `__init__.py` script:
* `conn_graph_facts` is None on the kvm testbed, and reading values with `conn_graph_facts["xxx"]` raises a KeyError.
* `inv_mgr` has no function called `get_host_list`.
So in this PR, I fix these issues:
* Use the `get` method to read values from the `conn_graph_facts` dict, avoiding a KeyError and defaulting to `{}` if the key does not exist.
* The code in the branch `if hostname not in device_pdu_links or hostname not in device_pdu_info` is unnecessary, because we use `conn_graph_facts` to get the pdu links and pdu info first. If the hostname is not in `device_pdu_links` or `device_pdu_info`, the host doesn't exist in the csv file, so we don't have to get the info from inventory. This PR removes the code in that branch.
How did you do it?
* Use the `get` method to read values from the `conn_graph_facts` dict, avoiding a KeyError and defaulting to `{}` if the key does not exist.
* Remove the code in the branch `if hostname not in device_pdu_links or hostname not in device_pdu_info`.
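Put together, the two fixes amount to a chained-`get` lookup. The sketch below is illustrative only (the helper name `get_pdu_facts` is hypothetical, not a function from the PR), with an `or {}` guard covering the case where `conn_graph_facts` is `None` on a kvm testbed:

```python
def get_pdu_facts(conn_graph_facts, hostname):
    # Guard against conn_graph_facts being None on a kvm testbed.
    facts = conn_graph_facts or {}
    # Each get() falls back to {} instead of raising KeyError, so a host
    # missing from the csv files simply yields empty pdu facts and no
    # inventory lookup is needed.
    pdu_links = facts.get("device_pdu_links", {}).get(hostname, {})
    pdu_info = facts.get("device_pdu_info", {}).get(hostname, {})
    return pdu_links, pdu_info

print(get_pdu_facts(None, "vlab-01"))  # ({}, {})
```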
…boot doesn't work (#14011)

* Add module `platform_tests/test_kdump.py` into PR test (#12732)

* [test_ro_disk] Recover DUT to RW state by power-cycle when reboot doesn't work (#13974)

  What is the motivation for this PR? On some platforms, the DUT cannot be recovered from the RO-disk state by reboot. (e.g., on Nokia-7215 we saw the reboot blocked by systemd-journald.service.) To avoid the DUT getting stuck in the RO-disk state, this PR introduces power-cycle as the final approach to recover the DUT.

  How did you do it? If reboot fails to recover the DUT from the RO-disk state, try a power-cycle to recover the DUT.

  How did you verify/test it? Verified on a Nokia-7215 M0 testbed. Got the test passed with the logs below:

```
tacacs/test_ro_disk.py::test_ro_disk[dut-7215-4]
------------------------------ live log call ------------------------------
10:02:17 test_ro_disk.do_reboot    L0089 ERROR | DUT did not go down, exception: run module command failed, Ansible Results => {"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:0/3
10:04:02 test_ro_disk.do_reboot    L0089 ERROR | DUT did not go down, exception: run module command failed, Ansible Results => {"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:1/3
10:05:24 test_ro_disk.do_reboot    L0089 ERROR | DUT did not go down, exception: run module command failed, Ansible Results => {"failed": true, "msg": "Timeout (62s) waiting for privilege escalation prompt: "} attempt:2/3
10:05:44 test_ro_disk.do_reboot    L0095 ERROR | Failed to reboot DUT after 3 retries
10:05:44 test_ro_disk.test_ro_disk L0262 WARNING| Failed to reboot dut-7215-4, try PDU reboot to restore disk RW state
PASSED
```

Co-authored-by: Yutong Zhang <[email protected]>
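The recovery flow in the second commit (retry the reboot a few times, then fall back to a PDU power-cycle) can be sketched roughly as follows. The function and callback names here are hypothetical, not the actual test code:

```python
def recover_dut(try_reboot, power_cycle, max_attempts=3):
    """Try rebooting up to max_attempts times; fall back to a power-cycle."""
    for attempt in range(max_attempts):
        if try_reboot():
            return "recovered-by-reboot"
        print("DUT did not go down, attempt:%d/%d" % (attempt, max_attempts))
    # Reboot kept failing (e.g. blocked by systemd-journald on a RO disk),
    # so power-cycle via the PDU as the last resort.
    return "recovered-by-power-cycle" if power_cycle() else "failed"

print(recover_dut(lambda: False, lambda: True))  # recovered-by-power-cycle
```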
Description of PR
Summary:
Fixes # (issue)
Type of change
Back port request
Approach
What is the motivation for this PR?
How did you do it?
How did you verify/test it?
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation