Improve "Verify all interfaces are up" error handling #749

Closed
romankachur-mlnx wants to merge 3 commits into sonic-net:master from romankachur-mlnx:master

Conversation

@romankachur-mlnx
Contributor

  1. Modified interface.yml
  2. Added check_testbed_interfaces.yml
  3. Added testbed_vm_status.yml playbook
  4. Added show_int_portchannel_status.j2

Description of PR

Summary:

  1. Modified interface.yml to include check_testbed_interfaces.yml
    when 'Verify interfaces are up' fails.
  2. Added check_testbed_interfaces.yml to gather interfaces status on:
    • DUT
    • Fanout (for MLNX only, filtered by check_interfaces_status tag)
    • Testbed server (by calling testbed_vm_status.yml playbook)
    • relevant VMs
  3. Added testbed_vm_status.yml playbook to enter the Testbed server and gather VM statuses.
  4. Added show_int_portchannel_status.j2 to enter each relevant VM and gather Port-Channel status.
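
Taken together, the added files could be wired up roughly like this. This is an illustrative sketch, not the PR's exact file contents: the two local_action commands are quoted from this conversation, while the task names and the check_fanout/check_vms guards are assumptions.

```yaml
# check_testbed_interfaces.yml -- illustrative sketch only.
- name: Check DUT interfaces status
  shell: show interfaces status
  ignore_errors: yes

- name: Check Fanout interfaces (MLNX only, via the check_interfaces_status tag)
  local_action: shell ansible-playbook -i lab fanout.yml -l {{ fanout_switch }} --tags check_interfaces_status
  ignore_errors: yes
  when: check_fanout is defined and check_fanout

- name: Gather vm list from Testbed server
  local_action: shell ansible-playbook testbed_vm_status.yml -i veos -l "{{ testbed_facts['server'] }}"
  ignore_errors: yes
  when: check_vms is defined and check_vms
```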

Type of change

  • [ ] Bug fix
  • [x] Testbed and Framework (new/improvement)
  • [ ] Test case (new/improvement)

Approach

How did you do it?

I changed the interface.yml flow to gather more information from the Testbed (DUT / Fanout / Server / VMs)
about the statuses of interfaces and Port-Channels.
Now, when the 'Verify all interfaces are up' step fails, the playbook executes additional steps
to gather the relevant information (interface statuses) from the Testbed Server / DUT / Fanout / VMs.
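
A minimal sketch of that failure path (assuming a register/ignore_errors pattern; the check command, variable names, and task names below are assumptions, not the PR's actual interface.yml):

```yaml
# interface.yml failure path -- sketch only.
- name: Verify all interfaces are up
  shell: /usr/local/bin/intf_check.sh   # hypothetical check; nonzero exit on a down port
  register: intf_check
  ignore_errors: yes

# On failure, gather diagnostics from the whole testbed before failing the run.
- include: check_testbed_interfaces.yml
  when: intf_check.rc != 0

- name: Fail the run after diagnostics were gathered
  fail: msg="Not all interfaces are up"
  when: intf_check.rc != 0
```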

How did you verify/test it?

I ran, for example, the LLDP test (or any other test using interfaces.yml).
I used two cases:

  1. Normal flow, when all interfaces are up.
    In this case nothing changes.
  2. Bug flow, when the Fanout deliberately has its ports down,
    or when some relevant VM deliberately has its Port-Channel down.
    In those cases, the test fails as usual, but gathers additional information through the Testbed.

Any platform specific information?

The current implementation can enter the Fanout switch only if the Fanout is of MLNX type.
The 'Check Fanout interfaces' step in check_testbed_interfaces.yml calls fanout.yml (actually the fanout role) with the tag 'check_interfaces_status'.
Anyone can modify roles/fanout/tasks/main.yml to execute a Fanout-specific step with:
when: peer_hwsku == "<specific_fanout_type>"
tags: check_interfaces_status

Example:

 ###################################################################
 # Check Fanout interfaces status                                  #
 ###################################################################
- block:
  - name: Check Fanout interfaces status
    action: apswitch template=roles/fanout/templates/mlnx_interfaces_status.j2
    connection: switch
    register: fanout_interfaces
    args:
      login: "{{ switch_login['MLNX-OS'] }}"

  - debug:
      msg: "{{ fanout_interfaces.stdout.split('\n') }}"

  when: peer_hwsku == "MLNX-OS"
  tags: check_interfaces_status
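
For instance, a team with a different fanout vendor could add an analogous block gated on their own hwsku. The template path, login key, and hwsku value below are hypothetical placeholders:

```yaml
- block:
  - name: Check Fanout interfaces status
    # hypothetical template path and login key for a non-MLNX vendor
    action: apswitch template=roles/fanout/templates/other_vendor_interfaces_status.j2
    connection: switch
    register: fanout_interfaces
    args:
      login: "{{ switch_login['OTHER-OS'] }}"

  - debug:
      msg: "{{ fanout_interfaces.stdout.split('\n') }}"

  when: peer_hwsku == "OTHER-OS"   # hypothetical hwsku value
  tags: check_interfaces_status
```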

Supported testbed topology if it's a new test case?

N/A

Documentation

N/A

Roman Kachur added 3 commits December 6, 2018 12:41
… case 'Verify interfaces are up' fails.

2. Added check_testbed_interfaces.yml to gather interfaces status on:
   - DUT.
   - Fanout (for MLNX only, filtered by check_interfaces_status tag).
   - Testbed server (by calling testbed_vm_status.yml playbook).
   - relevant VMs.
3. Added testbed_vm_status.yml playbook to enter Testbed server and gather VMs status.
4. Added show_int_portchannel_status.j2 to enter each relevant VM and gather Port-Channel status.
@msftclas

msftclas commented Dec 6, 2018

CLA assistant check
All CLA requirements met.

@liat-grozovik
Collaborator

This is a useful feature for debugging which we found to be working for us.
@pavel-shirshov @lguohan, appreciate it if you can review it as well and, if possible, validate that the common-file changes do not introduce degradations.

Contributor

@pavel-shirshov pavel-shirshov left a comment

Where does most of the information from the added steps go?
We run them, ignore errors, and lose all the output (we can see it only in debug mode).


- name: Check Fanout interfaces
  local_action: shell ansible-playbook -i lab fanout.yml -l {{ fanout_switch }} --tags check_interfaces_status
  ignore_errors: yes
Contributor

Where does the output go?

Contributor Author

All gathered info goes to the execution console, where it can be extracted if needed.

Contributor

Can we save the output into a variable and output it as

debug:
  var: output.stdout_lines

Contributor Author

Actually, it does register/debug the output:
When interfaces.yml fails,
it calls check_testbed_interfaces.yml with check_fanout: true.

check_testbed_interfaces.yml runs another playbook, fanout.yml, with the tag check_interfaces_status:
local_action: shell ansible-playbook -i lab fanout.yml -l {{ fanout_switch }} --tags check_interfaces_status
which in turn calls the fanout role's main.yml,
which contains our custom block (which was not upstreamed):

 ###################################################################
 # Check Fanout interfaces status                                  #
 ###################################################################
- block:
  - name: Check Fanout interfaces status
    action: apswitch template=roles/fanout/templates/mlnx_interfaces_status.j2
    connection: switch
    register: fanout_interfaces
    args:
      login: "{{ switch_login['MLNX-OS'] }}"

  - debug:
      msg: "{{ fanout_interfaces.stdout.split('\n') }}"

  when: peer_hwsku == "MLNX-OS"
  tags: check_interfaces_status

Here we expect each team may have their own block, relevant to their specific vendor, invoked via
tags: check_interfaces_status

That is why you might not have gotten any information when you ran the test.

ignore_errors: yes

- name: Get teamd dump
  shell: teamdctl '{{ item }}' state dump
Contributor

Where does the output go?

Contributor Author

All gathered info goes to the execution console, where it can be extracted if needed.

Contributor

Can we save the output into a variable and output it as

debug:
  var: output.stdout_lines

Contributor Author

@romankachur-mlnx romankachur-mlnx Feb 15, 2019

Yes, it's a simple improvement, but it will be done as a new PR (this source branch doesn't exist anymore).

Contributor Author

To handle this case, I opened another PR that includes these changes as well:
#815

ignore_errors: yes

- name: Gather vm list from Testbed server
  local_action: shell ansible-playbook testbed_vm_status.yml -i veos -l "{{ testbed_facts['server'] }}"
Contributor

Where does the output go?

Contributor Author

All gathered info goes to the execution console, where it can be extracted if needed.

Contributor

Can we save the output into a variable and output it as

debug:
  var: output.stdout_lines

Contributor Author

Here is the same explanation as in the first conversation.

When interfaces.yml fails,
it calls check_testbed_interfaces.yml with check_vms: true.

check_testbed_interfaces.yml runs another playbook, testbed_vm_status.yml:
local_action: shell ansible-playbook testbed_vm_status.yml -i veos -l "{{ testbed_facts['server'] }}"

testbed_vm_status.yml has register/debug of the output:

- hosts: servers:&vm_host
  tasks:
  - name: Get VM statuses from Testbed server
    shell: virsh list
    register: virsh_list
  - name: Show VM statuses
    debug: msg="{{ virsh_list['stdout_lines'] }}"

but it will only be shown in verbose mode (-vvvvv) of the outer playbook:

  connection: switch
  ignore_errors: yes
  when: vms["{{ item }}"]['hwsku'] == 'Arista-VM'
  with_items: vms
Contributor

Where does the output go?

Contributor Author

All gathered info goes to the execution console, where it can be extracted if needed.

Contributor

Can we save the output into a variable and output it as

debug:
  var: output.stdout_lines

Contributor Author

Yes, it's a simple improvement, but it will be done as a new PR (this source branch doesn't exist anymore).

Contributor Author

To handle this case, I opened another PR that includes these changes as well:
#815

wangxin pushed a commit to wangxin/sonic-mgmt that referenced this pull request Oct 27, 2025
…PE-384C-B-O128S2 SKU (sonic-net#749)

### Description of PR

Summary: Fix k8s node joining failure for Arista-7060X6-16PE-384C-B-O128S2 SKU
Fixes # (issue)

### Type of change


- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
 - [ ] Skipped for non-supported platforms
- [x] Test case improvement

### Back port request
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [x] 202505

### Approach
#### What is the motivation for this PR?
For SKU Arista-7060X6-16PE-384C-B-O128S2, the k8s test case fails due to a joining failure; this needs a fix.
#### How did you do it?
The case fails because kubelet has a parameter that makes it prefer the IPv6 address, but the testbed uses IPv4, so we just need to specify --node-ip=mgmt_ip.
#### How did you verify/test it?
Ran the modified code on an Arista-7060X6-16PE-384C-B-O128S2 DUT; the case runs successfully.

#### Any platform specific information?
N/A
#### Supported testbed topology if it's a new test case?
N/A
### Documentation
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
…nux-kernel] advance submodule head (sonic-net#13906)

linkmgrd:
* 3e7a9df 2023-02-19 | [active-active] Toggle to standby if default route is missing (sonic-net#171) (HEAD -> 202205) [Longxiang Lyu]
* 8ab1b2b 2023-02-15 | [active-active] fix issue that interfaces get stuck in `active` if service starts up with link state down (sonic-net#169) [Jing Zhang]
* df862ad 2023-02-11 | Fix mux config when gRPC connection is lost (sonic-net#166) [Longxiang Lyu]

utilities:
* 8aa7930c 2023-02-13 | [portstat CLI] don't print reminder if use json format (sonic-net#2670) (HEAD -> 202205, github/202205) [wenyiz2021]
* 4e3bb6fa 2023-02-21 | Add "show fabric reachability" command. (sonic-net#2672) [jfeng-arista]
* 3587a94b 2023-02-18 | [202205][dhcp_relay] Remove add field of vlanid to DHCP_RELAY table while adding vlan (sonic-net#2680) [Yaqiang Zhu]
* 4f07f7f0 2023-02-10 | Skip saidump for Spine Router as this can take more than 5 sec (sonic-net#2637) (sonic-net#2671) [kenneth-arista]
* e61c5ec4 2023-02-10 | [vlan] Refresh dhcpv6_relay config while adding/deleting a vlan (sonic-net#2660) (sonic-net#2669) [Yaqiang Zhu]

swss:
* 1bbf725 2023-02-14 | [Workaround] EvpnRemoteVnip2pOrch warmboot check failure (sonic-net#2626) (HEAD -> 202205) [jcaiMR]
* 380f72b 2023-02-20 | Support for tc-dot1p and tc-dscp qosmap (sonic-net#2559) [Divya Mukundan]
* dbf6fcc 2022-11-01 | Added LAG member check on addLagMember() (sonic-net#2464) [Andriy Kokhan]

swss-common:
* b31391b 2023-02-21 | Prevent sonic-db-cli generate core dump (sonic-net#749) (HEAD -> 202205) [Hua Liu]
* 16ff689 2022-12-13 | Support for TC-DOT1p qos map (sonic-net#721) [Divya Mukundan]

platform-daemons:
* fb92af4 2023-02-09 | [ycabled] add more coverage to ycabled; add minor name change for vendor API CLI return key-values pairs (sonic-net#338) (HEAD -> 202205) [vdahiya12]

linux-kernel:
* 4e62401 2023-02-09 | Update linux kernel for hw-mgmt V.7.0020.4104 (sonic-net#305) (HEAD -> 202205) [Stephen Sun]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026


Why I did it
e732ed0 - Prevent sonic-db-cli generate core dump (Update submodule: sairedis sonic-net#749) (4 minutes ago) [Hua Liu]
28adcb4 - Support for TC-DOT1p qos map (Update submodules: sonic-swss-common, sonic-sairedis sonic-net#721) (5 minutes ago) [Divya Mukundan]
How I did it
How to verify it
