
syncd crash and hang seen with warm-reboot and fast-reboot on T0 topology - HEAD.253-2872d802 #3934

@mini-nair-dell

Description
+++++++++++++++

  • Observing orchagent and syncd crashes while performing warm-reboot on master image 157.
  • The issue is seen only with the T0 topology, not with the T1-lag-64 topology.
  • After the crash, many Docker containers are not running.
  • Traces from the core files are attached.

Please find the logs below. The issue is not seen on master image 154.

Syslog snippet:

Dec 20 06:15:56.828794 sonic-s6100-07 ERR swss#orchagent: :- sai_redis_internal_notify_syncd: notify syncd failed to get response result from select: 2
Dec 20 06:15:56.828794 sonic-s6100-07 ERR swss#orchagent: :- sai_redis_internal_notify_syncd: notify syncd failed to get response
Dec 20 06:15:56.828894 sonic-s6100-07 ERR swss#orchagent: :- sai_redis_notify_syncd: notify syncd failed: SAI_STATUS_FAILURE
Dec 20 06:15:56.828894 sonic-s6100-07 ERR swss#orchagent: :- initSaiRedis: Failed to notify syncd INIT_VIEW, rv:-1
Dec 20 06:15:56.829618 sonic-s6100-07 INFO swss#supervisord: orchagent terminate called without an active exception
Dec 20 06:15:58.010736 sonic-s6100-07 INFO swss#supervisor-proc-exit-listener: Process orchagent exited unxepectedly. Terminating supervisor...
Dec 20 06:15:58.571107 sonic-s6100-07 INFO swss.sh[1708]: No longer waiting on container 'syncd'
Dec 20 06:15:58.604890 sonic-s6100-07 NOTICE root: Stopping swss service...
Dec 20 06:15:58.612537 sonic-s6100-07 NOTICE root: Locking /tmp/swss-syncd-lock from swss service
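To pull the orchagent errors out of a full syslog instead of reading it by eye, a minimal Python sketch like the one below may help. It assumes the line layout shown in the snippet above (`<Mon> <day> <time> <host> <SEVERITY> <tag>: <message>`); `parse_line` and `errors_for` are hypothetical helper names, not part of SONiC.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed layout, taken from the syslog snippet above:
#   "<Mon> <day> <HH:MM:SS.usec> <host> <SEVERITY> <tag>: <message>"
SYSLOG_RE = re.compile(
    r"^(?P<month>\w{3}) +(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<host>\S+) (?P<severity>[A-Z]+) (?P<tag>[^:\s]+): ?(?P<message>.*)$"
)


class SyslogEntry(NamedTuple):
    time: str
    host: str
    severity: str
    tag: str
    message: str


def parse_line(line: str) -> Optional[SyslogEntry]:
    """Return the parsed entry, or None if the line does not fit the assumed layout."""
    m = SYSLOG_RE.match(line.strip())
    if m is None:
        return None
    return SyslogEntry(m["time"], m["host"], m["severity"], m["tag"], m["message"])


def errors_for(lines: Iterable[str], tag: str) -> Iterator[SyslogEntry]:
    """Yield only ERR entries logged under the given tag, e.g. 'swss#orchagent'."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.severity == "ERR" and entry.tag == tag:
            yield entry


snippet = [
    "Dec 20 06:15:56.828894 sonic-s6100-07 ERR swss#orchagent: :- initSaiRedis: "
    "Failed to notify syncd INIT_VIEW, rv:-1",
    "Dec 20 06:15:58.604890 sonic-s6100-07 NOTICE root: Stopping swss service...",
]
for e in errors_for(snippet, "swss#orchagent"):
    print(e.time, e.message)
```

Lines that do not match the assumed layout are skipped rather than raising, so the filter can be run over a whole /var/log/syslog without pre-cleaning it.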

root@sonic-s6100-07:/var/core# warm-reboot -vvv
Fri Dec 20 06:12:23 UTC 2019 Pausing orchagent ...
Fri Dec 20 06:12:23 UTC 2019 Stopping radv ...
Fri Dec 20 06:12:24 UTC 2019 Stopping bgp ...
Fri Dec 20 06:12:24 UTC 2019 Stopped bgp ...
Fri Dec 20 06:12:27 UTC 2019 Initialize pre-shutdown ...
Fri Dec 20 06:12:28 UTC 2019 Requesting pre-shutdown ...
Fri Dec 20 06:12:29 UTC 2019 Waiting for pre-shutdown ...
Fri Dec 20 06:16:20 UTC 2019 Syncd pre-shutdown failed: requesting ...
Fri Dec 20 06:16:20 UTC 2019 warm-reboot failure (11) cleanup ...
Fri Dec 20 06:16:21 UTC 2019 Cancel warm-reboot: code (1)
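Nearly all of the warm-reboot time above is spent in one step: the script waits from 06:12:29 to 06:16:20 for syncd pre-shutdown, and the orchagent INIT_VIEW failure in the syslog (06:15:56) falls inside that window. A small Python sketch that measures the time between consecutive `-vvv` progress lines makes such gaps easy to spot. It assumes every line starts with the `date`-style timestamp shown above; `parse_step` and `step_gaps` are hypothetical helper names.

```python
from datetime import datetime

# Timestamp prefix of each "-vvv" progress line, e.g. "Fri Dec 20 06:12:29 UTC 2019"
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S UTC %Y"
TIMESTAMP_FIELDS = 6  # weekday, month, day, time, "UTC", year


def parse_step(line: str):
    """Split a progress line into (timestamp, step description)."""
    fields = line.split()
    stamp = datetime.strptime(" ".join(fields[:TIMESTAMP_FIELDS]), TIMESTAMP_FORMAT)
    return stamp, " ".join(fields[TIMESTAMP_FIELDS:])


def step_gaps(lines):
    """Return (seconds, step) pairs: how long each step ran before the next line appeared."""
    steps = [parse_step(line) for line in lines if line.strip()]
    return [
        ((next_started - started).total_seconds(), step)
        for (started, step), (next_started, _) in zip(steps, steps[1:])
    ]


WARM_REBOOT_LOG = """\
Fri Dec 20 06:12:23 UTC 2019 Pausing orchagent ...
Fri Dec 20 06:12:23 UTC 2019 Stopping radv ...
Fri Dec 20 06:12:24 UTC 2019 Stopping bgp ...
Fri Dec 20 06:12:24 UTC 2019 Stopped bgp ...
Fri Dec 20 06:12:27 UTC 2019 Initialize pre-shutdown ...
Fri Dec 20 06:12:28 UTC 2019 Requesting pre-shutdown ...
Fri Dec 20 06:12:29 UTC 2019 Waiting for pre-shutdown ...
Fri Dec 20 06:16:20 UTC 2019 Syncd pre-shutdown failed: requesting ...
Fri Dec 20 06:16:20 UTC 2019 warm-reboot failure (11) cleanup ...
Fri Dec 20 06:16:21 UTC 2019 Cancel warm-reboot: code (1)
"""

print(max(step_gaps(WARM_REBOOT_LOG.splitlines())))  # (231.0, 'Waiting for pre-shutdown ...')
```

The same helper can be pointed at the fast-reboot `-vvv` output further down; there the largest gap is 11 s, between the teamd and syncd steps, and the log simply ends at the kexec line.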

Core files:

root@sonic-s6100-07:/var/core# ls -ltr
total 10568
-rw-rw-rw- 1 root root 10261200 Dec 20 08:54 syncd.1576832093.28.core.gz
-rw-rw-rw- 1 root root 278329 Dec 20 08:56 orchagent.1576832194.45.core.gz
-rw-rw-rw- 1 root root 278347 Dec 20 08:58 orchagent.1576832301.47.core.gz
root@sonic-s6100-07:/var/core#
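The core file names appear to encode the process name, a Unix timestamp and a PID (`<process>.<epoch>.<pid>.core.gz`; this layout is an assumption based on the names above, not something the report states). Decoding the timestamp shows when each crash happened, which matters here because the listing was taken some time after the 06:12 warm-reboot attempt. A minimal Python sketch, with `decode_core_name` as a hypothetical helper:

```python
from datetime import datetime, timezone

SUFFIX = ".core.gz"


def decode_core_name(filename: str):
    """Split '<process>.<epoch>.<pid>.core.gz' (assumed layout) into its parts."""
    if not filename.endswith(SUFFIX):
        raise ValueError(f"unexpected core file name: {filename!r}")
    # rsplit keeps any dots inside the process name intact
    process, epoch, pid = filename[: -len(SUFFIX)].rsplit(".", 2)
    return process, datetime.fromtimestamp(int(epoch), tz=timezone.utc), int(pid)


for name in (
    "syncd.1576832093.28.core.gz",
    "orchagent.1576832194.45.core.gz",
    "orchagent.1576832301.47.core.gz",
):
    process, when, pid = decode_core_name(name)
    print(f"{process:<10} pid={pid:<3} {when:%Y-%m-%d %H:%M:%S} UTC")
```

For these three files it prints crash times of 08:54:53, 08:56:34 and 08:58:21 UTC on 2019-12-20, which match the modification times in the listing.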

root@sonic-s6100-07:/var/core# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7b13c13d2fe1 docker-dhcp-relay-dbg:latest "/usr/bin/docker_ini…" 3 hours ago Up 3 hours dhcp_relay
6ef8beec5762 docker-syncd-brcm-dbg:latest "/usr/bin/supervisord" 3 hours ago Up 3 hours syncd
fcedb3fa4cf6 docker-teamd-dbg:latest "/usr/bin/supervisord" 3 hours ago Up 3 hours teamd
689537cc97d1 docker-platform-monitor-dbg:latest "/usr/bin/docker_ini…" 3 hours ago Up 3 hours pmon
8cb6929f9659 docker-fpm-frr-dbg:latest "/usr/bin/supervisord" 3 hours ago Up 3 hours bgp
8934c8414ccd docker-database-dbg:latest "/usr/local/bin/dock…" 3 hours ago Up 3 hours database
root@sonic-s6100-07:/var/core#
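To see which containers are missing after the crash, the NAMES column of `docker ps` can be compared against the containers this report itself mentions (database, swss, syncd, bgp, teamd, pmon, radv, dhcp_relay). That expected set is an assumption: a full T0 image likely runs more containers than these. On the DUT, `docker ps --format '{{.Names}}'` prints only the names; the sketch below instead parses the default table shown above, taking the last field of each row. `running_names` and `missing_containers` are hypothetical helper names.

```python
# Assumption: only the containers named in this report's logs; a real T0 image may run more.
EXPECTED = {"database", "swss", "syncd", "bgp", "teamd", "pmon", "radv", "dhcp_relay"}

DOCKER_PS = """\
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7b13c13d2fe1 docker-dhcp-relay-dbg:latest "/usr/bin/docker_ini…" 3 hours ago Up 3 hours dhcp_relay
6ef8beec5762 docker-syncd-brcm-dbg:latest "/usr/bin/supervisord" 3 hours ago Up 3 hours syncd
fcedb3fa4cf6 docker-teamd-dbg:latest "/usr/bin/supervisord" 3 hours ago Up 3 hours teamd
689537cc97d1 docker-platform-monitor-dbg:latest "/usr/bin/docker_ini…" 3 hours ago Up 3 hours pmon
8cb6929f9659 docker-fpm-frr-dbg:latest "/usr/bin/supervisord" 3 hours ago Up 3 hours bgp
8934c8414ccd docker-database-dbg:latest "/usr/local/bin/dock…" 3 hours ago Up 3 hours database
"""


def running_names(docker_ps_output: str) -> set:
    """Return the NAMES column: the last field of every row after the header."""
    rows = docker_ps_output.strip().splitlines()[1:]
    return {row.split()[-1] for row in rows if row.strip()}


def missing_containers(docker_ps_output: str, expected=EXPECTED) -> set:
    """Expected containers that do not appear in the docker ps output."""
    return set(expected) - running_names(docker_ps_output)


print(sorted(missing_containers(DOCKER_PS)))  # ['radv', 'swss']
```

Taking the last field works because NAMES is the final column of the default `docker ps` table; parsing the `--format '{{.Names}}'` output instead avoids depending on column layout.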

Attached:

  • Syslog
  • Core traces

Fast-reboot
+++++++++

  • Fast-reboot gets stuck as well.

root@sonic-s6100-07:~# fast-reboot -vvv
Fri Dec 20 12:08:14 UTC 2019 Stopping radv ...
Fri Dec 20 12:08:15 UTC 2019 Stopping bgp ...
Fri Dec 20 12:08:16 UTC 2019 Stopped bgp ...
Fri Dec 20 12:08:17 UTC 2019 Stopping teamd ...
Fri Dec 20 12:08:18 UTC 2019 Stopped teamd ...
Fri Dec 20 12:08:29 UTC 2019 Stopping syncd ...
Fri Dec 20 12:08:29 UTC 2019 Stopped syncd ...
Fri Dec 20 12:08:29 UTC 2019 Stopping all remaining containers ...
Fri Dec 20 12:08:30 UTC 2019 Stopped all remaining containers ...
Fri Dec 20 12:08:32 UTC 2019 Rebooting with /sbin/kexec -e to SONiC-OS-HEAD.157-dirty-20191219.005759 ...

Thanks
Mini
