
PMON container crashes in latest SONiC images #5986

@ciju-juniper

Description
The 'pmon' container fails to start up, and the following errors appear in the syslog. All of the platform monitoring daemons are killed, and pmon itself stops after a few retries.

Nov 19 18:48:15.770882 sonic INFO pmon#supervisord: start sonic-platform package not installed, attempting to install...
Nov 19 18:48:15.770882 sonic INFO pmon#supervisord: start Error: Unable to locate /usr/share/sonic/platform/sonic_platform-1.0-py2-none-any.whl
Nov 19 18:48:16.060439 sonic INFO pmon#supervisord: start sonic-platform package not installed, attempting to install...
Nov 19 18:48:16.060439 sonic INFO pmon#supervisord: start Error: Unable to locate /usr/share/sonic/platform/sonic_platform-1.0-py3-none-any.whl
Nov 19 18:48:16.133494 sonic INFO pmon#supervisord: xcvrd Traceback (most recent call last):
Nov 19 18:48:16.133494 sonic INFO pmon#supervisord: xcvrd   File "/usr/local/bin/xcvrd", line 5, in <module>
Nov 19 18:48:16.133494 sonic INFO pmon#supervisord: xcvrd     from src.xcvrd import main
Nov 19 18:48:16.133494 sonic INFO pmon#supervisord: xcvrd ImportError: No module named src.xcvrd
Nov 19 18:48:16.142902 sonic INFO pmon#supervisor-proc-exit-listener: Process xcvrd exited unxepectedly. Terminating supervisor...
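To pull the relevant events out of a longer syslog, a small filter can help. The following is a minimal Python sketch that assumes the line layout shown in the excerpt above (`<timestamp> <host> <LEVEL> <container>#<program>: <message>`). The function `scan`, the `Finding` record, and its three event categories are hypothetical helpers written for this report, not part of SONiC. The sketch matches only on `exited` because the listener's message contains the misspelling `unxepectedly`.

```python
import re
from typing import Iterable, List, NamedTuple

# Layout of the pmon lines in the excerpt above (assumed, not a SONiC-defined format):
# "<Mon> <day> <hh:mm:ss.usec> <host> <LEVEL> <container>#<program>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:.]+) (?P<host>\S+) (?P<level>[A-Z]+) "
    r"(?P<container>[\w-]+)#(?P<program>[\w-]+): (?P<msg>.*)$"
)
# Match only "exited" so the listener's misspelled "unxepectedly" does not matter.
EXIT_RE = re.compile(r"Process (?P<proc>\S+) exited")
MISSING_WHEEL_RE = re.compile(r"Unable to locate (?P<path>\S+\.whl)")


class Finding(NamedTuple):
    ts: str      # syslog timestamp of the line
    kind: str    # "missing_wheel", "import_error" or "process_exit"
    detail: str  # wheel path, ImportError text, or process name


def scan(lines: Iterable[str], container: str = "pmon") -> List[Finding]:
    """Return the failure events for `container` found in syslog `lines`."""
    findings: List[Finding] = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m["container"] != container:
            continue
        msg = m["msg"]
        exited = EXIT_RE.search(msg)
        missing = MISSING_WHEEL_RE.search(msg)
        if exited:
            findings.append(Finding(m["ts"], "process_exit", exited["proc"]))
        elif missing:
            findings.append(Finding(m["ts"], "missing_wheel", missing["path"]))
        elif "ImportError: " in msg:
            # Keep only the text after "ImportError: ", e.g. "No module named src.xcvrd".
            findings.append(
                Finding(m["ts"], "import_error", msg.split("ImportError: ", 1)[1])
            )
    return findings
```

For example, running `scan(open("/var/log/syslog"))` on an affected switch (assuming the default syslog location) should report the two missing `sonic_platform` wheels, the `src.xcvrd` ImportError, and the xcvrd exit, if the log matches the excerpt above.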

Initial Triage
The last good build on the master branch was build #481; pmon crashes are seen from build #482 onwards. Of the few commits between these two builds, I suspect the following one introduced the breakage:

[submodule]: update sonic-platform-daemons (#5868)
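One way to narrow this down on an affected build is to check which of the modules named in the log can actually be located inside the pmon container (for example from a `docker exec -it pmon python3` session). The following is a minimal sketch that assumes Python 3 is available in the container; `module_available`, `check_modules`, and `PMON_MODULES` are hypothetical names written for this report. Note that the `No module named src.xcvrd` message format suggests xcvrd runs under Python 2 there, so a Python 3 check may not see the same package set as the daemon.

```python
import importlib.util
from typing import Dict, Iterable


def module_available(name: str) -> bool:
    """Return True if the import system can locate `name`.

    For a dotted name, find_spec imports the parent package first.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "src" in "src.xcvrd") is missing.
        return False


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each module name to whether it can be located."""
    return {name: module_available(name) for name in names}


# Module names taken from the syslog excerpt in this issue.
PMON_MODULES = ("src.xcvrd", "sonic_platform")
```

Calling `print(check_modules(PMON_MODULES))` on builds #481 and #482 and comparing the two results would show whether the `src.xcvrd` import path disappeared with the submodule update.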

Platform details
This problem likely affects most platforms. I have tested it on the Juniper QFX5210 and QFX5200 platforms.

root@sonic:~# show version

SONiC Software Version: SONiC.master.482-aee389e4
Distribution: Debian 10.6
Kernel: 4.19.0-9-2-amd64
Build commit: aee389e4
Build date: Wed Nov 11 06:51:47 UTC 2020
Built by: johnar@jenkins-worker-8

Platform: x86_64-juniper_qfx5200-r0
HwSKU: Juniper-QFX5200-32C-S
ASIC: broadcom
Serial Number: ZA0220160436
Uptime: 17:52:40 up 2 min,  1 user,  load average: 1.76, 0.66, 0.24

show techsupport
Here is the techsupport archive:
sonic_dump_sonic_20201120_175417.tar.gz

@vdahiya12 Could you please take a look? Let me know if you need any further details. Also, please advise whether any platform-side changes are required after PR #5868.
