Bug: config reload -f can break lldp_syncd sync for LLDP_LOC_CHASSIS #26297

@Yakiv-Huryk

Is it platform specific

generic

Importance or Severity

Critical

Description of the bug

When config reload -f is run while lldp is still starting (for example, because a previous config reload has not finished yet), the lldp container can survive the restart. As a result, lldp_syncd never repopulates the LLDP_LOC_CHASSIS table.

Example:

First config reload:

2026 Feb  7 12:42:52.975231 dut INFO python[21430]: ansible-ansible.legacy.command Invoked with _raw_params=config reload -y _uses_shell=True warn=False stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None

LLDP is starting:

2026 Feb  7 12:45:07.305872 dut INFO featured: Running cmd: '['sudo', 'systemctl', 'unmask', 'lldp.service']'
2026 Feb  7 12:45:08.532009 dut INFO featured: Running cmd: '['sudo', 'systemctl', 'enable', 'lldp.service']'
2026 Feb  7 12:45:09.624649 dut INFO featured: Running cmd: '['sudo', 'systemctl', 'start', 'lldp.service']'
2026 Feb  7 12:45:09.770046 dut INFO systemd[1]: Starting lldp.service - LLDP container...
2026 Feb  7 12:45:09.945814 dut NOTICE admin: Starting lldp service...
2026 Feb  7 12:45:10.708215 dut INFO lldp.sh[31457]: Starting existing lldp container with HWSKU Mellanox-SN2700
2026 Feb  7 12:45:13.693650 dut DEBUG container: read_data: config:True feature:lldp fields:[('set_owner', 'local'), ('no_fallback_to_local', False), ('state', 'disabled')] val:['local', False, 'enabled']
2026 Feb  7 12:45:13.693725 dut DEBUG container: read_data: config:False feature:lldp fields:[('current_owner', 'none'), ('remote_state', 'none'), ('container_id', '')] val:['none', 'none', '']
2026 Feb  7 12:45:13.698011 dut DEBUG container: container_start: lldp: set_owner:local fallback:True remote_state:none server_connected:false
2026 Feb  7 12:45:14.819520 dut INFO container: docker cmd: start for lldp

Second config reload, which "stops" LLDP. It is not really stopped: lldp's supervisord continues to run, and lldp_syncd starts later.

2026 Feb  7 12:45:15.121321 dut NOTICE switch_trimming: 'reload' executing with command: config reload -y -f
2026 Feb  7 12:45:15.387433 dut ERR featured: ['sudo', 'systemctl', 'start', 'lldp.service'] - failed: return code - 1, output:
2026 Feb  7 12:45:15.390976 dut ERR featured: Feature 'lldp.service' failed to be enabled and started
2026 Feb  7 12:45:15.407459 dut NOTICE healthd#sysmonitor[8414]: Received event:lldp.service from source:feature time:2026-02-07 10:45:15
2026 Feb  7 12:45:15.508675 dut WARNING systemd[1]: lldp.service: Control process exited, code=killed, status=15/TERM
2026 Feb  7 12:45:15.508761 dut WARNING systemd[1]: lldp.service: Failed with result 'signal'.
2026 Feb  7 12:45:15.508841 dut INFO systemd[1]: Stopped lldp.service - LLDP container.
2026 Feb  7 12:45:15.508921 dut INFO systemd[1]: lldp.service: Consumed 1.036s CPU time, 30.7M memory peak.
2026 Feb  7 12:45:25.727509 dut INFO lldp#supervisord 2026-02-07 12:45:23,144 INFO Included extra file "/etc/supervisor/conf.d/supervisord.conf" during parsing
2026 Feb  7 12:45:25.727509 dut INFO lldp#supervisord 2026-02-07 12:45:23,144 INFO Set uid to user 0 succeeded
2026 Feb  7 12:45:25.727509 dut INFO lldp#supervisord 2026-02-07 12:45:23,162 INFO RPC interface 'supervisor' initialized
2026 Feb  7 12:45:33.498103 dut INFO zlldp-syncd [lldp_syncd] INFO: Starting SONiC LLDP sync daemon...
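
The symptom in the log above (container processes still logging after systemd reports the unit as stopped) can be checked mechanically when triaging similar failures. The following is a minimal sketch, assuming syslog lines in the format shown in this issue; the helper name lines_after_stop is hypothetical and the matching is a simple substring heuristic, not part of any SONiC tool.

```python
def lines_after_stop(log_lines, unit="lldp"):
    """Return container log lines seen after systemd reported the unit stopped.

    Heuristic: once 'Stopped <unit>.service' is seen, any line tagged
    '<unit>#...' is suspicious until systemd logs a new
    'Starting <unit>.service'.
    """
    stopped = False
    suspicious = []
    for line in log_lines:
        if f"systemd[1]: Stopped {unit}.service" in line:
            stopped = True
        elif f"systemd[1]: Starting {unit}.service" in line:
            stopped = False
        elif stopped and f" {unit}#" in line:
            # The container is still producing output although the unit is stopped.
            suspicious.append(line)
    return suspicious
```

Run against the syslog excerpt above, this would flag the lldp#supervisord lines that follow "Stopped lldp.service".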

Because lldp_syncd survives the config reload, its internal cache prevents it from ever repopulating the LLDP_LOC_CHASSIS table:
https://github.com/sonic-net/sonic-dbsyncd/blob/22335e0688627429967d7c751c7ff8c9c6bb6d00/src/lldp_syncd/daemon.py#L383
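
To illustrate why a surviving process can leave the table empty, here is a simplified model of a write-on-change cache. It is only a sketch under assumptions: the class ChassisSyncer, the dict standing in for APPL DB, and the example address are all hypothetical, and the real logic in daemon.py may differ in detail.

```python
class ChassisSyncer:
    """Toy model of a sync daemon that caches the last chassis data it wrote."""

    def __init__(self, db):
        self.db = db             # plain dict standing in for APPL DB (assumption)
        self.chassis_cache = {}  # last written value, held in process memory

    def sync(self, chassis):
        # Write only when the data differs from the cached copy.
        if chassis != self.chassis_cache:
            self.db["LLDP_LOC_CHASSIS"] = dict(chassis)
            self.chassis_cache = dict(chassis)


def demo():
    db = {}
    syncer = ChassisSyncer(db)
    chassis = {"lldp_loc_man_addr": "10.0.0.1"}  # illustrative value only

    syncer.sync(chassis)
    before_flush = "LLDP_LOC_CHASSIS" in db

    db.clear()            # the reload clears the DB while the process keeps running
    syncer.sync(chassis)  # data equals the cache, so nothing is written
    after_flush = "LLDP_LOC_CHASSIS" in db

    fresh = ChassisSyncer(db)  # a restarted daemon begins with an empty cache
    fresh.sync(chassis)
    after_restart = "LLDP_LOC_CHASSIS" in db
    return before_flush, after_flush, after_restart
```

In this model the table is present before the flush, stays missing while the old process runs, and reappears only once a fresh instance syncs, which matches the expectation that stopping the container restores correct behavior.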

Warnings regarding missing info:

2026 Feb  7 12:48:02.965408 dut WARNING snmp#snmp-subagent [sonic_ax_impl] WARNING: Missing lldp_loc_man_addr from APPL DB

This fails the sonic-mgmt test_snmp_lldp test.

Steps to Reproduce

This is hard to reproduce because it is timing-related. In general, it is triggered by running config reload -f while lldp is starting.

Actual Behavior and Expected Behavior

Expected: the lldp container is stopped on config reload -f, which would lead to correct lldp_syncd behavior. Actual: the lldp container can survive the reload.

Relevant log output

Output of show version, show techsupport

202511

Attach files (if any)

No response
