Fix system-health crash on VS platform due to missing metadata file #26430

Closed
yxieca wants to merge 1 commit into sonic-net:master from yxieca:fix/vs-chassis-metadata

yxieca commented Mar 26, 2026

Description

The VS chassis platform module (sonic_platform/chassis.py) raises FileNotFoundError when /etc/sonic/vs_chassis_metadata.json does not exist. This file is only present on VS chassis (T2-VOQ) setups introduced in #18512, not on standalone VS platforms (e.g., ToR used in KVM testbeds).

Impact

The crash prevents system-health.service from running, so SYSTEM_READY|SYSTEM_STATE is never set in STATE_DB. This blocks any daemon that waits for system-ready, specifically hsflowd in the sflow container: it loops in waitConfig/getSystemReady() forever, never reads CONFIG_DB, and never generates /etc/hsflowd.auto.

This causes all sflow KVM tests to fail on trixie images with:

Failed: hsflowd failed to initialize collector(s) ['20.1.1.2'] within 240 seconds

The issue is currently masked in CI because sflow tests are blanket-skipped via tests_mark_conditions.yaml (sonic-mgmt#21701).
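
The blocking behavior described above can be illustrated with a minimal Python sketch. This is not hsflowd's code: `wait_for_system_ready` and `read_status` are hypothetical names, `read_status` stands in for a STATE_DB read of SYSTEM_READY|SYSTEM_STATE, and the `"UP"` value is an assumption. Unlike the loop described above, the sketch takes a timeout so that it terminates.

```python
import time


def wait_for_system_ready(read_status, timeout_s, poll_s=1.0, sleep=time.sleep):
    """Poll until read_status() reports readiness or timeout_s elapses.

    read_status is a hypothetical callable standing in for a STATE_DB read of
    SYSTEM_READY|SYSTEM_STATE. It is assumed to return None while the key is
    unset and "UP" once the system is ready (the value is an assumption).
    """
    waited = 0.0
    while waited < timeout_s:
        if read_status() == "UP":
            return True
        sleep(poll_s)
        waited += poll_s
    # The key was never set, e.g. because system-health.service crashed.
    return False
```

If the service that sets the key never runs, `read_status` keeps returning `None` and a waiter without a timeout never gets past this point, so nothing that follows it (such as reading configuration) ever happens.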

Fix

Return an empty metadata dict when the file doesn't exist, instead of raising an exception. The chassis methods that consume metadata (get_supervisor_slot, get_linecard_slot, get_my_slot) already raise KeyError/ValueError for missing fields, so a chassis setup with incomplete metadata still fails at the point of use.
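
A minimal sketch of the approach as described, not the PR's actual diff. The file path comes from the description; `load_chassis_metadata`, the standalone `get_my_slot` function, and the `"my_slot"` key are hypothetical and only illustrate the pattern.

```python
import json

# Path taken from the PR description.
METADATA_PATH = "/etc/sonic/vs_chassis_metadata.json"


def load_chassis_metadata(path=METADATA_PATH):
    """Hypothetical helper: return the metadata dict, or {} if the file is absent."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        # Standalone VS platforms do not ship this file; fall back to empty metadata.
        return {}


def get_my_slot(metadata):
    """Hypothetical consumer: raises KeyError when the field is missing."""
    return int(metadata["my_slot"])  # key name is illustrative only
```

With this shape, a missing file no longer fails at load time, and the error moves to the first call that actually needs a metadata field.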

Testing

  • Verified on KVM testbed: after manually setting SYSTEM_READY (the workaround), hsflowd initializes correctly and sflow tests pass
  • The fix ensures system-health.service starts successfully on standalone VS, which sets SYSTEM_READY automatically

Fixes #26429

The VS chassis platform module raises FileNotFoundError when
/etc/sonic/vs_chassis_metadata.json does not exist. This file is only
present on VS chassis (T2-VOQ) setups, not on standalone VS platforms
(e.g., ToR used in KVM testbeds).

The crash prevents system-health service from running, which means
SYSTEM_READY|SYSTEM_STATE is never set in STATE_DB. This blocks any
daemon that waits for system-ready (e.g., hsflowd), causing sflow
tests to fail with a 240-second timeout.

Fix: return empty metadata dict when the file doesn't exist, instead
of raising an exception. The chassis methods that need metadata
already raise KeyError for missing fields.

Fixes: sonic-net#26429

Signed-off-by: Ying Xie <[email protected]>
@yxieca yxieca requested a review from lguohan as a code owner March 26, 2026 19:16
Copilot AI review requested due to automatic review settings March 26, 2026 19:16
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

Contributor

Copilot AI left a comment

Pull request overview

Fixes a startup crash in the VS sonic_platform chassis implementation by making metadata loading tolerant of a missing /etc/sonic/vs_chassis_metadata.json, which prevents system-health.service from failing on standalone VS platforms.

Changes:

  • Remove the FileNotFoundError thrown when the VS chassis metadata file is absent.
  • Allow chassis initialization to proceed with an empty metadata dictionary when the file is missing.

@yxieca
Contributor Author

yxieca commented Mar 26, 2026

Closing: need to build and test locally before re-raising. The initial fix was incomplete (it didn't preserve FileNotFoundError for T2-VOQ chassis). Will reopen with proper local testing.

Development

Successfully merging this pull request may close these issues.

system-health service crashes on trixie VS image due to missing vs_chassis_metadata.json

3 participants