Fix system-health crash on VS platform due to missing metadata file #26430
Closed
yxieca wants to merge 1 commit into sonic-net:master from
The VS chassis platform module raises FileNotFoundError when /etc/sonic/vs_chassis_metadata.json does not exist. This file is only present on VS chassis (T2-VOQ) setups, not on standalone VS platforms (e.g., ToR used in KVM testbeds).

The crash prevents system-health service from running, which means SYSTEM_READY|SYSTEM_STATE is never set in STATE_DB. This blocks any daemon that waits for system-ready (e.g., hsflowd), causing sflow tests to fail with a 240-second timeout.

Fix: return empty metadata dict when the file doesn't exist, instead of raising an exception. The chassis methods that need metadata already raise KeyError for missing fields.

Fixes: sonic-net#26429
Signed-off-by: Ying Xie <[email protected]>
Collaborator
/azp run Azure.sonic-buildimage
Contributor
Pull request overview
Fixes a startup crash in the VS sonic_platform chassis implementation by making metadata loading tolerant of a missing /etc/sonic/vs_chassis_metadata.json, which prevents system-health.service from failing on standalone VS platforms.
Changes:
- Remove the `FileNotFoundError` thrown when the VS chassis metadata file is absent.
- Allow chassis initialization to proceed with an empty metadata dictionary when the file is missing.
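The change summarized above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `load_chassis_metadata` and its `path` parameter are assumptions, and the real loader in `sonic_platform/chassis.py` may be structured differently. Only the file path comes from the PR text.

```python
import json

# Path taken from the PR description.
DEFAULT_METADATA_PATH = "/etc/sonic/vs_chassis_metadata.json"


def load_chassis_metadata(path=DEFAULT_METADATA_PATH):
    """Return the parsed metadata, or an empty dict when the file is absent.

    Hypothetical helper used for illustration only.
    """
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        # Standalone VS platform: no chassis metadata file exists,
        # so initialization continues with empty metadata.
        return {}
```

Note that a sketch like this treats every platform the same way. As the closing comment on this PR points out, a complete fix would still need to surface the missing file on T2-VOQ chassis setups, which this example does not attempt.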
Contributor
Author
Closing: need to build and test locally before re-raising. The initial fix was incomplete (it didn't preserve FileNotFoundError for T2-VOQ chassis). Will reopen with proper local testing.
Description
The VS chassis platform module (`sonic_platform/chassis.py`) raises `FileNotFoundError` when `/etc/sonic/vs_chassis_metadata.json` does not exist. This file is only present on VS chassis (T2-VOQ) setups introduced in #18512, not on standalone VS platforms (e.g., ToR used in KVM testbeds).
Impact
The crash prevents `system-health.service` from running, which means `SYSTEM_READY|SYSTEM_STATE` is never set in STATE_DB. This blocks any daemon that waits for system-ready, specifically `hsflowd` in the sflow container, which loops in `waitConfig`/`getSystemReady()` forever, never reads CONFIG_DB, and never generates `/etc/hsflowd.auto`.
This causes all sflow KVM tests to fail on trixie images with a 240-second timeout.
The issue is currently masked in CI because sflow tests are blanket-skipped via `tests_mark_conditions.yaml` (sonic-mgmt#21701).
Fix
Return an empty metadata dict when the file doesn't exist instead of raising an exception. The chassis methods that consume metadata (`get_supervisor_slot`, `get_linecard_slot`, `get_my_slot`) already raise `KeyError`/`ValueError` for missing fields, so the behavior is correct for chassis setups.
Testing
- `SYSTEM_READY` (the workaround), `hsflowd` initializes correctly and sflow tests pass
- `system-health.service` starts successfully on standalone VS, which sets `SYSTEM_READY` automatically
Fixes #26429
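To illustrate why an empty metadata dict still fails loudly for chassis-specific lookups, here is a small sketch of a consumer in the style of `get_my_slot`. The standalone function signature and the `"my_slot"` key are assumptions made for this example; the actual field names in `vs_chassis_metadata.json` are not given in this PR.

```python
def get_my_slot(metadata):
    """Return this module's slot number from the metadata dict.

    Hypothetical stand-in for the chassis method of the same name;
    the "my_slot" key is a placeholder, not the real schema.
    """
    # A missing key raises KeyError; a non-numeric value raises ValueError.
    return int(metadata["my_slot"])
```

With empty metadata, a call such as `get_my_slot({})` raises `KeyError`, so code that genuinely needs chassis information still gets an error, while callers that never ask for slot data are unaffected.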