
system-health service_checker should check containers based on asic presence (cherry-pick #13497) #13966

Merged
yxieca merged 1 commit into sonic-net:202205 from spilkey-cisco:spilkey/system-health-202205
Feb 28, 2023


@spilkey-cisco

Why I did it

On a supervisor card in a chassis, dockers such as syncd, teamd, swss, and lldp are created for each Switch Fabric card. However, not every chassis has all of its Switch Fabric cards present. In that case, dockers are created only for the Switch Fabric cards that are present.

system-health reports errors in this scenario because it expects dockers for all Switch Fabric cards (based on NUM_ASIC defined in the asic.conf file).
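
As a rough illustration of the mismatch (not the actual service_checker code), the sketch below contrasts deriving the expected container names from NUM_ASIC with deriving them from the ASICs that are actually present. The function names, the `present_asics` input, and the per-ASIC naming scheme (`syncd0`, `swss1`, and so on) are assumptions made for illustration only.

```python
# Hypothetical sketch: all names and inputs here are illustrative, not SONiC APIs.
PER_ASIC_SERVICES = ["syncd", "teamd", "swss", "lldp"]


def expected_from_num_asic(num_asic):
    """Expect containers for every ASIC index below NUM_ASIC."""
    return {f"{svc}{asic}" for asic in range(num_asic) for svc in PER_ASIC_SERVICES}


def expected_from_presence(present_asics):
    """Expect containers only for the ASICs that are actually present."""
    return {f"{svc}{asic}" for asic in present_asics for svc in PER_ASIC_SERVICES}


def missing_containers(expected, running):
    """Return expected-but-not-running containers, sorted for stable output."""
    return sorted(set(expected) - set(running))
```

With NUM_ASIC set to 3 but only ASICs 0 and 1 present, the NUM_ASIC-based expectation would flag the four containers for ASIC 2 as missing, while the presence-based expectation would flag none.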

The system-health process error messages were also changed to indicate which container has the issue. Multiple containers may run processes with the same name, which could otherwise produce identical, ambiguous error messages.
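
The ambiguity can be sketched as follows. The message wording, the helper names, and the example container and process names (`swss0`, `swss1`, `orchagent`) are assumptions for illustration and are not copied from the system-health source.

```python
# Hypothetical sketch: message formats below are illustrative, not the real ones.
def message_without_container(process):
    """Error text with no container context."""
    return f"Process '{process}' is not running"


def message_with_container(container, process):
    """Error text that names the container the process belongs to."""
    return f"Container '{container}' process '{process}' is not running"


# The same process name failing in two different containers.
failures = [("swss0", "orchagent"), ("swss1", "orchagent")]

ambiguous = {message_without_container(proc) for _, proc in failures}
distinct = {message_with_container(cont, proc) for cont, proc in failures}
```

Here `ambiguous` collapses to a single message, while `distinct` keeps one message per container, so each failure can be traced to its source.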

How I did it

Cherry-pick #13497 into 202205 and resolve conflicts.

How to verify it

Bring up a Supervisor card with one or more fabric cards missing, then execute 'show system-health summary'. The command should not report failures due to missing dockers for the ASICs on the fabric cards that are not present.

…onic-net#13497)

How I did it
Port container_checker logic from sonic-net#11442 into service_checker for system-health.

@spilkey-cisco

@gechiang, @abdosi, I opened this PR to resolve conflicts from the cherry-pick of #13497.

@spilkey-cisco

Conflict was from #12563 in master.

@yxieca yxieca merged commit b2e124c into sonic-net:202205 Feb 28, 2023