BROADCOM_LEGACY_SAI_COMPAT: Allow platforms to disable sai_query_stats_st_capability at runtime #26013

Merged: StormLiangMS merged 1 commit into sonic-net:master from lipxu:fix/brcm-legacy-compat-buildimage-issue1 on Mar 16, 2026

Conversation

lipxu (Contributor) commented Mar 11, 2026

Why I did it

On Arista 7060cx (BCM56960/Tomahawk-1) running the broadcom-legacy image, syncd crashes with SIGSEGV inside brcm_sai_st_pd_ctr_cap_list_get on startup. The legacy SAI binary does not initialize p_pdapi_st->vtable for TH1, so dereferencing it at offset +0x10 causes a segfault.

Root cause: sonic-sairedis commit 4f1d7d99 restored sai_query_stats_st_capability to AC_CHECK_FUNCS. Because the XGS SAI headers are used for all broadcom syncd builds, the symbol is found at compile time and called at runtime on TH1 — where it crashes.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Add SAI_STATS_ST_CAPABILITY_SUPPORTED=0 to sai.profile for all Arista 7060cx HWSKUs (BCM56960/Tomahawk-1). The runtime guard in syncd (sonic-sairedis PR #1788) reads this key during apiInitialize() and nulls the query_stats_st_capability function pointer, preventing the crash.
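The guard described above can be modelled in a few lines. This is only an illustrative sketch in Python, not the syncd implementation (which lives in sonic-sairedis and is C++). The names `SaiApiTable`, `api_initialize`, and `query_st_capability` are hypothetical, and the behaviour when the key is absent (capability query left enabled) is an assumption.

```python
from typing import Callable, Dict, Optional

PROFILE_KEY = "SAI_STATS_ST_CAPABILITY_SUPPORTED"


def parse_sai_profile(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    profile: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        profile[key.strip()] = value.strip()
    return profile


class SaiApiTable:
    """Hypothetical stand-in for the SAI entry points that syncd holds."""

    def __init__(self, query_stats_st_capability: Optional[Callable[[], list]]):
        self.query_stats_st_capability = query_stats_st_capability


def api_initialize(table: SaiApiTable, profile: Dict[str, str]) -> None:
    """Null the entry point when the profile marks it as unsupported."""
    # Assumption: only an explicit "0" disables the call; a missing key
    # leaves the existing behaviour untouched.
    if profile.get(PROFILE_KEY) == "0":
        table.query_stats_st_capability = None


def query_st_capability(table: SaiApiTable) -> Optional[list]:
    """Call the entry point only if it is still set."""
    if table.query_stats_st_capability is None:
        return None  # treated as "not supported" instead of calling into SAI
    return table.query_stats_st_capability()
```

The point of the design is that the unsafe vendor function is never entered on TH1: callers see "not supported" rather than a crash.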

Files changed:

  • device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S/sai.profile
  • device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-C32/sai.profile
  • device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-D48C8/sai.profile
  • device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-Q24C8/sai.profile
  • device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-T96C8/sai.profile
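For anyone applying the same change to another HWSKU, the edit can be scripted. The sketch below is a hypothetical helper, not part of this PR: it appends the key (with the comment text used in this PR) to the end of a profile, whereas the actual diff places it near the top of each file. An existing value for the key is rewritten to 0.

```python
KEY = "SAI_STATS_ST_CAPABILITY_SUPPORTED"

# Comment text taken from the sai.profile changes in this PR.
COMMENT_LINES = [
    "# BROADCOM_LEGACY_SAI_COMPAT: TH1 (BCM56960) has no streaming telemetry platform driver;",
    "# sai_query_stats_st_capability crashes in brcm_sai_st_pd_ctr_cap_list_get.",
]


def ensure_st_capability_disabled(profile_text: str) -> str:
    """Return profile text in which KEY is set to 0 (idempotent)."""
    lines = profile_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith(KEY + "="):
            # Key already present: force the value to 0 and stop.
            lines[i] = KEY + "=0"
            return "\n".join(lines) + "\n"
    # Key absent: append the explanatory comment and the setting.
    return "\n".join(lines + COMMENT_LINES + [KEY + "=0"]) + "\n"
```

Running the helper twice on the same text yields the same result, so it is safe to apply across every HWSKU directory of a platform.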

How to verify it

  1. Build a broadcom-legacy SONiC image for Arista 7060cx
  2. Boot the device — syncd should start without crashing
  3. Confirm show platform summary and show interface status work normally

Which release branch to backport (provide reason below if selected)

These are bug fixes for the broadcom-legacy platform (TH1). The crashes are present in 202511.

  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Tested branch (Please provide the tested image version)

  • 20251110.15 (broadcom-legacy, Arista 7060cx)

Description for the changelog

BROADCOM_LEGACY_SAI_COMPAT: Add sai.profile key to disable sai_query_stats_st_capability on Arista 7060cx (TH1) to prevent syncd SIGSEGV on broadcom-legacy image.

Link to config_db schema for YANG module changes

N/A — sai.profile change only, no config_db schema impact.

A picture of a cute animal (not mandatory but encouraged)

🐧

…s_st_capability at runtime

Add SAI_STATS_ST_CAPABILITY_SUPPORTED=0 to sai.profile for Arista 7060cx
(BCM56960/Tomahawk-1) to disable sai_query_stats_st_capability at runtime.
This prevents a SIGSEGV in brcm_sai_st_pd_ctr_cap_list_get when running
the legacy SAI binary which does not initialize p_pdapi_st->vtable for TH1.

The runtime guard is implemented in sonic-sairedis PR sonic-net#1788.

Signed-off-by: Liping Xu <[email protected]>
Copilot AI review requested due to automatic review settings March 11, 2026 03:02
mssonicbld (Collaborator) commented

/azp run Azure.sonic-buildimage

azure-pipelines commented

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI (Contributor) left a comment

Pull request overview

Updates Arista 7060CX-32S (Tomahawk-1 / BCM56960) platform SAI profiles to disable streaming telemetry stats capability at runtime, avoiding a known syncd crash on broadcom-legacy images when sai_query_stats_st_capability is invoked.

Changes:

  • Add SAI_STATS_ST_CAPABILITY_SUPPORTED=0 to multiple Arista 7060CX-32S HWSKU sai.profile files.
  • Add inline comments documenting the TH1 crash context and why the knob is being set.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S/sai.profile | Disable ST stats capability for base 7060CX-32S profile and document rationale. |
| device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-T96C8/sai.profile | Disable ST stats capability for T96C8 HWSKU. |
| device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-Q24C8/sai.profile | Disable ST stats capability for Q24C8 HWSKU. |
| device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-D48C8/sai.profile | Disable ST stats capability for D48C8 HWSKU. |
| device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-C32/sai.profile | Disable ST stats capability for C32 HWSKU. |

Comment on lines +2 to +3
# BROADCOM_LEGACY_SAI_COMPAT: TH1 (BCM56960) has no streaming telemetry platform driver;
# sai_query_stats_st_capability crashes in brcm_sai_st_pd_ctr_cap_list_get.
Copilot AI commented Mar 11, 2026

PR description/metadata reference a “sonic-sairedis PR #1788” for the syncd-side runtime guard, but the linked/mentioned PR content shown in this PR metadata is about adding sensors.conf and appears unrelated. Please double-check the referenced PR number/link in the description so reviewers can trace the dependent syncd change accurately.

Comment on lines +2 to +4
# BROADCOM_LEGACY_SAI_COMPAT: TH1 (BCM56960) has no streaming telemetry platform driver;
# sai_query_stats_st_capability crashes in brcm_sai_st_pd_ctr_cap_list_get.
SAI_STATS_ST_CAPABILITY_SUPPORTED=0
Copilot AI commented Mar 11, 2026

This change updates several 7060CX-32S HWSKU sai.profile files, but the 7060CX-32S-Q32 variant uses a sai.profile.j2 template (device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-Q32/sai.profile.j2) that still does not set SAI_STATS_ST_CAPABILITY_SUPPORTED=0. If Q32 is also TH1/BCM56960, syncd can still hit the same crash when that HWSKU is used. Consider updating the Q32 template as well (or explicitly document why it’s excluded).

Gfrom2016 (Contributor) left a comment

LGTM

StormLiangMS (Contributor) left a comment

LGTM ✅ — Straightforward sai.profile addition for the companion syncd fix (sonic-sairedis#1788).

Positives:

  • All 5 static sai.profile files for Arista 7060CX-32S HWSKUs updated consistently
  • Good inline comments with BROADCOM_LEGACY_SAI_COMPAT tag explaining the crash root cause
  • PR description is well-structured with clear verification steps

One concern (also flagged by the Copilot auto-reviewer):

  • Arista-7060CX-32S-Q32 uses a sai.profile.j2 Jinja2 template and is not updated in this PR. If Q32 is also TH1/BCM56960 (same x86_64-arista_7060_cx32s platform directory suggests it is), syncd will still crash on that HWSKU. Please either update the .j2 template or document why Q32 is intentionally excluded.

No other blocking issues — pending the Q32 clarification.
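The Q32 gap raised in review is the kind of omission that can be caught mechanically. The following is a rough sketch, assuming each HWSKU is a subdirectory of the platform directory holding either a `sai.profile` or a `sai.profile.j2`. The function name `profiles_missing_key` is hypothetical, and a template that sets the key through a Jinja2 expression rather than a literal line would be reported as missing.

```python
from pathlib import Path
from typing import List

KEY = "SAI_STATS_ST_CAPABILITY_SUPPORTED"
PROFILE_NAMES = ("sai.profile", "sai.profile.j2")


def profiles_missing_key(platform_dir: Path) -> List[Path]:
    """List profile files under platform_dir that lack a literal KEY=0 line."""
    missing: List[Path] = []
    hwsku_dirs = sorted(p for p in platform_dir.iterdir() if p.is_dir())
    for hwsku_dir in hwsku_dirs:
        for name in PROFILE_NAMES:
            profile = hwsku_dir / name
            if not profile.is_file():
                continue  # directory has no profile of this kind
            text = profile.read_text(encoding="utf-8")
            has_key = any(
                line.strip() == KEY + "=0" for line in text.splitlines()
            )
            if not has_key:
                missing.append(profile)
    return missing
```

Pointed at `device/arista/x86_64-arista_7060_cx32s`, a check like this would be expected to report the Q32 template described in the review comments, since that file was not updated in this PR.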

@StormLiangMS StormLiangMS merged commit 3c72a4d into sonic-net:master Mar 16, 2026
24 checks passed
yue-fred-gao pushed a commit to yue-fred-gao/sonic-buildimage that referenced this pull request Mar 16, 2026

lipxu (Contributor, Author) commented Mar 18, 2026

The cherry-pick to 202511 hit a conflict, so it was picked manually in #26201.
