Skip to content

BROADCOM_LEGACY_SAI_COMPAT: Fix sai_get_stats_ext crash on TH1 legacy image#26014

Merged
StormLiangMS merged 3 commits intosonic-net:masterfrom
lipxu:fix/brcm-legacy-compat-buildimage-issue2
Mar 17, 2026
Merged

BROADCOM_LEGACY_SAI_COMPAT: Fix sai_get_stats_ext crash on TH1 legacy image#26014
StormLiangMS merged 3 commits intosonic-net:masterfrom
lipxu:fix/brcm-legacy-compat-buildimage-issue2

Conversation

@lipxu
Copy link
Contributor

@lipxu lipxu commented Mar 11, 2026

Why I did it

On Arista 7060cx (BCM56960/Tomahawk-1) running the broadcom-legacy image, syncd crashes inside sai_get_stats_ext during FlexCounter polling for switch objects. The legacy SAI binary does not support sai_get_stats_ext for switch objects on TH1.

Root cause: sonic-sairedis PR #1775 set use_sai_stats_ext = true for COUNTER_TYPE_SWITCH in FlexCounter::createCounterContext(). This is needed for TH5 but causes a crash on TH1 with the legacy SAI binary.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Add SAI_STATS_EXT_SWITCH_SUPPORTED=0 to sai.profile for all Arista 7060cx HWSKUs (BCM56960/Tomahawk-1). The runtime guard in syncd (sonic-sairedis PR #1789) reads this key and sets use_sai_stats_ext = false for switch counter contexts on this platform.

Files changed:

  • device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S/sai.profile
  • device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-C32/sai.profile
  • device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-D48C8/sai.profile
  • device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-Q24C8/sai.profile
  • device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-T96C8/sai.profile

How to verify it

  1. Build a broadcom-legacy SONiC image for Arista 7060cx
  2. Boot the device and enable FlexCounter for switch objects
  3. Confirm syncd does not crash during FlexCounter polling
  4. Confirm switch counter stats are still collected (via non-ext path)

Which release branch to backport (provide reason below if selected)

These are bug fixes for broadcom-legacy platform (TH1). The crashes are present in 202511.

  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Tested branch (Please provide the tested image version)

  • 20251110.15 (broadcom-legacy, Arista 7060cx)

Description for the changelog

BROADCOM_LEGACY_SAI_COMPAT: Add sai.profile key to disable sai_get_stats_ext for switch objects on Arista 7060cx (TH1) to prevent syncd crash during FlexCounter polling on broadcom-legacy image.

Link to config_db schema for YANG module changes

N/A — sai.profile change only, no config_db schema impact.

A picture of a cute animal (not mandatory but encouraged)

🐧

Copilot AI review requested due to automatic review settings March 11, 2026 03:03
@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Arista 7060CX (Tomahawk-1 / broadcom-legacy) HWSKU sai.profile files to add runtime feature flags that allow syncd/FlexCounter to avoid calling unsupported legacy-SAI stats APIs which can crash syncd.

Changes:

  • Add SAI_STATS_EXT_SWITCH_SUPPORTED=0 to disable sai_get_stats_ext usage for switch-object counters on TH1 legacy SAI.
  • Add SAI_STATS_ST_CAPABILITY_SUPPORTED=0 to disable streaming-telemetry stats capability querying on TH1 legacy SAI.
  • Add inline comments documenting the platform limitations and why the flags are set.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S/sai.profile Add TH1 legacy runtime-disable flags for stats-ext (switch) and ST capability
device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-T96C8/sai.profile Same flags for T96C8 variant
device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-Q24C8/sai.profile Same flags for Q24C8 variant
device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-D48C8/sai.profile Same flags for D48C8 variant
device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-C32/sai.profile Same flags for C32 variant

Comment on lines +6 to +8
# BROADCOM_LEGACY_SAI_COMPAT: sai_get_stats_ext is not supported for switch objects on TH1 (BCM56960).
# Setting to 0 disables use_sai_stats_ext in FlexCounter for COUNTER_TYPE_SWITCH.
SAI_STATS_EXT_SWITCH_SUPPORTED=0
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description says this key is added for “all Arista 7060cx HWSKUs”, but device/arista/x86_64-arista_7060_cx32s/Arista-7060CX-32S-Q32/ uses sai.profile.j2 and currently doesn’t include SAI_STATS_EXT_SWITCH_SUPPORTED (or SAI_STATS_ST_CAPABILITY_SUPPORTED). If Q32 is also affected, please update its template as well; otherwise, consider clarifying in the PR description which SKUs are intentionally excluded.

Copilot uses AI. Check for mistakes.
Comment on lines +2 to +5
# BROADCOM_LEGACY_SAI_COMPAT: TH1 (BCM56960) has no streaming telemetry platform driver;
# sai_query_stats_st_capability crashes in brcm_sai_st_pd_ctr_cap_list_get.
SAI_STATS_ST_CAPABILITY_SUPPORTED=0

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description focuses on disabling sai_get_stats_ext for switch objects, but this change also introduces SAI_STATS_ST_CAPABILITY_SUPPORTED=0 (streaming telemetry stats capability) in the same profile. Please update the PR description (and/or title) to explicitly mention this additional behavior, or split it into a separate PR if it’s intended to land independently.

Copilot uses AI. Check for mistakes.
Gfrom2016
Gfrom2016 previously approved these changes Mar 13, 2026
Copy link
Contributor

@Gfrom2016 Gfrom2016 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@StormLiangMS StormLiangMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ✅ — Companion sai.profile change for sonic-sairedis#1789.

Positives:

  • Both SAI_STATS_ST_CAPABILITY_SUPPORTED=0 and SAI_STATS_EXT_SWITCH_SUPPORTED=0 added to all 5 static sai.profile files
  • Good inline comments distinguishing the two keys and their purposes
  • Consistent with the stacked PR approach (built on top of #26013)

Items to address:

  1. Q32 HWSKU — Same concern as #26013: Arista-7060CX-32S-Q32/sai.profile.j2 is not updated. If Q32 is TH1/BCM56960, both crashes remain possible on that HWSKU.
  2. PR description/title mismatch — Title says "Fix sai_get_stats_ext crash" but the diff also includes SAI_STATS_ST_CAPABILITY_SUPPORTED=0 (from the #26013 stack). Consider clarifying in the description that this PR is stacked on #26013 and includes both keys, to avoid reviewer confusion.

Merge order: #26013 first → rebase #26014, and both after the sairedis companions (#1788, #1789) are merged.

… image

Add SAI_STATS_EXT_SWITCH_SUPPORTED=0 to sai.profile for Arista 7060cx
(BCM56960/Tomahawk-1) to disable sai_get_stats_ext for switch objects.
The legacy SAI binary crashes when FlexCounter calls sai_get_stats_ext
on switch objects during polling.

The runtime guard is implemented in sonic-sairedis PR sonic-net#1789.

Signed-off-by: Liping Xu <[email protected]>
@lipxu lipxu force-pushed the fix/brcm-legacy-compat-buildimage-issue2 branch from 1d56eec to ba466e0 Compare March 16, 2026 12:13
@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

… for TH1

Arista-7060CX-32S-Q32 uses a Jinja2 template (sai.profile.j2) rather than
a static sai.profile. Add SAI_STATS_ST_CAPABILITY_SUPPORTED=0 and
SAI_STATS_EXT_SWITCH_SUPPORTED=0 to cover Q32 as well as the other
Arista-7060CX-32S HWSKUs.

Signed-off-by: Liping Xu <[email protected]>
Copilot AI review requested due to automatic review settings March 16, 2026 12:22
@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment on lines +3 to +7
{%- set switch_role = DEVICE_METADATA["localhost"]["type"] -%}
{%- if "torrouter" in switch_role.lower() %}
{% set sai_profile_contents = "SAI_INIT_CONFIG_FILE=/usr/share/sonic/hwsku/th-a7060-cx32s-32x40G-t0.config.bcm" -%}
{%- else %}
{%- set sai_profile_contents = 'SAI_INIT_CONFIG_FILE=/usr/share/sonic/hwsku/th-a7060-cx32s-32x40G-t1.config.bcm' -%}
{%- set sai_profile_contents = "SAI_INIT_CONFIG_FILE=/usr/share/sonic/hwsku/th-a7060-cx32s-32x40G-t1.config.bcm" -%}
Comment on lines +1 to +4
{# Get sai.profile based on switch_role #}
{%- if DEVICE_METADATA is defined -%}
{%- set switch_role = DEVICE_METADATA['localhost']['type'] -%}
{%- if 'torrouter' in switch_role.lower() %}
{% set sai_profile_contents = 'SAI_INIT_CONFIG_FILE=/usr/share/sonic/hwsku/th-a7060-cx32s-32x40G-t0.config.bcm' -%}
{%- set switch_role = DEVICE_METADATA["localhost"]["type"] -%}
{%- if "torrouter" in switch_role.lower() %}
…uotes, keep SAI keys

Restore original Jinja2 single-quote style (changed unintentionally in the
previous commit). Only intended change is adding SAI_STATS_ST_CAPABILITY_SUPPORTED=0
and SAI_STATS_EXT_SWITCH_SUPPORTED=0 for TH1/BCM56960.

Signed-off-by: Liping Xu <[email protected]>
@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@Gfrom2016 Gfrom2016 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@StormLiangMS StormLiangMS merged commit 7fdd9a4 into sonic-net:master Mar 17, 2026
20 checks passed
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202511: #26217

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants