Skip to content

[hft]: Remove IPFIX template size estimation, query actual size from SAI instead#4304

Merged
prsunny merged 12 commits intosonic-net:masterfrom
Pterosaur:fix/hft-ipfix-template-size-estimation
Mar 18, 2026
Merged

[hft]: Remove IPFIX template size estimation, query actual size from SAI instead#4304
prsunny merged 12 commits intosonic-net:masterfrom
Pterosaur:fix/hft-ipfix-template-size-estimation

Conversation

@Pterosaur
Copy link
Copy Markdown
Contributor

What I did

Removed the IPFIX template buffer size estimation logic in HFTelProfile::updateTemplates() and replaced it with the standard SAI two-phase query pattern: first query the required size with count=0 / list=nullptr, then allocate and fetch.

Why I did it

The previous estimation logic used hardcoded constants to predict the IPFIX template buffer size, but could underestimate (e.g., estimated 65535 bytes vs. actual 119352 bytes). This caused the SDK to log ERR-level messages on the first attempt:

ERR syncd#SDK: [SAI_TAM.ERR] mlnx_generate_ipfix_templates: Buffer size is too small to hold IPFIX template [size:65535, required:119352].
ERR syncd#SDK: [SAI_TAM.ERR] mlnx_tam_tel_type_get_ipfix_templates: Failed to generate IPFIX templates.
ERR syncd#SDK: [SAI_TAM.ERR] mlnx_tam_tel_type_attrib_get: Failed to get attribute.
ERR syncd#SDK: [SAI_UTILS.ERR] get_dispatch_attribs_handler: Failed Get #0, IPFIX_TEMPLATES, key:TAM_TEL_TYPE

Although the retry with SAI_STATUS_BUFFER_OVERFLOW worked correctly, these error logs caused test_hft_full_queue_counters to fail in LogAnalyzer teardown.

By always querying the size first (count=0, list=nullptr), we avoid the unreliable estimation entirely and follow the idiomatic SAI pattern for variable-length attributes.

How I verified it

  • test_hft_full_queue_counters passes without LogAnalyzer catching IPFIX template size mismatch errors
  • IPFIX template retrieval works correctly regardless of template size

Details if related

  • A separate request has been filed with NVIDIA SDK team to suppress ERR-level logs when count=0 / list=nullptr is passed, since this is the standard way to query required buffer size. Until that is addressed, this change avoids triggering the error path altogether.

@Pterosaur Pterosaur requested a review from prsunny as a code owner March 6, 2026 12:32
Copilot AI review requested due to automatic review settings March 6, 2026 12:32
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@Pterosaur Pterosaur changed the title Remove IPFIX template size estimation, query actual size from SAI instead [hft]: Remove IPFIX template size estimation, query actual size from SAI instead Mar 6, 2026
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@Pterosaur Pterosaur force-pushed the fix/hft-ipfix-template-size-estimation branch from c407ead to e775b01 Compare March 6, 2026 12:34
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes hardcoded IPFIX template buffer size estimation logic from HFTelProfile::updateTemplates() and replaces it with the standard SAI two-phase query pattern: first query the required buffer size (count=0, list=nullptr), then allocate and fetch. This eliminates unreliable size estimates that could cause SDK ERR-level log messages and test failures in test_hft_full_queue_counters.

Changes:

  • Removed hardcoded IPFIX template size estimation constants and counter-counting loop
  • Replaced with a two-phase SAI query: first call with count=0/list=nullptr to get required size, then allocate and fetch
  • Updated error handling to throw on the first call if it doesn't return SAI_STATUS_BUFFER_OVERFLOW

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@Pterosaur Pterosaur force-pushed the fix/hft-ipfix-template-size-estimation branch from 3eea378 to 1bc792c Compare March 6, 2026 12:41
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@Pterosaur Pterosaur requested a review from Copilot March 6, 2026 12:41
@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

…tead

The previous approach estimated the IPFIX template buffer size based on
counter subscription counts, but the estimation could be inaccurate and
result in SAI_STATUS_BUFFER_OVERFLOW errors with noisy error logs from
the SDK layer.

Instead of estimating, pass count=0 and list=nullptr to
get_tam_tel_type_attribute() to query the required buffer size first,
then allocate the exact buffer and fetch the data. This follows the
standard SAI pattern for variable-length attribute queries and eliminates
the unreliable estimation logic.

Signed-off-by: Ze Gan <ganze718@gmail.com>
@Pterosaur Pterosaur force-pushed the fix/hft-ipfix-template-size-estimation branch from 1bc792c to 3822842 Compare March 6, 2026 21:51
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Add unit tests covering all code paths in updateTemplates() to satisfy
the diff-coverage ≥80% requirement:

- BufferOverflow_ThenSuccess: SAI returns BUFFER_OVERFLOW on the size
  query, then SUCCESS on the data fetch (happy path with templates).
- Success_EmptyTemplate: SAI returns SUCCESS with count=0 on the first
  call (no templates available).
- FirstCall_UnexpectedFailure: SAI returns an unexpected error on the
  size query.
- SecondCall_Failure: SAI returns BUFFER_OVERFLOW on the size query but
  fails on the data fetch.
- UnknownOID_Throws: caller passes an OID not registered in the
  profile.

Uses the bufferorch_ut.cpp pattern: copy sai_tam_api_t, override
get_tam_tel_type_attribute with a mock, restore on tear-down.

Signed-off-by: Ze Gan <ganze718@gmail.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@Pterosaur Pterosaur requested review from banidoru and Copilot March 10, 2026 02:53
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Copy link
Copy Markdown

@banidoru banidoru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All prior review feedback addressed. The two-phase SAI query pattern is correct (accepts both SUCCESS and BUFFER_OVERFLOW), zero-initialization is applied, defensive resize comment added, and test coverage is thorough including the BUFFER_OVERFLOW+count=0 edge case. LGTM.

Copy link
Copy Markdown

@banidoru banidoru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iteration 2 looks good. All prior feedback has been addressed: zero-initialized attr, dual SUCCESS/BUFFER_OVERFLOW handling on size query, defensive resize comment, Stub member documentation, and the BUFFER_OVERFLOW+count=0 edge case test. No new issues found. LGTM.

Copy link
Copy Markdown

@banidoru banidoru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iteration 2 re-review: All prior feedback has been addressed. The two-phase SAI query pattern (count=0/nullptr probe → allocate → fetch) is correctly implemented, handles both SUCCESS and BUFFER_OVERFLOW on the initial call, and guards against count=0. Test coverage is thorough — happy path, empty templates (both SUCCESS and BUFFER_OVERFLOW variants), first-call failure, second-call failure, and unknown OID. No new issues found. LGTM.

Copy link
Copy Markdown

@banidoru banidoru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All reviewers approved. LGTM.

@prsunny prsunny merged commit db6da66 into sonic-net:master Mar 18, 2026
19 checks passed
@mssonicbld
Copy link
Copy Markdown
Collaborator

Cherry-pick PR to 202511:

PriyanshTratiya pushed a commit to PriyanshTratiya/sonic-swss that referenced this pull request Mar 19, 2026
…SAI instead (sonic-net#4304)

What I did

Removed the IPFIX template buffer size estimation logic in HFTelProfile::updateTemplates() and replaced it with the standard SAI two-phase query pattern: first query the required size with count=0 / list=nullptr, then allocate and fetch.

Why I did it

The previous estimation logic used hardcoded constants to predict the IPFIX template buffer size, but could underestimate (e.g., estimated 65535 bytes vs. actual 119352 bytes). This caused the SDK to log ERR-level messages on the first attempt:

ERR syncd#SDK: [SAI_TAM.ERR] mlnx_generate_ipfix_templates: Buffer size is too small to hold IPFIX template [size:65535, required:119352].
ERR syncd#SDK: [SAI_TAM.ERR] mlnx_tam_tel_type_get_ipfix_templates: Failed to generate IPFIX templates.
ERR syncd#SDK: [SAI_TAM.ERR] mlnx_tam_tel_type_attrib_get: Failed to get attribute.
ERR syncd#SDK: [SAI_UTILS.ERR] get_dispatch_attribs_handler: Failed Get #0, IPFIX_TEMPLATES, key:TAM_TEL_TYPE
Although the retry with SAI_STATUS_BUFFER_OVERFLOW worked correctly, these error logs caused test_hft_full_queue_counters to fail in LogAnalyzer teardown.

By always querying the size first (count=0, list=nullptr), we avoid the unreliable estimation entirely and follow the idiomatic SAI pattern for variable-length attributes.

Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
vmittal-msft pushed a commit that referenced this pull request Mar 19, 2026
…SAI instead (#4304) (#4369)

What I did

Removed the IPFIX template buffer size estimation logic in HFTelProfile::updateTemplates() and replaced it with the standard SAI two-phase query pattern: first query the required size with count=0 / list=nullptr, then allocate and fetch.

Why I did it

The previous estimation logic used hardcoded constants to predict the IPFIX template buffer size, but could underestimate (e.g., estimated 65535 bytes vs. actual 119352 bytes). This caused the SDK to log ERR-level messages on the first attempt:

ERR syncd#SDK: [SAI_TAM.ERR] mlnx_generate_ipfix_templates: Buffer size is too small to hold IPFIX template [size:65535, required:119352].
ERR syncd#SDK: [SAI_TAM.ERR] mlnx_tam_tel_type_get_ipfix_templates: Failed to generate IPFIX templates.
ERR syncd#SDK: [SAI_TAM.ERR] mlnx_tam_tel_type_attrib_get: Failed to get attribute.
ERR syncd#SDK: [SAI_UTILS.ERR] get_dispatch_attribs_handler: Failed Get #0, IPFIX_TEMPLATES, key:TAM_TEL_TYPE
Although the retry with SAI_STATUS_BUFFER_OVERFLOW worked correctly, these error logs caused test_hft_full_queue_counters to fail in LogAnalyzer teardown.

By always querying the size first (count=0, list=nullptr), we avoid the unreliable estimation entirely and follow the idiomatic SAI pattern for variable-length attributes.

Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
Co-authored-by: Ze Gan <ganze718@gmail.com>
andrewsapronov pushed a commit to netbergtw/sonic-swss that referenced this pull request Mar 25, 2026
…SAI instead (sonic-net#4304) (sonic-net#4369)

What I did

Removed the IPFIX template buffer size estimation logic in HFTelProfile::updateTemplates() and replaced it with the standard SAI two-phase query pattern: first query the required size with count=0 / list=nullptr, then allocate and fetch.

Why I did it

The previous estimation logic used hardcoded constants to predict the IPFIX template buffer size, but could underestimate (e.g., estimated 65535 bytes vs. actual 119352 bytes). This caused the SDK to log ERR-level messages on the first attempt:

ERR syncd#SDK: [SAI_TAM.ERR] mlnx_generate_ipfix_templates: Buffer size is too small to hold IPFIX template [size:65535, required:119352].
ERR syncd#SDK: [SAI_TAM.ERR] mlnx_tam_tel_type_get_ipfix_templates: Failed to generate IPFIX templates.
ERR syncd#SDK: [SAI_TAM.ERR] mlnx_tam_tel_type_attrib_get: Failed to get attribute.
ERR syncd#SDK: [SAI_UTILS.ERR] get_dispatch_attribs_handler: Failed Get #0, IPFIX_TEMPLATES, key:TAM_TEL_TYPE
Although the retry with SAI_STATUS_BUFFER_OVERFLOW worked correctly, these error logs caused test_hft_full_queue_counters to fail in LogAnalyzer teardown.

By always querying the size first (count=0, list=nullptr), we avoid the unreliable estimation entirely and follow the idiomatic SAI pattern for variable-length attributes.

Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
Co-authored-by: Ze Gan <ganze718@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants