[hft]: Remove IPFIX template size estimation, query actual size from SAI instead#4304
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
c407ead to
e775b01
Compare
|
/azp run |
There was a problem hiding this comment.
Pull request overview
This PR removes hardcoded IPFIX template buffer size estimation logic from HFTelProfile::updateTemplates() and replaces it with the standard SAI two-phase query pattern: first query the required buffer size (count=0, list=nullptr), then allocate and fetch. This eliminates unreliable size estimates that could cause SDK ERR-level log messages and test failures in test_hft_full_queue_counters.
Changes:
- Removed hardcoded IPFIX template size estimation constants and counter-counting loop
- Replaced with a two-phase SAI query: first call with count=0/list=nullptr to get required size, then allocate and fetch
- Updated error handling to throw on the first call if it doesn't return
SAI_STATUS_BUFFER_OVERFLOW
|
Azure Pipelines successfully started running 1 pipeline(s). |
3eea378 to
1bc792c
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
…tead The previous approach estimated the IPFIX template buffer size based on counter subscription counts, but the estimation could be inaccurate and result in SAI_STATUS_BUFFER_OVERFLOW errors with noisy error logs from the SDK layer. Instead of estimating, pass count=0 and list=nullptr to get_tam_tel_type_attribute() to query the required buffer size first, then allocate the exact buffer and fetch the data. This follows the standard SAI pattern for variable-length attribute queries and eliminates the unreliable estimation logic. Signed-off-by: Ze Gan <ganze718@gmail.com>
1bc792c to
3822842
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Add unit tests covering all code paths in updateTemplates() to satisfy the diff-coverage ≥80% requirement: - BufferOverflow_ThenSuccess: SAI returns BUFFER_OVERFLOW on the size query, then SUCCESS on the data fetch (happy path with templates). - Success_EmptyTemplate: SAI returns SUCCESS with count=0 on the first call (no templates available). - FirstCall_UnexpectedFailure: SAI returns an unexpected error on the size query. - SecondCall_Failure: SAI returns BUFFER_OVERFLOW on the size query but fails on the data fetch. - UnknownOID_Throws: caller passes an OID not registered in the profile. Uses the bufferorch_ut.cpp pattern: copy sai_tam_api_t, override get_tam_tel_type_attribute with a mock, restore on tear-down. Signed-off-by: Ze Gan <ganze718@gmail.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
banidoru
left a comment
There was a problem hiding this comment.
All prior review feedback addressed. The two-phase SAI query pattern is correct (accepts both SUCCESS and BUFFER_OVERFLOW), zero-initialization is applied, defensive resize comment added, and test coverage is thorough including the BUFFER_OVERFLOW+count=0 edge case. LGTM.
banidoru
left a comment
There was a problem hiding this comment.
Iteration 2 looks good. All prior feedback has been addressed: zero-initialized attr, dual SUCCESS/BUFFER_OVERFLOW handling on size query, defensive resize comment, Stub member documentation, and the BUFFER_OVERFLOW+count=0 edge case test. No new issues found. LGTM.
banidoru
left a comment
There was a problem hiding this comment.
Iteration 2 re-review: All prior feedback has been addressed. The two-phase SAI query pattern (count=0/nullptr probe → allocate → fetch) is correctly implemented, handles both SUCCESS and BUFFER_OVERFLOW on the initial call, and guards against count=0. Test coverage is thorough — happy path, empty templates (both SUCCESS and BUFFER_OVERFLOW variants), first-call failure, second-call failure, and unknown OID. No new issues found. LGTM.
|
Cherry-pick PR to 202511: |
…SAI instead (sonic-net#4304) What I did Removed the IPFIX template buffer size estimation logic in HFTelProfile::updateTemplates() and replaced it with the standard SAI two-phase query pattern: first query the required size with count=0 / list=nullptr, then allocate and fetch. Why I did it The previous estimation logic used hardcoded constants to predict the IPFIX template buffer size, but could underestimate (e.g., estimated 65535 bytes vs. actual 119352 bytes). This caused the SDK to log ERR-level messages on the first attempt: ERR syncd#SDK: [SAI_TAM.ERR] mlnx_generate_ipfix_templates: Buffer size is too small to hold IPFIX template [size:65535, required:119352]. ERR syncd#SDK: [SAI_TAM.ERR] mlnx_tam_tel_type_get_ipfix_templates: Failed to generate IPFIX templates. ERR syncd#SDK: [SAI_TAM.ERR] mlnx_tam_tel_type_attrib_get: Failed to get attribute. ERR syncd#SDK: [SAI_UTILS.ERR] get_dispatch_attribs_handler: Failed Get #0, IPFIX_TEMPLATES, key:TAM_TEL_TYPE Although the retry with SAI_STATUS_BUFFER_OVERFLOW worked correctly, these error logs caused test_hft_full_queue_counters to fail in LogAnalyzer teardown. By always querying the size first (count=0, list=nullptr), we avoid the unreliable estimation entirely and follow the idiomatic SAI pattern for variable-length attributes. Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
…SAI instead (#4304) (#4369) What I did Removed the IPFIX template buffer size estimation logic in HFTelProfile::updateTemplates() and replaced it with the standard SAI two-phase query pattern: first query the required size with count=0 / list=nullptr, then allocate and fetch. Why I did it The previous estimation logic used hardcoded constants to predict the IPFIX template buffer size, but could underestimate (e.g., estimated 65535 bytes vs. actual 119352 bytes). This caused the SDK to log ERR-level messages on the first attempt: ERR syncd#SDK: [SAI_TAM.ERR] mlnx_generate_ipfix_templates: Buffer size is too small to hold IPFIX template [size:65535, required:119352]. ERR syncd#SDK: [SAI_TAM.ERR] mlnx_tam_tel_type_get_ipfix_templates: Failed to generate IPFIX templates. ERR syncd#SDK: [SAI_TAM.ERR] mlnx_tam_tel_type_attrib_get: Failed to get attribute. ERR syncd#SDK: [SAI_UTILS.ERR] get_dispatch_attribs_handler: Failed Get #0, IPFIX_TEMPLATES, key:TAM_TEL_TYPE Although the retry with SAI_STATUS_BUFFER_OVERFLOW worked correctly, these error logs caused test_hft_full_queue_counters to fail in LogAnalyzer teardown. By always querying the size first (count=0, list=nullptr), we avoid the unreliable estimation entirely and follow the idiomatic SAI pattern for variable-length attributes. Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com> Co-authored-by: Ze Gan <ganze718@gmail.com>
…SAI instead (sonic-net#4304) (sonic-net#4369) What I did Removed the IPFIX template buffer size estimation logic in HFTelProfile::updateTemplates() and replaced it with the standard SAI two-phase query pattern: first query the required size with count=0 / list=nullptr, then allocate and fetch. Why I did it The previous estimation logic used hardcoded constants to predict the IPFIX template buffer size, but could underestimate (e.g., estimated 65535 bytes vs. actual 119352 bytes). This caused the SDK to log ERR-level messages on the first attempt: ERR syncd#SDK: [SAI_TAM.ERR] mlnx_generate_ipfix_templates: Buffer size is too small to hold IPFIX template [size:65535, required:119352]. ERR syncd#SDK: [SAI_TAM.ERR] mlnx_tam_tel_type_get_ipfix_templates: Failed to generate IPFIX templates. ERR syncd#SDK: [SAI_TAM.ERR] mlnx_tam_tel_type_attrib_get: Failed to get attribute. ERR syncd#SDK: [SAI_UTILS.ERR] get_dispatch_attribs_handler: Failed Get #0, IPFIX_TEMPLATES, key:TAM_TEL_TYPE Although the retry with SAI_STATUS_BUFFER_OVERFLOW worked correctly, these error logs caused test_hft_full_queue_counters to fail in LogAnalyzer teardown. By always querying the size first (count=0, list=nullptr), we avoid the unreliable estimation entirely and follow the idiomatic SAI pattern for variable-length attributes. Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com> Co-authored-by: Ze Gan <ganze718@gmail.com>
What I did
Removed the IPFIX template buffer size estimation logic in
HFTelProfile::updateTemplates()and replaced it with the standard SAI two-phase query pattern: first query the required size withcount=0/list=nullptr, then allocate and fetch.Why I did it
The previous estimation logic used hardcoded constants to predict the IPFIX template buffer size, but could underestimate (e.g., estimated 65535 bytes vs. actual 119352 bytes). This caused the SDK to log ERR-level messages on the first attempt:
Although the retry with
SAI_STATUS_BUFFER_OVERFLOWworked correctly, these error logs causedtest_hft_full_queue_countersto fail in LogAnalyzer teardown.By always querying the size first (count=0, list=nullptr), we avoid the unreliable estimation entirely and follow the idiomatic SAI pattern for variable-length attributes.
How I verified it
test_hft_full_queue_counterspasses without LogAnalyzer catching IPFIX template size mismatch errorsDetails if related
count=0/list=nullptris passed, since this is the standard way to query required buffer size. Until that is addressed, this change avoids triggering the error path altogether.