hftelorch: minor improvements and cleanups for high frequency telemetry#4315
hftelorch: minor improvements and cleanups for high frequency telemetry#4315Pterosaur wants to merge 10 commits intosonic-net:masterfrom
Conversation
Remove the unused CONSTANTS_FILE macro with a typo in path. Add handleSaiCreateStatus() calls to createNetlinkChannel() for create_hostif, create_hostif_user_defined_trap, and create_hostif_table_entry. Previously these SAI calls did not check return values, which could lead to using invalid object IDs if creation failed. Also remove commented-out trap group code. Signed-off-by: Ze Gan <[email protected]>
The SAI_TAM_REPORT_ATTR_REPORT_INTERVAL_UNIT attribute was set but never pushed to the attrs vector, so it was not passed to the SAI create call. While the SAI default happens to be USEC, this should be explicitly set for correctness. Signed-off-by: Ze Gan <[email protected]>
clearGroup() used operator[] to access m_sai_tam_tel_type_objs which inserts a default entry when the key doesn't exist, potentially polluting the map. Use find() instead and only erase the state entry when the key is actually present. Signed-off-by: Ze Gan <[email protected]>
Replace the union of SetPayload/DelPayload (containing raw const char* pointers) with a std::string member that owns the counter name. This prevents potential dangling pointer issues if the caller's local string goes out of scope before the message is fully processed. Update locallyNotify() in HFTelOrch to use the new Message fields. Signed-off-by: Ze Gan <[email protected]>
Change updateStatsIDs() parameter from const reference to rvalue reference to enable true move semantics. The previous code used std::move on a const reference which silently fell back to copy. Update callers in HFTelProfile::setStatsIDs() to std::move the local set into the call. Signed-off-by: Ze Gan <[email protected]>
0e0a618 to
b67cef0
Compare
|
/azp run |
1 similar comment
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
1 similar comment
|
Azure Pipelines successfully started running 1 pipeline(s). |
SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_ID is defined as sai_uint32_t in the SAI metadata. Use attr.value.u32 instead of attr.value.oid when creating TAM counter subscription objects. Signed-off-by: Ze Gan <[email protected]>
Adjust clearGroup() cleanup order to erase TAM counter subscription objects before removing the TAM telemetry type and report objects. This better matches the object dependency order during teardown. Signed-off-by: Ze Gan <[email protected]>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
banidoru
left a comment
There was a problem hiding this comment.
Clean PR with several meaningful bug fixes. No blocking issues found.
Key fixes verified:
- Dangling pointer fix in Message struct (union of
const char*→std::string) - Missing
attrs.push_back(attr)for report interval unit attribute - Type mismatch fix (
oid→u32) for stat_id - Safe lookup in
clearGrouppreventing default-insertion viaoperator[] - Added
handleSaiCreateStatuserror checking on previously unchecked SAI calls - Removed dead code and unused macro with typo (
/et/sonic/)
LGTM.
banidoru
left a comment
There was a problem hiding this comment.
Clean set of fixes — the dangling-pointer fix, missing push_back, stat_id type correction, and defensive clearGroup are all solid. A few observations below, none blocking.
banidoru
left a comment
There was a problem hiding this comment.
Clean improvements — good bug fixes (missing push_back, oid→u32 type mismatch, clearGroup defensive lookup) and solid safety improvement replacing the union with std::string. One concern about partial-creation cleanup in createNetlinkChannel; otherwise LGTM.
There was a problem hiding this comment.
Pull request overview
Updates the orchagent high-frequency telemetry (HFT) implementation with small robustness, correctness, and cleanup improvements across profile/group management, SAI TAM object creation, and counter-name update plumbing.
Changes:
- Improve TAM/hostif creation and report configuration correctness (add missing report-interval-unit attribute; use
handleSaiCreateStatus()for hostif creates). - Make group/profile bookkeeping safer (avoid
operator[]insertion inclearGroup(), adjust cleanup ordering; enable actual move semantics for stats ID updates). - Harden counter-name update messaging by owning the counter name (
std::string) instead of using non-owningconst char*payloads.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| orchagent/high_frequency_telemetry/hftelprofile.cpp | Moves stats ID sets into groups, fixes clearGroup() map handling/cleanup order, and pushes missing report interval unit attribute; adjusts counter subscription stat-id attribute type. |
| orchagent/high_frequency_telemetry/hftelorch.cpp | Uses handleSaiCreateStatus() for hostif object creations and updates local notification handling to use the new message shape. |
| orchagent/high_frequency_telemetry/hftelgroup.h | Changes updateStatsIDs() signature to accept an rvalue set to enable move semantics. |
| orchagent/high_frequency_telemetry/hftelgroup.cpp | Implements moved updateStatsIDs() and uses std::move into the member set. |
| orchagent/high_frequency_telemetry/counternameupdater.h | Replaces union payload with owned std::string counter name + m_oid field. |
| orchagent/high_frequency_telemetry/counternameupdater.cpp | Updates message construction accordingly and removes dependence on temporary-string c_str() lifetimes. |
You can also share your feedback on Copilot code review. Take the survey.
Signed-off-by: Ze Gan <[email protected]>
banidoru
left a comment
There was a problem hiding this comment.
All fixes look correct and well-structured. Key changes verified:
- Dangling pointer fix: Message struct now owns data via
std::stringfor bothm_table_nameandm_counter_name. Union eliminated. - Missing push_back:
SAI_TAM_REPORT_ATTR_REPORT_INTERVAL_UNITnow correctly added to attrs vector. - Type mismatch:
SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_IDcorrectly usesu32instead ofoid. - clearGroup safety:
find()replacesoperator[]preventing spurious map insertions. - Partial failure cleanup:
createNetlinkChannel()now checks each SAI create and callsdeleteNetlinkChannel()on failure. - Move semantics:
updateStatsIDs()rvalue-ref parameter enables true move instead of silent copy. - Dead code removal: Unused
CONSTANTS_FILEmacro (with typo) and commented-out code removed.
No new issues found.
banidoru
left a comment
There was a problem hiding this comment.
Clean set of bug fixes for the high-frequency telemetry subsystem. The latest iteration addresses all prior feedback well.
Key fixes verified:
Messagestruct now owns its strings viastd::string, eliminating dangling-pointer hazards from the oldconst char*/union design- Missing
attrs.push_back(attr)forSAI_TAM_REPORT_ATTR_REPORT_INTERVAL_UNIT— silent data loss bug attr.value.oid→attr.value.u32forSAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_ID— type mismatch fixclearGroupdefensive find-before-erase prevents accidental default-insertion into the mapcreateNetlinkChannelnow checks return status and callsdeleteNetlinkChannel()on partial failure- Dead code removal (
CONSTANTS_FILEmacro with typo, commented-out trap group code) - Move semantics for
updateStatsIDs— correct and all call sites usestd::move
No new issues found. LGTM.
banidoru
left a comment
There was a problem hiding this comment.
Solid bug-fix PR. All changes are correct and well-targeted:
- Dangling pointer fix (Message struct): Replacing raw
const char*+ union with owningstd::stringmembers eliminates a real use-after-scope hazard. Default-initializingm_oidtoSAI_NULL_OBJECT_IDis clean. - Missing
attrs.push_backfor report interval unit: Clear bug — the attribute was configured but never added to the vector. stat_idtype mismatch (oid→u32): Correct per SAI spec;SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_IDis a u32 enum, not an OID.clearGroupdefensive lookup: Using find-then-erase instead ofoperator[]prevents silent default-insertion of a zero OID key.createNetlinkChannelerror handling: CheckinghandleSaiCreateStatusreturn and callingdeleteNetlinkChannel()on partial failure prevents SAI object leaks.- Move semantics for
updateStatsIDs: Appropriate since all call sites build a temporary set. - Dead code removal: Typo-laden
CONSTANTS_FILEmacro and commented-out trap group code cleaned up.
LGTM.
banidoru
left a comment
There was a problem hiding this comment.
Solid bugfix PR. All issues from prior review rounds have been addressed:
- Dangling pointer fix: Message struct now owns data via std::string for both m_table_name and m_counter_name. m_oid properly default-initialized.
- Missing attrs.push_back: SAI_TAM_REPORT_ATTR_REPORT_INTERVAL_UNIT was silently dropped — now correctly pushed.
- Type mismatch: SAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_ID correctly uses u32 instead of oid.
- clearGroup defensive lookup: operator[] replaced with find() to avoid spurious default-insertion.
- createNetlinkChannel error handling: SAI create calls now checked with handleSaiCreateStatus and rollback via deleteNetlinkChannel on partial failure.
- Move semantics: updateStatsIDs correctly takes rvalue ref, callers use std::move.
- Dead code cleanup: Removed unused CONSTANTS_FILE macro and commented-out code.
No new issues found. LGTM.
banidoru
left a comment
There was a problem hiding this comment.
All fixes look correct. The dangling-pointer hazard (union of raw const char* → owning std::string members) is properly resolved, m_oid has a safe default initializer, the missing attrs.push_back for the report interval unit is fixed, the stat_id type mismatch (oid → u32) is corrected, the defensive find-before-erase in clearGroup prevents accidental default-insertion, and createNetlinkChannel now properly checks SAI return status with rollback via deleteNetlinkChannel() on partial failure. Clean set of bug fixes.
banidoru
left a comment
There was a problem hiding this comment.
Solid set of bug fixes. The union-to-struct migration eliminates dangling pointer hazards, the missing attrs.push_back for report interval unit was a real bug, the u32 vs oid type correction aligns with SAI spec, the defensive find before erase in clearGroup prevents silent default-insertion, and the handleSaiCreateStatus additions with rollback via deleteNetlinkChannel improve error handling. Previous review concerns (default m_oid init, m_table_name ownership) have been addressed in c470f11. LGTM.
|
/azp run |
1 similar comment
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
1 similar comment
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@Pterosaur please help with branch rebase and passing PR checkers |
What I did
A collection of minor improvements and cleanups for the high frequency telemetry orchagent code:
handleSaiCreateStatus()calls increateNetlinkChannel()for consistency withcreateTAM().attrs.push_back()forSAI_TAM_REPORT_ATTR_REPORT_INTERVAL_UNIT.find()instead ofoperator[]inclearGroup()to avoid inserting default entries.clearGroup()cleanup order to delete counter subscriptions before telemetry type/report objects.const char*fields inCounterNameMapUpdater::Messagewith ownedstd::stringstate for robustness.CounterNameMapUpdater::Message::m_oidtoSAI_NULL_OBJECT_IDand keep it SET-only by contract.updateStatsIDs()to enable true move semantics.u32forSAI_TAM_COUNTER_SUBSCRIPTION_ATTR_STAT_IDto match the SAI metadata type.createNetlinkChannel()if a later HOSTIF create step fails.CONSTANTS_FILEmacro and clean up commented-out code.Why I did it
Identified during code review. Most of these are defensive robustness/maintainability fixes. The
createNetlinkChannel()follow-up also improves failure-path behavior by avoiding leaked partially created HOSTIF objects when later create steps fail.How I verified it
Code review only. CI will validate build and unit tests.
Details if related
N/A