Fix flow counter out-of-order issue by notifying counter operations using SelectableChannel #13
Closed
stephenxs
commented
Mar 11, 2024
keboliu
approved these changes
Mar 13, 2024
Signed-off-by: Stephen Sun <stephens@nvidia.com>
stephenxs pushed a commit that referenced this pull request Feb 27, 2025
…#13)
```
* 311efa82 - (HEAD -> 202412) Merge branch '202411' of https://github.com/sonic-net/sonic-sairedis into 202412 (2025-02-18) [Sonic Automation]
* 7ae00e5 - (origin/202411) Define bulk chunk size and bulk chunk size per counter ID (sonic-net#1528) (2025-02-11) [mssonicbld]
* f35e743 - [nvidia] Skip SAI discovery on ports (sonic-net#1524) (2025-02-07) [mssonicbld]
* bf049ed - Use sonictest pool instead of sonic-common and fix arm64 issue. (sonic-net#1516) (2025-02-05) [mssonicbld]
* ffe371d - [syncd] Support bulk set in INIT_VIEW mode (sonic-net#1517) (2025-02-05) [mssonicbld]
```
What I did
Fix flow counter out-of-order issue by notifying counter operations using SelectableChannel
Signed-off-by: Stephen Sun stephens@nvidia.com
Why I did it
Currently, operations on SAI objects and on their counters (if any) are triggered through different channels, which introduces race conditions:
- Object operations, such as creation and destruction, are notified through the `SelectableChannel`.
- Counter operations, such as starting and stopping polling, are notified through the `FLEX_COUNTER` and `FLEX_COUNTER_GROUP` tables in the `FLEX_COUNTER_DB`.

As a result, `syncd` can receive events in the wrong order. For example, if it receives the destruction of an object before the request to stop counter polling on that object, it can poll counters for a nonexistent object, which causes errors in the vendor SAI.

The new solution extends the SAI redis attributes on the SAI_SWITCH_OBJECT to notify counter polling. As a result, all objects and their counters are notified through a unified channel, the `SelectableChannel`.

How I verified it
Unit test
Manual test
Regression test
Details if related
Two SAI Redis attributes are introduced, as described below. Each attribute carries several fields of type `const char *`. Passing a field as `nullptr` means the field is left unchanged.

- `SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER_GROUP`, for counter groups, represented by the `FLEX_COUNTER_GROUP` table in the `FLEX_COUNTER_DB`. It includes the following fields:
  - `counter_group_name`, which is the key of the table, representing the group name.
  - `poll_interval`, which is the field `POLL_INTERVAL` of an entry, representing the polling interval of the group.
  - `operation`, which is the field `FLEX_COUNTER_STATUS` of an entry, representing whether counter polling is enabled for the group.
  - `stats_mode`, which is the field `STATS_MODE` of an entry, either `STATS_MODE_READ` or `STATS_MODE_READ_AND_CLEAR`.
  - `plugins`, which represents the Lua plugin related to the group.
  - `plugin_name`, which is the name of the plugins field. It differs among groups.
- `SAI_REDIS_SWITCH_ATTR_FLEX_COUNTER`, for per-object counters, represented by the `FLEX_COUNTER` table in the `FLEX_COUNTER_DB`. It includes the following fields:
  - `counter_key`, which is the key of the table, with the naming convention `<group-name>:oid:<oid-value>`.
  - `counter_ids`, which is a list of counter IDs to be polled for the object.
  - `counter_field_name`, which is the name of the counter ID field. It differs among groups.
  - `stats_mode`, which is the field `STATS_MODE` of an entry, either `STATS_MODE_READ` or `STATS_MODE_READ_AND_CLEAR`.

Both SAI attributes are terminated by the `RedisRemoteSaiInterface` object in the swss context, which serializes the SAI API call into the selectable channel using the following extended commands:

- `REDIS_FLEX_COUNTER_COMMAND_COUNTER_GROUP`: represents the `SET` operation in the `FLEX_COUNTER_GROUP` table
- `REDIS_FLEX_COUNTER_COMMAND_START_POLL`: represents the `SET` operation in the `FLEX_COUNTER` table
- `REDIS_FLEX_COUNTER_COMMAND_STOP_POLL`: represents the `DEL` operation in the `FLEX_COUNTER` table

On receiving these extended commands (which represent both extended SAI attributes), syncd calls the flex counter functions to handle them.
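The "`nullptr` means unchanged" convention for the group attribute can be sketched as follows. This is only an illustration under stated assumptions, not the code from this PR: the struct `FlexCounterGroupParam` and the function `serializeGroup` are hypothetical names, and only the table field names (`POLL_INTERVAL`, `FLEX_COUNTER_STATUS`, `STATS_MODE`) come from the description above.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the value of the group attribute described above.
// Every field is a C string; nullptr means "leave this field unchanged".
struct FlexCounterGroupParam {
    const char *counter_group_name = nullptr; // key of the FLEX_COUNTER_GROUP entry
    const char *poll_interval = nullptr;      // POLL_INTERVAL
    const char *operation = nullptr;          // FLEX_COUNTER_STATUS
    const char *stats_mode = nullptr;         // STATS_MODE
    const char *plugin_name = nullptr;        // name of the plugin field
    const char *plugins = nullptr;            // value of the plugin field
};

using FieldValue = std::pair<std::string, std::string>;

// Collect only the fields the caller actually set, so that unset (nullptr)
// fields are not touched on the receiving side.
std::vector<FieldValue> serializeGroup(const FlexCounterGroupParam &p) {
    std::vector<FieldValue> out;
    if (p.poll_interval) out.emplace_back("POLL_INTERVAL", p.poll_interval);
    if (p.operation) out.emplace_back("FLEX_COUNTER_STATUS", p.operation);
    if (p.stats_mode) out.emplace_back("STATS_MODE", p.stats_mode);
    if (p.plugin_name && p.plugins) out.emplace_back(p.plugin_name, p.plugins);
    return out;
}
```

For example, a caller that sets only `counter_group_name` and `poll_interval` gets a single `POLL_INTERVAL` pair back, so the status, stats mode, and plugin of the group stay as they were.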
Gearbox flex counter database
Pass the PHY OID, which is syntactically the OID of a SAI switch object, when calling the SAI set API to set the extended attributes. This lets the SAI redis objects choose the context in which the SAI API call is invoked, so that the corresponding gearbox syncd docker container handles it.
(PS: the original gearbox flex counter implementation is buggy.)
Context and critical section analysis
This change does not alter the critical section hierarchy.
Performance analysis
The counter operations are handled in the same thread in both the new and old solutions.
In swss, the counter operation was asynchronous in the old solution and is synchronous now, which can introduce a bit more latency. However, as the number of counter operations is small, no performance degradation is observed.
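Because object and counter operations now travel through one channel and are handled in one thread, their relative order is preserved. The sketch below makes that argument concrete with a toy model. It is not the syncd implementation: the `Op` and `Event` types and the `drain` function are hypothetical, and the real channel carries serialized SAI calls rather than these simplified events.

```cpp
#include <deque>
#include <set>
#include <string>
#include <vector>

// Hypothetical event kinds for a toy model of the channel.
enum class Op { CreateObject, StartPoll, StopPoll, RemoveObject };

struct Event {
    Op op;
    std::string oid;
};

// Drain one FIFO channel on a single thread. Returns the OIDs that were
// removed while polling was still active, which is the error case the PR
// describes (polling a nonexistent object).
std::vector<std::string> drain(std::deque<Event> channel) {
    std::set<std::string> objects, polled;
    std::vector<std::string> stale;
    while (!channel.empty()) {
        Event e = channel.front();
        channel.pop_front();
        switch (e.op) {
        case Op::CreateObject: objects.insert(e.oid); break;
        case Op::StartPoll:    polled.insert(e.oid); break;
        case Op::StopPoll:     polled.erase(e.oid); break;
        case Op::RemoveObject:
            if (polled.count(e.oid)) stale.push_back(e.oid);
            objects.erase(e.oid);
            break;
        }
    }
    return stale;
}
```

When the sender enqueues create, start poll, stop poll, remove in that order, `drain` reports no stale OIDs. If the remove event overtakes the stop-poll event, as it could when the two travelled through separate channels, the OID is reported as stale.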