[syncd][PFC WD]: Add support for plugin subscription#229
[syncd][PFC WD]: Add support for plugin subscription#229lguohan merged 5 commits intosonic-net:masterfrom
Conversation
Still under development. Must be merged after corresponding changes in swss will be available |
|
conflict? |
lib/src/sai_redis_notifications.cpp
Outdated
| sai_fdb_event_notification_fn on_fdb_event = NULL; | ||
| sai_port_state_change_notification_fn on_port_state_change = NULL; | ||
| sai_packet_event_notification_fn on_packet_event = NULL; | ||
| sai_queue_pfc_deadlock_notification_fn on_queue_deadlock = NULL; |
There was a problem hiding this comment.
what if the real sai supports this notification natively?
There was a problem hiding this comment.
Orchagent subscribes to sairedis notification for this callback like to any other SAI notification. The difference is that if SAI supports native notifications, then it will trigger event, and if we have .lua plugins for the platform, then syncd will run them and trigger same callback itself.
syncd/pfc_detect_mlnx.lua
Outdated
|
|
||
| local m = table.getn(counter_keys) | ||
| for j = m, 1, -1 do | ||
| if string.sub(counter_keys[j],-string.len('RX_PKTS'))=='RX_PKTS' then |
There was a problem hiding this comment.
string.sub(counter_k [](start = 15, length = 20)
change to the exact match?
There was a problem hiding this comment.
I don't know the exact name of stat ID, becasuse it is something like SAI_PORT_STAT_PFC_[i]_RX_PKTS
|
if we let syncd to collect counter and then run the storm detection and then generate the notification, we do not really need to put the data into redis db, and use lua script to analyze the data. the reason we have orchagent and syncd is to split the functionality, making syncd to only do simple jobs and most of bussiness logic in orchagent. In the original design, the syncd is only pulling the counters and orchagent is handle all to detection logic. I think it should be still kept in this way. after syncd pull the counters every 100ms, it should call the lua script (the lua script should be installed into db by swss process, not the syncd process). the lua script itself can generate notification to swss to handle, not via sai notification. Here the only thing syncd needs to do is to pull the counters and then call the lua script. This will be easier for us the change the pfc storm detection algorithm if there is bug. In that case, we can remove the lua script (syncd will stop calling lua script since it does not exist), and install a new lua script. The orchagent needs to be restart, but not the syncd. |
|
So basically we have 3 steps of execution here:
Step 1 must be done in syncd. Step 2 now is implemented as lua script called by syncd. Step 3 is now done by syncd. If we want to change detection/restoration algorithm, we still can do it by changing lua without restarting any daemon. Why is it necessary to implement notifications inside lua scripts if I can reuse generic way of notifying orchagent? About installation of lua script, I can keep it in swss, but it will require passing its SHA to syncd via Redis DB. If you're ok with it, I'll do it. |
This commit implements native support for SAI_SWITCH_ATTRIBUTE_QUEUE_PFC_DEADLOCK_NOTIFY. Calling lua scripts is moved from orchagent to syncd, which uses native notification by a SW watchdog as well. Signed-off-by: marian-pritsak <marianp@mellanox.com>
Signed-off-by: marian-pritsak <marianp@mellanox.com>
|
Depends on sonic-net/sonic-swss-common#134 |
|
retest this please |
syncd/Makefile.am
Outdated
|
|
||
| dist_swss_DATA = \ | ||
| pfc_detect_mlnx.lua \ | ||
| pfc_restore_mlnx.lua |
There was a problem hiding this comment.
do we still need these two files in syncd?
|
can you separate native support for the code is completely in a different folder. |
|
Sure, I will |
syncd/syncd.cpp
Outdated
| } | ||
| else | ||
| { | ||
| SWSS_LOG_ERROR("Object tyoe for not supported"); |
| const auto field = fvField(valuePair); | ||
| const auto value = fvValue(valuePair); | ||
|
|
||
| if (field != SAI_OBJECT_TYPE) |
There was a problem hiding this comment.
you do not need to reuse SAI_OBJECT_TYPE, this counter plugin has nothing to do with SAI. You can define your own field and value, like "type:queue", "type:port"
There was a problem hiding this comment.
I will change in a follow-up PR, since it requires change in swss-common schema
lguohan
left a comment
There was a problem hiding this comment.
some minor changes, overall the code looks good to me.
|
retest this please |
|
build failure |
|
I need to remove .lua scripts from Makefile.am |
Signed-off-by: marian-pritsak <marianp@mellanox.com>
|
why you need to remove lua scripts ? |
|
They are part of swss |
|
HW part moved to |
Signed-off-by: marian-pritsak <marianp@mellanox.com>
| { | ||
| collectCounters(countersTable); | ||
| runPlugins(db); | ||
|
|
There was a problem hiding this comment.
runPlugins and main syncd thread both access m_portCounterIdsMap and m_queueCounterIdsMap. Add lock?
There was a problem hiding this comment.
can you confirm? they are protected by g_mutex lock.
There was a problem hiding this comment.
runPlugins doesn't acquire the g_mutex lock.
|
|
||
| for (const auto& sha : m_portPlugins) | ||
| { | ||
| runRedisScript(db, sha, portList, argv); |
There was a problem hiding this comment.
I think we do not use them, but does not hurt to have them.
There was a problem hiding this comment.
Yes, currently no one uses port plugins
|
|
||
| for (const auto& sha : m_queuePlugins) | ||
| { | ||
| runRedisScript(db, sha, queueList, argv); |
There was a problem hiding this comment.
Where is the result of redis script being used?
There was a problem hiding this comment.
syncd doesn't care about a result of a script. It only executes them to make changes in DB or to notify other modules.
| static void removeQueue( | ||
| _In_ sai_object_id_t queueVid); | ||
|
|
||
| static void addPortCounterPlugin( |
There was a problem hiding this comment.
Are we using addPortCounterPlugin?
There was a problem hiding this comment.
no, we are not, but It seems good to have it for future use.
There was a problem hiding this comment.
We don't use it now
|
|
||
| static void addPortCounterPlugin( | ||
| _In_ std::string sha); | ||
| static void addQueueCounterPlugin( |
There was a problem hiding this comment.
The name of the function is a little misleading. The plugin is not only for queue counters, it might also query port counters etc. Consider change the name to addPlugin, or addQueuePlugin, addPortPlugin if we need both
There was a problem hiding this comment.
Type of plugin means what objects will be passed as arguments for execution. Of course, a client can register port counters for polling as well, but the plugin will only receive queue counters to do its logic.
|
|
||
| PfcWatchdog &wd = getInstance(); | ||
|
|
||
| if (wd.m_portPlugins.find(sha) != wd.m_portPlugins.end() || |
There was a problem hiding this comment.
Why also need to check the m_portPlugins here? If that's the case, why separate them in the first place?
There was a problem hiding this comment.
Client is not allowed to register the exactly same plugin for different types. Later plugin will have no idea which objects were passed to it as arguments.
syncd/syncd.cpp
Outdated
| auto idStrings = swss::tokenize(value, ','); | ||
|
|
||
| if (field == PFC_WD_PORT_COUNTER_ID_LIST) | ||
| if (objectType == SAI_OBJECT_TYPE_PORT && field == PFC_WD_PORT_COUNTER_ID_LIST) |
There was a problem hiding this comment.
Why we need to determine the objectType here? Only field should be enough?
There was a problem hiding this comment.
To make sure that client doesn't accidentally subscribe to the wrong object.
syncd/syncd.cpp
Outdated
| PfcWatchdog::setPortCounterList(vid, rid, portCounterIds); | ||
| } | ||
| else if (field == PFC_WD_QUEUE_COUNTER_ID_LIST) | ||
| else if (objectType == SAI_OBJECT_TYPE_QUEUE && field == PFC_WD_QUEUE_COUNTER_ID_LIST) |
There was a problem hiding this comment.
Why we need to determine the objectType here? Only field should be enough?
There was a problem hiding this comment.
To make sure that client doesn't accidentally subscribe to the wrong object.
|
|
||
| if (value == sai_serialize_object_type(SAI_OBJECT_TYPE_PORT)) | ||
| { | ||
| PfcWatchdog::addPortCounterPlugin(key); |
There was a problem hiding this comment.
Are we going to use portcounterplugin?
There was a problem hiding this comment.
No, at least not for now, because PFC WD works on queue level.
syncd/syncd.cpp
Outdated
| } | ||
| else | ||
| { | ||
| SWSS_LOG_ERROR("Object tyoe for removal not supported"); |
There was a problem hiding this comment.
here you also need continue.
I think it is better to do as follows, and it will be more clear.
if (op == DEL_COMMAND)
{
}
else if (op == SET_COMMAND)
{
}
else
{
}
Signed-off-by: marian-pritsak <marianp@mellanox.com>
Calling lua scripts is moved from orchagent to syncd.
Client registers plugins so that later it can be notified of
certain events by those plugins
Signed-off-by: marian-pritsak marianp@mellanox.com