Skip to content

[syncd][PFC WD]: Add support for plugin subscription#229

Merged
lguohan merged 5 commits intosonic-net:masterfrom
marian-pritsak:pfc-wd
Oct 6, 2017
Merged

[syncd][PFC WD]: Add support for plugin subscription#229
lguohan merged 5 commits intosonic-net:masterfrom
marian-pritsak:pfc-wd

Conversation

@marian-pritsak
Copy link
Copy Markdown
Contributor

@marian-pritsak marian-pritsak commented Sep 26, 2017

Calling lua scripts is moved from orchagent to syncd.
Client registers plugins so that later it can be notified of
certain events by those plugins

Signed-off-by: marian-pritsak marianp@mellanox.com

@marian-pritsak
Copy link
Copy Markdown
Contributor Author

Still under development. Must be merged after corresponding changes in swss will be available

@lguohan
Copy link
Copy Markdown
Contributor

lguohan commented Sep 27, 2017

conflict?

sai_fdb_event_notification_fn on_fdb_event = NULL;
sai_port_state_change_notification_fn on_port_state_change = NULL;
sai_packet_event_notification_fn on_packet_event = NULL;
sai_queue_pfc_deadlock_notification_fn on_queue_deadlock = NULL;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the real sai supports this notification natively?

Copy link
Copy Markdown
Contributor Author

@marian-pritsak marian-pritsak Sep 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Orchagent subscribes to sairedis notification for this callback like to any other SAI notification. The difference is that if SAI supports native notifications, then it will trigger event, and if we have .lua plugins for the platform, then syncd will run them and trigger same callback itself.


local m = table.getn(counter_keys)
for j = m, 1, -1 do
if string.sub(counter_keys[j],-string.len('RX_PKTS'))=='RX_PKTS' then
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string.sub(counter_k [](start = 15, length = 20)

change to the exact match?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the exact name of stat ID, becasuse it is something like SAI_PORT_STAT_PFC_[i]_RX_PKTS

@lguohan
Copy link
Copy Markdown
Contributor

lguohan commented Sep 27, 2017

if we let syncd to collect counter and then run the storm detection and then generate the notification, we do not really need to put the data into redis db, and use lua script to analyze the data.

the reason we have orchagent and syncd is to split the functionality, making syncd to only do simple jobs and most of bussiness logic in orchagent.

In the original design, the syncd is only pulling the counters and orchagent is handle all to detection logic. I think it should be still kept in this way.

after syncd pull the counters every 100ms, it should call the lua script (the lua script should be installed into db by swss process, not the syncd process). the lua script itself can generate notification to swss to handle, not via sai notification. Here the only thing syncd needs to do is to pull the counters and then call the lua script.

This will be easier for us the change the pfc storm detection algorithm if there is bug. In that case, we can remove the lua script (syncd will stop calling lua script since it does not exist), and install a new lua script. The orchagent needs to be restart, but not the syncd.

@marian-pritsak
Copy link
Copy Markdown
Contributor Author

So basically we have 3 steps of execution here:

  1. Collect counters from SAI;
  2. Run verification algorithm;
  3. Notify orchagent if something changes;

Step 1 must be done in syncd. Step 2 now is implemented as lua script called by syncd. Step 3 is now done by syncd.

If we want to change detection/restoration algorithm, we still can do it by changing lua without restarting any daemon. Why is it necessary to implement notifications inside lua scripts if I can reuse generic way of notifying orchagent?

About installation of lua script, I can keep it in swss, but it will require passing its SHA to syncd via Redis DB. If you're ok with it, I'll do it.

This commit implements native support for
SAI_SWITCH_ATTRIBUTE_QUEUE_PFC_DEADLOCK_NOTIFY.
Calling lua scripts is moved from orchagent to syncd, which uses native
notification by a SW watchdog as well.

Signed-off-by: marian-pritsak <marianp@mellanox.com>
Copy link
Copy Markdown
Contributor

@lguohan lguohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as comments

Signed-off-by: marian-pritsak <marianp@mellanox.com>
@marian-pritsak
Copy link
Copy Markdown
Contributor Author

Depends on sonic-net/sonic-swss-common#134

@marian-pritsak
Copy link
Copy Markdown
Contributor Author

retest this please


dist_swss_DATA = \
pfc_detect_mlnx.lua \
pfc_restore_mlnx.lua
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need these two files in syncd?

@lguohan
Copy link
Copy Markdown
Contributor

lguohan commented Oct 5, 2017

can you separate native support for
SAI_SWITCH_ATTRIBUTE_QUEUE_PFC_DEADLOCK_NOTIFY in sairedis.

the code is completely in a different folder.

@marian-pritsak
Copy link
Copy Markdown
Contributor Author

Sure, I will

syncd/syncd.cpp Outdated
}
else
{
SWSS_LOG_ERROR("Object tyoe for not supported");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tyoe -> type

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

const auto field = fvField(valuePair);
const auto value = fvValue(valuePair);

if (field != SAI_OBJECT_TYPE)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you do not need to reuse SAI_OBJECT_TYPE, this counter plugin has nothing to do with SAI. You can define your own field and value, like "type:queue", "type:port"

Copy link
Copy Markdown
Contributor Author

@marian-pritsak marian-pritsak Oct 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change in a follow-up PR, since it requires change in swss-common schema

Copy link
Copy Markdown
Contributor

@lguohan lguohan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor changes, overall the code looks good to me.

@lguohan
Copy link
Copy Markdown
Contributor

lguohan commented Oct 5, 2017

retest this please

@lguohan
Copy link
Copy Markdown
Contributor

lguohan commented Oct 5, 2017

build failure

@marian-pritsak
Copy link
Copy Markdown
Contributor Author

I need to remove .lua scripts from Makefile.am

Signed-off-by: marian-pritsak <marianp@mellanox.com>
@kcudnik
Copy link
Copy Markdown
Collaborator

kcudnik commented Oct 5, 2017

why you need to remove lua scripts ?

@marian-pritsak
Copy link
Copy Markdown
Contributor Author

They are part of swss

@marian-pritsak marian-pritsak changed the title [syncd][PFC WD]: Add support for SAI PFC storm notification [syncd][PFC WD]: Add support for plugin subscription Oct 5, 2017
@marian-pritsak
Copy link
Copy Markdown
Contributor Author

HW part moved to
#236

Signed-off-by: marian-pritsak <marianp@mellanox.com>
{
collectCounters(countersTable);
runPlugins(db);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runPlugins and main syncd thread both access m_portCounterIdsMap and m_queueCounterIdsMap. Add lock?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you confirm? they are protected by g_mutex lock.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runPlugins doesn't acquire the g_mutex lock.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


for (const auto& sha : m_portPlugins)
{
runRedisScript(db, sha, portList, argv);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do not use them, but does not hurt to have them.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, currently no one uses port plugins


for (const auto& sha : m_queuePlugins)
{
runRedisScript(db, sha, queueList, argv);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the result of redis script being used?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syncd doesn't care about a result of a script. It only executes them to make changes in DB or to notify other modules.

static void removeQueue(
_In_ sai_object_id_t queueVid);

static void addPortCounterPlugin(
Copy link
Copy Markdown
Contributor

@sihuihan88 sihuihan88 Oct 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using addPortCounterPlugin?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, we are not, but It seems good to have it for future use.

Copy link
Copy Markdown
Contributor Author

@marian-pritsak marian-pritsak Oct 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use it now


static void addPortCounterPlugin(
_In_ std::string sha);
static void addQueueCounterPlugin(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the function is a little misleading. The plugin is not only for queue counters, it might also query port counters etc. Consider change the name to addPlugin, or addQueuePlugin, addPortPlugin if we need both

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type of plugin means what objects will be passed as arguments for execution. Of course, a client can register port counters for polling as well, but the plugin will only receive queue counters to do its logic.


PfcWatchdog &wd = getInstance();

if (wd.m_portPlugins.find(sha) != wd.m_portPlugins.end() ||
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why also need to check the m_portPlugins here? If that's the case, why separate them in the first place?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client is not allowed to register the exactly same plugin for different types. Later plugin will have no idea which objects were passed to it as arguments.

syncd/syncd.cpp Outdated
auto idStrings = swss::tokenize(value, ',');

if (field == PFC_WD_PORT_COUNTER_ID_LIST)
if (objectType == SAI_OBJECT_TYPE_PORT && field == PFC_WD_PORT_COUNTER_ID_LIST)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to determine the objectType here? Only field should be enough?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure that client doesn't accidentally subscribe to the wrong object.

syncd/syncd.cpp Outdated
PfcWatchdog::setPortCounterList(vid, rid, portCounterIds);
}
else if (field == PFC_WD_QUEUE_COUNTER_ID_LIST)
else if (objectType == SAI_OBJECT_TYPE_QUEUE && field == PFC_WD_QUEUE_COUNTER_ID_LIST)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to determine the objectType here? Only field should be enough?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure that client doesn't accidentally subscribe to the wrong object.


if (value == sai_serialize_object_type(SAI_OBJECT_TYPE_PORT))
{
PfcWatchdog::addPortCounterPlugin(key);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to use portcounterplugin?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, at least not for now, because PFC WD works on queue level.

syncd/syncd.cpp Outdated
}
else
{
SWSS_LOG_ERROR("Object tyoe for removal not supported");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tyoe -> type

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here you also need continue.

I think it is better to do as follows, and it will be more clear.

if (op == DEL_COMMAND)
{
}
else if (op == SET_COMMAND)
{
}
else
{
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: marian-pritsak <marianp@mellanox.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants