Skip to content

[syncd] Add fdb notification event throttle#1041

Closed
kcudnik wants to merge 1 commit intosonic-net:masterfrom
kcudnik:fdbdrop
Closed

[syncd] Add fdb notification event throttle#1041
kcudnik wants to merge 1 commit intosonic-net:masterfrom
kcudnik:fdbdrop

Conversation

@kcudnik
Copy link
Collaborator

@kcudnik kcudnik commented May 11, 2022

When L2 loop happen, a mac can jump between ports very rapid, and it will cause flood of fdb notification events like (aged, learned, aged learned) between 2 ports. Vendor sai will produce stream of messages that will use 100% cpu, and syncd will be trying to push those notifications to OA. To prevent this, we are adding event throttle on fdb events per MAC basis, and by default that throttle will limit number of messages to 10 events per 1 second per MAC.

@lguohan
Copy link
Contributor

lguohan commented May 23, 2022

can you add description?

@lguohan
Copy link
Contributor

lguohan commented May 25, 2022

instead of "throttling will limit number of messages to 10 events per 1 second per MAC", can we always keep the latest notification for the mac/vlan combination?

@kcudnik
Copy link
Collaborator Author

kcudnik commented May 26, 2022

that was my initial approach, but it have some issues, fist, it would require to unnecessary serialize/deserialize fdb messages into event queues, and most important, if there is a storm of messages, what and when is exactly last message? it's hard to determine, one of approaches i was thinking, that we could delay processing messages, then check the queue for potential duplicates, and remove them, but this would be not efficient for systems with a lot of mac addresses and it will require to scan every item in the queue for each event, so complexity here would be quadratic, and if we would make a queue per mac, then we would lost order of incomming events

with current approach, order of messages is preserved, since messages are filtered before inserting to the queue, on per mac basis, we also don't need to serialize and deserialize messages, downside is that we can't determine when storm will be over

do you have any other suggestions or approach how we could implement here?

@gechiang
Copy link
Contributor

Ideally, we should only keep the last learnt (or moved to) location of the learnt MAC. I believe some SAI vendor implements the move via aged/learnt pair of events while some other vendor that implemented the MOVE event notification will just have a single Move event whenever the MAC moved. I do not see how one can achieve throttle the events and still maintain sync with the SAI layer without having to store the last set of event(s) for a particular MAC.

There are several issues if we simply drop the events when a threshold is reached. it makes the view of the MAC table as perceived by SONiC out of sync with SAI layer. worst yet if the last event that was dropped was an aged event then the MAC will stay on SONiC side forever until this same MAC is relearnt again by SAI layer (which may never happen).
I believe with current code in SONiC we already have this issue if MAC move storm occurred.

Best would be for this filtering be done at the lowest layer (SAI layer). But if this is not achievable, we need to store and filter at some time intervals and then report the MAC events with the latest notified info from SAI...

@kcudnik
Copy link
Collaborator Author

kcudnik commented May 27, 2022

Ideally, we should only keep the last learnt (or moved to) location of the learnt MAC. I believe some SAI vendor implements the move via aged/learnt pair of events while some other vendor that implemented the MOVE event notification will just have a single Move event whenever the MAC moved. I do not see how one can achieve throttle the events and still maintain sync with the SAI layer without having to store the last set of event(s) for a particular MAC.

yes, i'm aware of this problem, the same problem exists on current implementation, when events are dropped, there is a possibility of getting out of syncd on that fdb move, between SAI and sairedis because of dropped events, i don't see how this could be easy fixed, but at list this PR will not forward 100k or more events to sairedis

There are several issues if we simply drop the events when a threshold is reached. it makes the view of the MAC table as perceived by SONiC out of sync with SAI layer. worst yet if the last event that was dropped was an aged event then the MAC will stay on SONiC side forever until this same MAC is relearnt again by SAI layer (which may never happen). I believe with current code in SONiC we already have this issue if MAC move storm occurred.

yes, current code has this issue

Best would be for this filtering be done at the lowest layer (SAI layer). But if this is not achievable, we need to store and filter at some time intervals and then report the MAC events with the latest notified info from SAI...

the solution i was thinking was described in previous post, to store events for example for 1 sec, then try to filter duplicates, and then process, but as mentioned it introduces unnecessary serialization/desrialization, n2 complexity and delay on fdb events

@kcudnik
Copy link
Collaborator Author

kcudnik commented Jun 13, 2022

replaced by #1060

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants