Skip to content

[Link Event Damping] Add per port link event damper class.#1334

Open
Ashish1805 wants to merge 1 commit intosonic-net:masterfrom
Ashish1805:ledamping_util
Open

[Link Event Damping] Add per port link event damper class.#1334
Ashish1805 wants to merge 1 commit intosonic-net:masterfrom
Ashish1805:ledamping_util

Conversation

@Ashish1805
Copy link
Contributor

  • Adding per port link event damper class that manages the internal behavior and workings of link event damping logic on a port where link event damping config is enabled.
  • This class keeps track of damping timer, current port state, post damping advertised port state and relevant debug counters per port.

HLD: sonic-net/SONiC#1071

- Adding per port link event damper class that manages the internal
  behavior and workings of link event damping logic on a port where link
  event damping config is enabled.
- This class keeps track of damping timer, current port state, post
  damping advertised port state and relevant debug counters per port.

HLD: sonic-net/SONiC#1071
Comment on lines +141 to +148
if (m_penaltyCeiling - m_accumulatedPenalty > m_flapPenalty)
{
m_accumulatedPenalty += m_flapPenalty;
}
else
{
m_accumulatedPenalty = m_penaltyCeiling;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use std::max ?

SWSS_LOG_ENTER();

m_timer.setInterval(
/*interval=*/{.tv_sec = (time_us / int64_t(MICRO_SECS_PER_SEC)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment in weird place, please remove we know it's setInterval function

Comment on lines +321 to +322
m_maxSuppressTimeUsec =
uint64_t(config.max_suppress_time) * MICRO_SECS_PER_MILLI_SEC;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

join those lines, and why you need to explicitly cast to 64?


namespace syncd
{

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty line not needed

Comment on lines +22 to +31
inline uint64_t getCurrentTimeUsecs()
{
SWSS_LOG_ENTER();

struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);

return (ts.tv_sec * MICRO_SECS_PER_SEC) +
(ts.tv_nsec / NANO_SECS_PER_MICRO_SEC);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move code to cpp file

Comment on lines +33 to +34
struct DampingStats
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each struct and class should have it's own file cpp and h

Comment on lines +35 to +46
// Number of link UP events received.
uint64_t pre_damping_up_events;
// Number of link DOWN events received.
uint64_t pre_damping_down_events;
// Number of link UP events advertised post damping.
uint64_t post_damping_up_events;
// Number of link DOWN events advertised post damping.
uint64_t post_damping_down_events;
// Timestamp for last advertised link up event post damping.
std::string last_advertised_up_event_timestamp;
// Timestamp for last advertised link down event post damping.
std::string last_advertised_down_event_timestamp;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add empty line before each line of comment for readibility or put all comments on the right side

struct DampingStats m_stats;

friend class PortLinkEventDamperPeer;
friend class LinkEventDamperPeer;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in entire project we don't use friend class, please explain in details why this is exactly what you need and it cant be done in other way?

{

// Peer class to access private state and APIs.
class PortLinkEventDamperPeer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be Mock class?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

classes in tests should be in separated files

sai_serialize_object_id(data.port_id).c_str(),
sai_serialize_port_oper_status(data.port_state).c_str());

m_notificationHandler->onPortStateChangePostLinkEventDamping(/*count=*/1, &data);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not right, you are manually generating port notification, this breaks SAI design, @lguohan we need to discuss this whether this is allowed

Copy link
Contributor Author

@Ashish1805 Ashish1805 Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not manual generation of port event notification exactly rather delayed generation of port events as it is intended with link event damping feature - for the link event damping, the actual port events will be trapped here (https://github.com/sonic-net/sonic-sairedis/blob/master/syncd/NotificationHandler.cpp#L164-L172) (NotificationHandler changes will be added in subsequent PRs to do that) and sent to link event damping logic running in syncd main thread where events will be processed as per the damping algo and config. If port events are not needed to be damped or damping duration have elapsed and damped port event needs to be advertised to OA (and other applications), corresponding port event notifications will be generated as the case here.

Since we are doing review of a portion of a code at a time, full logic is not self explanatory here.

Targeted final onPortStateChange() will look like following where port events are trapped and sends for processing. Here m_portStateChangeHandler is of type PortStateChangeHandler (added in #1310):

void NotificationHandler::onPortStateChange(
In uint32_t count,
In const sai_port_oper_status_notification_t *data)
{
SWSS_LOG_ENTER();

if (m_portStateChangeHandler == nullptr) {
    auto s = sai_serialize_port_oper_status_ntf(count, data);
    enqueueNotification(SAI_SWITCH_NOTIFICATION_NAME_PORT_STATE_CHANGE, s);
} else {
    // Send notification to port state change handler for further processing.
    m_portStateChangeHandler->HandlePortStateChangeNotification(count, data);
}

}

cc: @mikeberesford.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like manual generate, since this code is creating notification data and manually calling onPortStateChangePostLinkEventDamping, which all notifications should come from vendor SAI and not be generated on demand

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the intention of the feature. SAI generated link event is trapped by link event damping feature and it will be sent to application as per the damping logic. For example:

  1. if damping logic says SAI generated event notification needs to damped, that notification will be suppressed in the syncd and will not be sent to application.
  2. if damping logic says no damping needed, then the notification needs to be sent to application using onPortStateChangePostLinkEventDamping(). Since original SAI notification is trapped and processed in the damper logic, we need to manually create the corresponding notification data and send it to application.

Since the original notification from SAI is trapped and processed by damper, we will need to create notification data when notification needs to be sent.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not fully understand this logic, can you make some kind of flow diagram ? So e I'm not convinced still why we generate manually this notification

@DendroLabs
Copy link

Hi @Ashish1805 and @kcudnik,

This PR and the related link event damping PRs (#1323, #1331) have been inactive for nearly 2 years. I'd like to pick up this work and drive it to completion.

@kcudnik -- you requested a flow diagram back in February 2024 to clarify the notification forwarding design. Here it is:

Damper Flow

The damper does not fabricate new port state. It acts as a store-and-forward interceptor for real SAI notifications:

1. NORMAL FLOW (penalty below suppress threshold):

  Vendor SAI -- port_state_change --> Syncd Damper -- forward as-is --> Orchagent
                {port_id, DOWN}       penalty += P
                                      penalty < S
                                      -> FORWARD

2. SUPPRESSION (penalty >= suppress threshold):

  Vendor SAI -- port_state_change --> Syncd Damper          (nothing sent
                {port_id, UP}         penalty still >= R     to orchagent)
                                      -> SUPPRESS
                                      store last_event=UP

3. UNSUPPRESSION (penalty decays below reuse threshold):

  Syncd Damper -- forward stored {port_id, last_event} --> Orchagent
  timer fires
  penalty < R
  -> UNSUPPRESS

The notification forwarded in step 3 contains the exact data from the last real SAI notification -- it is not invented state. The damper stores the {port_id, oper_status} from step 2 and releases it when the penalty decays. This is functionally a time-delayed queue.

This is the standard pattern used by Cisco IOS-XR (interface manager dampening module) and Juniper JUNOS (ifp process) -- both intercept hardware link state transitions and re-inject them into the notification path after the dampening timer expires.

Safety guarantees:

  • Before forwarding a stored event, the damper verifies it matches the most recent SAI-reported state
  • Port removal cleans up all damper state and cancels timers
  • Disabling dampening immediately releases any suppressed state
  • The damper never forwards more notifications than it receives

I also plan to address the other review comments on this PR and the dependencies (#1323, #1331):

Additionally, based on a review of industry complaints (Cisco, Juniper, Arista forums, RIPE, NANOG), I'm proposing two enhancements that would make SONiC's implementation best-in-class:

1. Syslog on suppress/unsuppress -- The #1 complaint across all vendors is that dampening operates silently. No vendor logs when suppression activates. We should emit:

  • WARNING: LED_SUPPRESS: Ethernet0 dampened (penalty=3500 > suppress=2000, half_life=5s)
  • NOTICE: LED_UNSUPPRESS: Ethernet0 undampened (penalty=950 < reuse=1000)

2. Monitor-only mode (algorithm="aied-monitor") -- RFC 7196 explicitly recommends a "Calculate But Do Not Damp" mode so operators can tune parameters safely in production. No vendor has implemented this. In monitor mode, the damper calculates penalties and updates STATE_DB / emits syslog, but does NOT suppress events. This is a one-line check in the damper's forward/suppress decision path.

Both are low-complexity additions to the per-port damper class.

I'll open new superseding PRs once rebased onto current master, crediting @Ashish1805 as original author. Happy to discuss the design further if there are remaining concerns.

@mikeberesford
Copy link

Thanks for picking this up @DendroLabs, would you mind tagging me on the new PRs when they're opened?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants