Commit 5b037ec

Merge pull request sonic-net#232 from BRCM-SONIC/link_flap_err_disable_updates
Link flap err disable updates
2 parents a16dea7 + e718757 commit 5b037ec

2 files changed

Lines changed: 114 additions & 24 deletions

File tree

system/Interface_Down_Reason.md

Lines changed: 76 additions & 7 deletions
@@ -59,6 +59,7 @@
5959
|:---:|:-----------:|:------------------:|-----------------------------------|
6060
| 0.1 | 04/05/2021 | Prasanth K V | Initial version |
6161
| 0.2 | 05/17/2021 | Madhukar K | Modified portchannel content |
62+
| 0.3 | 06/22/2021 | Prasanth K V | Added REST details and DB schema |
6263

6364
# About this Manual
6465
This document provides comprehensive functional and design information about the *Interface Down Reason* feature implementation in SONiC.
@@ -68,7 +69,7 @@ This document provides comprehensive functional and design information about the
6869
### Table 1: Abbreviations
6970
| **Term** | **Meaning** |
7071
|--------------------------|-------------------------------------|
71-
| PCS | Physical Coding Sub-layer |
72+
| PMD | Physical Medium Dependent |
7273
| LACP | Link Aggregation Control Protocol |
7374

7475
# 1 Feature Overview
@@ -101,8 +102,8 @@ So an interface flap affects the system in general and hence it is important to
101102
- Transceiver not present
102103
- Port breakout in-progress
103104
- High BER
104-
- PCS AM lock error
105-
- PCS sync error
105+
- PMD-CDR-lock
106+
- PMD-signal-detected
106107
- STP error disabled
107108
- Transceiver error disabled
108109
- UDLD error disabled
@@ -171,6 +172,26 @@ SAI specification has to be updated to get the events from SAI to upper layer.
171172

172173
### 3.2.1 CONFIG DB
173174
### 3.2.2 APP DB
175+
A new field, `reason`, has been added to PORT_TABLE:
176+
```
177+
"PORT_TABLE": {
178+
"Ethernet40": {
179+
...
180+
"reason": "OPER_UP",
181+
...
182+
}
183+
}
184+
```
185+
A new table, IF_REASON_EVENT, is added to keep track of the events:
186+
```
187+
"IF_REASON_EVENT": {
188+
"Ethernet40": {
189+
"reason": "OPER_UP",
190+
"event": "PHY_link_up",
191+
"timestamp": "2021-06-06 09:29:55.639018"
192+
}
193+
}
194+
```
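
As a rough illustration of how a consumer might read these entries, the sketch below looks up one port in an `IF_REASON_EVENT`-shaped dictionary and parses its timestamp. The sample data is copied from the example above; the helper name `latest_event` and the use of a plain dictionary in place of a real APP DB connection are assumptions made for illustration only.

```python
from datetime import datetime

# Sample entry copied from the APP DB example above.
APP_DB_SAMPLE = {
    "IF_REASON_EVENT": {
        "Ethernet40": {
            "reason": "OPER_UP",
            "event": "PHY_link_up",
            "timestamp": "2021-06-06 09:29:55.639018",
        }
    }
}


def latest_event(app_db, port):
    """Return (reason, event, timestamp) for a port, or None if no entry exists.

    `app_db` is a plain dict standing in for the APP DB (an assumption).
    """
    entry = app_db.get("IF_REASON_EVENT", {}).get(port)
    if entry is None:
        return None
    # The timestamp format matches the example: date, time, microseconds.
    ts = datetime.strptime(entry["timestamp"], "%Y-%m-%d %H:%M:%S.%f")
    return entry["reason"], entry["event"], ts


if __name__ == "__main__":
    print(latest_event(APP_DB_SAMPLE, "Ethernet40"))
```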
174195
### 3.2.3 STATE DB
175196
### 3.2.4 ASIC DB
176197
### 3.2.5 COUNTER DB
@@ -214,7 +235,7 @@ Name Description Oper Reason Speed MTU Alternate Name
214235
----------------------------------------------------------------------------------
215236
Eth1/1 - Down Admin-down 100000 9100 Ethernet0
216237
Eth1/2/1 - Down Err-disabled 10000 9100 Ethernet4
217-
Eth1/2/2 - Down Phy-link-down 10000 9100 Ethernet5
238+
Eth1/2/2 - Down PHY-link-down 10000 9100 Ethernet5
218239
Eth1/2/3 - Up Link-up 10000 9100 Ethernet6
219240
```
220241
- *show interface status <reason>*
@@ -225,13 +246,13 @@ sonic# show interface status err-disabled
225246
----------------------------------------------------------------------------------
226247
Name Event Timestamp
227248
----------------------------------------------------------------------------------
228-
Eth1/2/1 STP-down 2021-04-16 10:23:29
249+
Eth1/2/1 STP-err-disabled 2021-04-16 10:23:29
229250
```
230251
- *show interface <interface>*
231252
The show interface command displays the down reasons, as shown in the example below:
232253
```
233254
sonic# show interface Eth 1/2/2
234-
Eth1/2/2 is up, line protocol is down, reason phy-link-down
255+
Eth1/2/2 is up, line protocol is down, reason PHY-link-down
235256
Remote-fault at 2021-01-06 07:49:45.737024
236257
Local-fault at 2021-01-06 07:49:45.737024
237258
Hardware is Eth
@@ -253,6 +274,29 @@ Output statistics:
253274
6 Multicasts, 0 Broadcasts, 0 Unicast
254275
```
255276

277+
The list of events:
278+
Admin-down
279+
Remote-fault
280+
Local-fault
281+
Link-training-failed
282+
Link-training-not-completed
283+
Link-training-not-started
284+
Link-tuning-failed
285+
Link-tuning-not-started
286+
Link-tuning-not-completed
287+
Incompatible-transceiver
288+
Transceiver-not-present
289+
Port-breakout-in-progress
290+
High-BER
291+
PMD-CDR-lock
292+
PMD-signal-detected
293+
STP-err-disabled
294+
Transceiver-err-disabled
295+
UDLD-err-disabled
296+
Link-flap-err-disabled
297+
PHY-link-up
298+
299+
256300
#### Port channel interface
257301
- *show interface status*
258302
Along with the physical interfaces, configured portchannel interfaces are displayed in this command output. The new column, "Reason" displays the high level reason for portchannel down. The reasons are
@@ -313,7 +357,32 @@ Output statistics:
313357
#### 3.6.2.3 Exec Commands
314358

315359
### 3.6.3 REST API Support
316-
*URL-based view*
360+
361+
GET /restconf/data/openconfig-interfaces:interfaces/interface={name}/openconfig-if-ethernet:ethernet/state/openconfig-interfaces-ext:status/down-reason
362+
363+
Example response data:
364+
{
365+
"openconfig-interfaces-ext:down-reason": "OPER_UP"
366+
}
367+
368+
369+
GET /restconf/data/openconfig-interfaces:interfaces/interface={name}/openconfig-if-ethernet:ethernet/state/openconfig-interfaces-ext:reason-events
370+
371+
Example response data:
372+
{
373+
"openconfig-interfaces-ext:reason-events": {
374+
"down-reason-event": [
375+
{
376+
"reason-event": {
377+
"reason": "OPER_UP",
378+
"event": "PHY-link-up",
379+
"timestamp": "2021-06-06 09:29:55.639018"
380+
}
381+
}
382+
]
383+
}
384+
}
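
The two GET paths above could be exercised by a client along the following lines. This is a minimal sketch using only the Python standard library: the paths and the sample response are copied from this section, while the helper names (`down_reason_path`, `extract_reason_events`) are hypothetical. Interface names that contain `/` (for example `Eth1/2/2`) are percent-encoded when placed in the key position, as RESTCONF requires.

```python
import json
from urllib.parse import quote

BASE = "/restconf/data/openconfig-interfaces:interfaces"

# Example response body copied from the section above.
SAMPLE_EVENTS = """
{
  "openconfig-interfaces-ext:reason-events": {
    "down-reason-event": [
      {
        "reason-event": {
          "reason": "OPER_UP",
          "event": "PHY-link-up",
          "timestamp": "2021-06-06 09:29:55.639018"
        }
      }
    ]
  }
}
"""


def down_reason_path(name):
    """Build the down-reason GET path for an interface (helper name is hypothetical)."""
    key = quote(name, safe="")  # encode '/' so it is not read as a path separator
    return (f"{BASE}/interface={key}/openconfig-if-ethernet:ethernet/state/"
            "openconfig-interfaces-ext:status/down-reason")


def extract_reason_events(body):
    """Return the list of reason-event dicts from a reason-events response body."""
    data = json.loads(body)
    container = data.get("openconfig-interfaces-ext:reason-events", {})
    return [item["reason-event"] for item in container.get("down-reason-event", [])]


if __name__ == "__main__":
    print(down_reason_path("Eth1/2/2"))
    for ev in extract_reason_events(SAMPLE_EVENTS):
        print(ev["timestamp"], ev["reason"], ev["event"])
```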
385+
317386

318387
### 3.6.4 gNMI Support
319388
*Generally this is covered by the YANG specification. This section should also cover objects where on-change and interval based telemetry subscriptions can be configured.*

system/intf-dampening-HLD.md

Lines changed: 38 additions & 17 deletions
@@ -38,37 +38,37 @@ The Port Link Flap Error Disable feature uses an exponential decay mechanism to
3838

3939
When Port Link Flap Error Disable is enabled, the system monitors the number of times a port link state toggles from "up to down", and not from "down to up".
4040

41-
The sampling time or window (the time during which the specified toggle threshold can occur before the wait period is activated) is triggered when the first "up to down" transition occurs.
41+
The sampling interval or window (the time during which the specified toggle threshold can occur before the recovery wait period is activated) is triggered when the first "up to down" transition occurs.
4242

43-
If the port link state toggles from up to down for a specified number of times within a specified period, the interface is physically disabled for the specified wait period. Once the wait period expires, the port link state is re-enabled. However, if the wait period is set to zero (0) seconds, the port link state will remain disabled until it is manually disabled and re-enabled or Port Link Flap Error Disable is disabled on this port.
43+
If the port link state toggles from up to down for a specified number of times within a specified period, the interface is physically disabled for the specified recovery wait period. Once the recovery wait period expires, the port link state is re-enabled. However, if the recovery wait period is set to zero (0) seconds, the port link state will remain disabled until it is manually disabled and re-enabled or Port Link Flap Error Disable is disabled on this port.
4444

4545

4646
## 1.1 Requirements
4747
System shall be able to suppress interface state change events to protect system resources.
4848
User shall be able to enable or disable the feature on individual interfaces and globally.
4949
The feature must be disabled on all interfaces by default.
5050
The feature shall be supported on physical interfaces.
51-
There must be two sets of configuration parameters (sample-interval, recovery-interval, and flap-threshold) a per-interface set and a global set. If both global and per-interface are configured, the per-interface values are used only for given interfaces. Global values are used for all other physical interfaces.
51+
There must be two sets of configuration parameters (sampling-interval, recovery-interval, and flap-threshold): a per-interface set and a global set. If both global and per-interface sets are configured, the per-interface values are used only for the given interfaces. Global values are used for all other physical interfaces.
5252
If no values are specified by the user, a default set of parameters is applied to all interfaces.
5353
User shall be able to save configuration parameters (both global and per-interface).
5454
The configuration parameters (both global and per-interface) must be preserved across device reboot.
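
The precedence described in these requirements (per-interface set, then global set, then defaults) can be sketched as follows. This is an illustration only: the names `FlapParams` and `effective_params` are hypothetical, and the sketch assumes a whole parameter set is selected at once rather than merged field by field, which the requirements do not spell out. The default values are the ones listed in section 1.1.2.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FlapParams:
    # Defaults taken from section 1.1.2 of this document.
    flap_threshold: int = 3
    sampling_interval: int = 10
    recovery_interval: int = 300


def effective_params(port: str,
                     per_interface: Dict[str, FlapParams],
                     global_params: Optional[FlapParams] = None) -> FlapParams:
    """Pick the parameter set that applies to `port`.

    Assumption: selection is per set, not per field.
    """
    if port in per_interface:
        return per_interface[port]      # per-interface set wins for this port
    if global_params is not None:
        return global_params            # global set covers all other ports
    return FlapParams()                 # nothing configured: defaults


if __name__ == "__main__":
    per_if = {"Ethernet0": FlapParams(10, 3, 10)}
    glob = FlapParams(5, 20, 60)
    print(effective_params("Ethernet0", per_if, glob))
    print(effective_params("Ethernet4", per_if, glob))
    print(effective_params("Ethernet4", {}, None))
```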
5555

5656
### 1.1.1 Functional Requirements
5757
Port Link Flap Error Disable shall use the parameters below to suppress link flaps and protect the system.
5858
- flap-threshold
59-
Specifies the number of times a port link state goes from up to down before the wait period is activated. The value ranges from 1 through 50.
60-
- sample-interval
61-
Specifies the amount of time, in seconds, during which the specified toggle threshold can occur before the wait period is activated. The value ranges from 1 through 65535.
59+
Specifies the number of times a port link state goes from up to down before the recovery wait period is activated. The value ranges from 1 through 50.
60+
- sampling-interval
61+
Specifies the amount of time, in seconds, during which the specified toggle threshold can occur before the recovery wait period is activated. The value ranges from 1 through 65535.
6262
- recovery-interval
6363
Specifies the amount of time in seconds, for which the port remains disabled (down) before it becomes enabled. The value ranges from 0 through 65534. A value of 0 indicates that the port will stay down until an administrative override occurs.
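
How the three parameters interact can be sketched with a small state machine. This is a simplified model under stated assumptions, not the implementation: the class name `LinkFlapMonitor` is hypothetical, the sampling window is taken to start at the first up-to-down transition and to restart once it expires, and the port is treated as err-disabled as soon as the count reaches flap-threshold. Times are plain seconds supplied by the caller.

```python
class LinkFlapMonitor:
    """Simplified model of flap counting for one port (illustrative only)."""

    def __init__(self, flap_threshold=3, sampling_interval=10, recovery_interval=300):
        self.flap_threshold = flap_threshold
        self.sampling_interval = sampling_interval
        self.recovery_interval = recovery_interval
        self.window_start = None   # time of the first flap in the current window
        self.flaps = 0             # up-to-down transitions seen in the window
        self.disabled_at = None    # time the port was err-disabled, if any

    def is_disabled(self, now):
        """Return True while the port is err-disabled; re-enable after recovery."""
        if self.disabled_at is None:
            return False
        if self.recovery_interval == 0:
            return True            # 0 means: stay down until administrative override
        if now - self.disabled_at >= self.recovery_interval:
            self.disabled_at = None
            return False
        return True

    def on_link_down(self, now):
        """Record an up-to-down transition; return True if it err-disables the port."""
        if self.is_disabled(now):
            return False
        # Start a new window on the first flap or when the old window has expired.
        if self.window_start is None or now - self.window_start > self.sampling_interval:
            self.window_start = now
            self.flaps = 0
        self.flaps += 1
        if self.flaps >= self.flap_threshold:
            self.disabled_at = now
            self.window_start = None
            self.flaps = 0
            return True
        return False

    def admin_override(self):
        """Model a manual disable/re-enable that clears the err-disabled state."""
        self.disabled_at = None


if __name__ == "__main__":
    m = LinkFlapMonitor(flap_threshold=3, sampling_interval=10, recovery_interval=300)
    for t in (0, 4, 9):
        print(t, m.on_link_down(t))
    print(m.is_disabled(100), m.is_disabled(309))
```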
6464

6565
### 1.1.2 Configuration and Management Requirements
6666
- Port Link Flap Error Disable feature default is OFF on all physical interfaces and port-channels
6767
- When Port Link Flap Error Disable is enabled, use below default values:
6868
flap-threshold: 3
69-
sample-interval: 10
69+
sampling-interval: 10
7070
recovery-interval: 300
71-
- User shall be able to specify different sample-interval, flap-threshold and recovery-interval on a physical interface
71+
- User shall be able to specify different sampling-interval, flap-threshold and recovery-interval on a physical interface
7272
- User shall be able to display current Port Link Flap Error Disable configuration values.
7373
- User shall be able to display current interface status if it was suppressed by Port Link Flap Error Disable
7474
- User shall be able to display Link-Down-Reason if a port is disabled by Port Link Flap Error Disable feature
@@ -101,23 +101,37 @@ The Interface Error Disable feature exist in below modules and containers:
101101
- *link-error-disable flap-threshold <flap count> sampling-interval <interval in sec> recovery-interval <recovery interval in sec>*
102102
Example:
103103
```
104-
sonic(conf-if-Ethernet0)# link-error-disable flap-threshold 10 sampling-time 3 recovery-timeout 10
104+
sonic(conf-if-Ethernet0)# link-error-disable flap-threshold 10 sampling-interval 3 recovery-interval 10
105105
```
106106
In this example, the values for the parameters are as follows:
107107

108-
The flap-threshold is set at 10 times. This interval is the number of times that the port's link state goes from up to down and down to up before the recovery-timeout is activated. Enter a valid value range from 1-50. Default is 3.
108+
The flap-threshold is set to 10. This value is the number of times that the port's link state goes from up to down before the recovery-interval is activated. Enter a value from 1 through 50. Default is 3.
109109

110110

111-
The sampling-time is set to 3 seconds. This time period is the amount of time during which the specified flap-threshold can be crossed. If the flap-threshold is crossed during this sampling-time, port will be error-disabled. Enter a value between 1 and 65535 seconds. Default is 10.
111+
The sampling-interval is set to 3 seconds. This is the amount of time during which the specified flap-threshold can be crossed. If the flap-threshold is crossed during this sampling-interval, the port will be error-disabled. Enter a value between 1 and 65535 seconds. Default is 10.
112112

113113

114-
The recovery-timeout is set to 10 seconds. This period of time is the amount of time the port remains disabled (down) before it becomes enabled. Entering 0 indicates that the port will stay down until an administrative override occurs. Enter a value between 0 and 65534 seconds. Default is 300.
114+
The recovery-interval is set to 10 seconds. This period of time is the amount of time the port remains disabled (down) before it becomes enabled. Entering 0 indicates that the port will stay down until an administrative override occurs. Enter a value between 0 and 65534 seconds. Default is 300.
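
The value ranges described above can be checked before a command string is generated, for example by an automation script. The sketch below is illustrative: the function name `build_link_error_disable` is hypothetical, and only the keyword names, ranges, and defaults stated in this section are used.

```python
# Valid ranges as stated in this section: (minimum, maximum), inclusive.
RANGES = {
    "flap-threshold": (1, 50),
    "sampling-interval": (1, 65535),
    "recovery-interval": (0, 65534),
}


def build_link_error_disable(flap_threshold=3, sampling_interval=10, recovery_interval=300):
    """Return the CLI string, raising ValueError if a value is out of range."""
    values = {
        "flap-threshold": flap_threshold,
        "sampling-interval": sampling_interval,
        "recovery-interval": recovery_interval,
    }
    parts = ["link-error-disable"]
    for name, value in values.items():
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
        parts.append(f"{name} {value}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_link_error_disable(10, 3, 10))
```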
115115

116116

117117
This config command can be executed on a range of interfaces as well. Example:
118118
```
119-
sonic(conf-if-range-eth**)# link-error-disable flap-threshold 10 sampling-time 3 recovery-timeout 10
119+
sonic(conf-if-range-eth**)# link-error-disable flap-threshold 10 sampling-interval 3 recovery-interval 10
120120
```
121+
122+
The following command enables link-error-disable with default values for flap-threshold, sampling-interval and recovery-interval:
123+
124+
```
125+
sonic(conf-if-Ethernet0)#link-error-disable
126+
```
127+
128+
Enabling link-error-disable with default parameters is also supported on a range of interfaces, as shown in the example below:
129+
130+
```
131+
sonic(conf-if-range-eth**)# link-error-disable
132+
```
133+
134+
121135
Example for disabling link-flap error-disable on a port:
122136
```
123137
sonic(conf-if-Ethernet0)#no link-error-disable
@@ -165,12 +179,19 @@ The ports which does not have non-default error disable configurations will not
165179
Example:
166180
```
167181
sonic#show errdisable link-flap
168-
Interface Flap-threshold Sampling-time Recovery-timeout Status
182+
Interface Flap-threshold Sampling-interval Recovery-interval Status
169183
---------------------------------------------------------------------------
170-
Ethernet0 10 3 30 Errdisabled
171-
Ethernet4 10 3 60 Not-errdisabled
172-
Ethernet8 5 10 300 Off
184+
Ethernet0 10 3 30 Errdisabled
185+
Ethernet4 10 3 60 Not-errdisabled
186+
Ethernet8 5 10 300 Off
173187
```
188+
189+
The possible status values are:
190+
1. Errdisabled: The number of link flaps in a sampling interval crossed the threshold, and the port is currently in the err-disabled state.
191+
2. Not-errdisabled: Err-disable is enabled, but the number of flaps in a sampling interval has not crossed the configured threshold.
192+
3. Off: The err-disable parameters are configured, but the feature is not enabled.
193+
4. On: Err-disable is enabled, and no link flaps have occurred since it was enabled.
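
One way to read the four status values is as a function of three facts about a port. The sketch below follows the descriptions in the list; the function name `flap_status` and its boolean and counter inputs are hypothetical and are not taken from the design.

```python
def flap_status(enabled, err_disabled, flaps_seen):
    """Map port state to the Status column value (inputs are illustrative).

    enabled      -- link-error-disable is enabled on the port
    err_disabled -- the port is currently err-disabled by this feature
    flaps_seen   -- link flaps counted since the feature was enabled
    """
    if not enabled:
        return "Off"                # parameters configured, feature not enabled
    if err_disabled:
        return "Errdisabled"        # threshold crossed, port currently down
    if flaps_seen > 0:
        return "Not-errdisabled"    # flaps observed, threshold not crossed
    return "On"                     # enabled, no flaps so far


if __name__ == "__main__":
    print(flap_status(True, True, 10))
    print(flap_status(True, False, 2))
    print(flap_status(False, False, 0))
    print(flap_status(True, False, 0))
```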
194+
174195
# 2.2 Functional Description
175196

176197
# 3 Design
