Description
This issue is observed in the latest SONiC images. When cables are connected to or removed from the switch ports, the interface status is not correctly reflected in 'show interfaces status'. The link status is updated correctly on the Broadcom side, but changes in the link state are not propagated to the kernel: the files '/sys/class/net/Ethernet*/carrier' are not updated when an OIR (online insertion and removal) is done. After a system reboot, the link status is updated in the kernel and SONiC reports the status correctly.
We tried to narrow this problem down to a specific kernel version, but SONiC builds from a few weeks or months back fail because certain package versions are missing.
The last commit on which we did not observe this issue is from Feb-26, and there were significant changes in the 'drivers/net' directory of the kernel over a span of 1.5 months.
Please take a look at this issue and suggest how to debug it further.
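As a starting point for debugging, the comparison done by hand in the logs (ASIC link state from 'bcmcmd ps' against the kernel 'carrier' file) could be scripted. The following is only a sketch under stated assumptions: the helper names are hypothetical, the 'bcmcmd ps' line format is taken from the output pasted in this issue, and the interface-to-ASIC-port mapping must be supplied by hand (for example Ethernet0 to ce12, as seen in the logs).

```python
import re
from pathlib import Path

# Matches lines such as "ce12( 50) up 4 100G FD ..." as pasted in this issue.
PS_LINE = re.compile(r"^\s*(\w+)\(\s*\d+\)\s+(\S+)")


def parse_bcm_ps(text):
    """Return {bcm_port: True/False} (link up or not) from `bcmcmd ps` text."""
    states = {}
    for line in text.splitlines():
        match = PS_LINE.match(line)
        if match:
            # Anything other than "up" is treated as link down.
            states[match.group(1)] = match.group(2) == "up"
    return states


def read_carrier(ifname, sysfs_root="/sys/class/net"):
    """Return True/False for carrier 1/0, or None if the file is unreadable."""
    try:
        value = (Path(sysfs_root) / ifname / "carrier").read_text().strip()
    except OSError:
        return None
    return value == "1"


def find_mismatches(asic_states, port_map, read=read_carrier):
    """List (ifname, bcm_port, asic_up, kernel_up) where the two views differ.

    port_map is a hand-written {ifname: bcm_port} mapping (an assumption,
    not derived automatically). Ports with an unknown state are skipped.
    """
    mismatches = []
    for ifname, bcm_port in port_map.items():
        asic_up = asic_states.get(bcm_port)
        kernel_up = read(ifname)
        if asic_up is None or kernel_up is None:
            continue
        if asic_up != kernel_up:
            mismatches.append((ifname, bcm_port, asic_up, kernel_up))
    return mismatches
```

On the switch, the text passed to `parse_bcm_ps` would be the captured output of 'bcmcmd ps', and any tuple returned by `find_mismatches` marks a port where the kernel did not follow the ASIC.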
Testing environment
Switch: QFX5200-32C-S
ASIC: TH1
Branch: master
The links are configured at 100G, and a DAC cable is connected back to back between ports on the same switch. No platform-specific drivers are loaded apart from the ASIC configuration files in the 'device' directory.
Here are the logs from the problematic image (May 20th Jenkins image):
Trial 1
A 100G DAC cable is connected between port 0 and port 1 after the box is rebooted.
root@sonic:/home/admin# show interfaces status
Interface Lanes Speed MTU Alias Vlan Oper Admin Type Asym PFC
----------- --------------- ------- ----- --------------- ------ ------ ------- ------ ----------
Ethernet0 49,50,51,52 100G 9100 hundredGigE1/1 routed up up N/A N/A
Ethernet4 53,54,55,56 100G 9100 hundredGigE1/2 routed up up N/A N/A
Ethernet8 57,58,59,60 100G 9100 hundredGigE1/3 routed down up N/A N/A
....
bcmcmd also shows link up in the BCM ASIC for these 2 ports:
root@sonic:/home/admin# bcmcmd ps | grep ce12
ce12( 50) up 4 100G FD SW No Forward None F KR4 9122 No
root@sonic:/home/admin# bcmcmd ps | grep ce13
ce13( 54) up 4 100G FD SW No Forward None F KR4 9122 No
The "carrier" parameter is also updated in the "sys" directory for the ports connected with the cable:
root@sonic:/home/admin# cat /sys/class/net/Ethernet4/carrier
1
root@sonic:/home/admin# cat /sys/class/net/Ethernet0/carrier
1
Trial 2
The 100G DAC cable is connected between port 0 and port 31.
root@sonic:/home/admin# show interfaces status
Interface Lanes Speed MTU Alias Vlan Oper Admin Type Asym PFC
----------- --------------- ------- ----- --------------- ------ ------ ------- ------ ----------
Ethernet0 49,50,51,52 100G 9100 hundredGigE1/1 routed up up N/A N/A
Ethernet4 53,54,55,56 100G 9100 hundredGigE1/2 routed up up N/A N/A
...
Ethernet120 9,10,11,12 100G 9100 hundredGigE1/31 routed down up N/A N/A
Ethernet124 13,14,15,16 100G 9100 hundredGigE1/32 routed down up N/A N/A
bcmcmd also shows link up in the BCM ASIC for these 2 ports:
root@sonic:/home/admin# bcmcmd ps | grep ce12
ce12( 50) up 4 100G FD SW No Forward None F KR4 9122 No
root@sonic:/home/admin# bcmcmd ps | grep ce3
ce3( 13) up 4 100G FD SW No Forward None F KR4 9122 No
The "carrier" parameter is not updated in the "sys" directory for the ports connected with the cable: carrier is still down for port 31 and still up for port 1.
root@sonic:/home/admin# cat /sys/class/net/Ethernet124/carrier
0
root@sonic:/home/admin# cat /sys/class/net/Ethernet4/carrier
1
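To see whether the kernel receives any carrier event at all while a cable is moved, the 'carrier' files could be polled during the OIR instead of being read once afterwards. This is a minimal sketch, not part of SONiC: the function names are hypothetical, and the only assumption about the system is the '/sys/class/net/Ethernet*/carrier' layout shown above.

```python
import time
from pathlib import Path


def snapshot(sysfs_root="/sys/class/net", prefix="Ethernet"):
    """Return {ifname: carrier string} for every interface matching prefix."""
    snap = {}
    for dev in sorted(Path(sysfs_root).glob(prefix + "*")):
        try:
            snap[dev.name] = (dev / "carrier").read_text().strip()
        except OSError:
            # Reading carrier fails, for example, when the interface is admin down.
            snap[dev.name] = "unreadable"
    return snap


def changes(before, after):
    """Return {ifname: (old, new)} for interfaces whose carrier value changed."""
    return {
        name: (before.get(name), value)
        for name, value in after.items()
        if before.get(name) != value
    }


def watch(interval=1.0, rounds=None, **snapshot_args):
    """Print a timestamped line for each carrier transition.

    Runs forever when rounds is None, otherwise for the given number of polls.
    """
    previous = snapshot(**snapshot_args)
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval)
        current = snapshot(**snapshot_args)
        for name, (old, new) in changes(previous, current).items():
            print(f"{time.strftime('%H:%M:%S')} {name}: carrier {old} -> {new}")
        previous = current
        done += 1
```

Calling `watch()` in one terminal while moving the cable from port 1 to port 31 would be expected to print transitions for Ethernet4 and Ethernet124 on a working image; no output at all on the problematic image would confirm that the kernel netdev state is never touched.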
Here is the dump from problematic image:
sonic_dump_sonic_20200526_071154.tar.gz
Last working commit from master branch: 1ef7403
Here are the logs from the working kernel image:
Trial 1
The DAC cable is connected between physical port 0 and port 1, and the system is rebooted.
root@sonic:/home/admin# show interfaces status
Interface Lanes Speed MTU Alias Vlan Oper Admin Type Asym PFC
----------- --------------- ------- ----- --------------- ------ ------ ------- ------ ----------
Ethernet0 49,50,51,52 100G 9100 hundredGigE1/1 routed up up N/A N/A
Ethernet4 53,54,55,56 100G 9100 hundredGigE1/2 routed up up N/A N/A
root@sonic:/home/admin# bcmcmd ps | grep ce12
ce12( 50) up 4 100G FD SW No Forward None F KR4 9122 No
root@sonic:/home/admin# bcmcmd ps | grep ce13
ce13( 54) up 4 100G FD SW No Forward None F KR4 9122 No
root@sonic:/home/admin# cat /sys/class/net/Ethernet4/carrier
1
root@sonic:/home/admin# cat /sys/class/net/Ethernet0/carrier
1
Trial 2
The 100G DAC cable is connected between port 0 and port 31.
root@sonic:/home/admin# show interfaces status
Interface Lanes Speed MTU Alias Vlan Oper Admin Type Asym PFC
----------- --------------- ------- ----- --------------- ------ ------ ------- ------ ----------
Ethernet0 49,50,51,52 100G 9100 hundredGigE1/1 routed up up N/A N/A
Ethernet4 53,54,55,56 100G 9100 hundredGigE1/2 routed down up N/A N/A
.....
Ethernet120 9,10,11,12 100G 9100 hundredGigE1/31 routed down up N/A N/A
Ethernet124 13,14,15,16 100G 9100 hundredGigE1/32 routed up up N/A N/A
root@sonic:/home/admin# bcmcmd ps | grep ce13
ce13( 54) down 4 100G FD SW No Forward None F KR4 9122 No
root@sonic:/home/admin# bcmcmd ps | grep ce12
ce12( 50) up 4 100G FD SW No Forward None F KR4 9122 No
root@sonic:/home/admin# bcmcmd ps | grep ce3
ce3( 13) up 4 100G FD SW No Forward None F KR4 9122 No
root@sonic:/home/admin# cat /sys/class/net/Ethernet124/carrier
1
root@sonic:/home/admin# cat /sys/class/net/Ethernet0/carrier
1
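The difference between the two images is easiest to see by comparing the Oper column of the two Trial 2 captures. A small sketch of that comparison follows; the function names are hypothetical, and it assumes the column layout shown in this issue, where Oper is the seventh whitespace-separated field and no earlier column contains spaces.

```python
def oper_states(show_output):
    """Map interface name to its Oper value from `show interfaces status` text."""
    states = {}
    for line in show_output.splitlines():
        fields = line.split()
        # Data rows start with the interface name; header, separator,
        # prompt and "..." lines are ignored.
        if len(fields) >= 8 and fields[0].startswith("Ethernet"):
            states[fields[0]] = fields[6]
    return states


def oper_diff(first, second):
    """Return {ifname: (first_oper, second_oper)} where the two captures differ."""
    return {
        name: (first[name], second[name])
        for name in first
        if name in second and first[name] != second[name]
    }
```

Fed with the Trial 2 outputs of the problematic and the working image, `oper_diff` reports Ethernet4 and Ethernet124, the two ports whose cable was moved.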