Merged
15 changes: 11 additions & 4 deletions portsyncd/linksync.cpp
@@ -4,6 +4,7 @@
 #include <sys/socket.h>
 #include <linux/if.h>
 #include <netlink/route/link.h>
+#include <netlink/route/link/bridge.h>
 #include "logger.h"
 #include "netmsg.h"
 #include "dbconnector.h"
@@ -212,12 +213,18 @@ void LinkSync::onMsg(int nlmsg_type, struct nl_object *obj)
         return;
     }
 
-    /* If netlink for this port has master, we ignore that for now
-     * This could be the case where the port was removed from VLAN bridge
-     */
+    /* Ignore netlink on interfaces belonging to a VLAN bridge */
     if (master)
     {
-        return;
+        LinkCache &linkCache = LinkCache::getInstance();
Collaborator

AFAIK, this would be handled by teamsyncd. Can you check?

Contributor Author

@liorghub liorghub Apr 24, 2022

@prsunny I checked. teamsyncd handles the messages sent for the port-channel interface itself; those messages are marked with type="team". The bug I fixed concerns the handling of messages for ports that belong to a port-channel. These messages are not marked with type="team".

Collaborator

@prsunny prsunny Apr 25, 2022

OK. @judyjoseph, can you check this? This seems to be a basic change that was missed. @liorghub, what is the functional impact?

Contributor Author

@liorghub liorghub Apr 26, 2022

The functional impact is in LLDP: there we check that "netdev_oper_status" in the state DB PORT_TABLE is "up" before sending LLDP commands. If "netdev_oper_status" is "down", the LLDP command is not sent, which causes wrong LLDP behavior.

See the following code in lldpmgrd.
https://github.com/Azure/sonic-buildimage/blob/cc30771f6b97234a6dd19d8f97d5dfd44551cf20/dockers/docker-lldp/lldpmgrd#L170
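
To make the gating described above concrete, here is a minimal C++ sketch of the check. It is not the lldpmgrd code (which lives in the linked file); `PortTable` and `shouldSendLldpCommand` are hypothetical names, and the state DB is modeled as a plain in-memory map, assuming only what this comment states: the command is sent only when "netdev_oper_status" is "up".

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the state DB PORT_TABLE: port name -> field -> value.
using PortTable = std::map<std::string, std::map<std::string, std::string>>;

// Returns true only when the port has "netdev_oper_status" set to "up".
// A missing port or a missing field is treated as "not ready" (an assumption).
bool shouldSendLldpCommand(const PortTable &portTable, const std::string &port)
{
    assert(!port.empty());  // callers are expected to pass a real port name

    auto entry = portTable.find(port);
    if (entry == portTable.end())
    {
        return false;
    }

    auto field = entry->second.find("netdev_oper_status");
    return field != entry->second.end() && field->second == "up";
}
```

With this model, a port whose last published status is "down" never gets its LLDP command, even if the kernel interface came up later.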

Collaborator

ok. lgtm. As Xu suggested, please add VS tests to cover this.

Contributor

> ok, @judyjoseph , can you check this? This seems to be basic change and missed. @liorghub, What is the functional impact?

@prsunny I did a quick check and noted down the events from syslog. I find that 'netdev_oper_status' is set much earlier for an interface, as long as the interface is connected and up. The teamd member addition happens earlier.

Apr 26 18:33:56.812132 str2---1 NOTICE swss0#orchagent: :- initializePort: Initializing port alias:Ethernet4 pid:1000000000006
Apr 26 18:33:56.817494 str2---1 NOTICE swss0#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet4 admin:0 oper:0 addr:40:7c:7d:bb:26:0b ifindex:22 master:0
Apr 26 18:33:56.817741 str2---1 NOTICE swss0#portsyncd: :- onMsg: Publish Ethernet4(ok:down) to state db
Apr 26 18:33:56.818394 str2---1 NOTICE swss0#orchagent: :- addHostIntfs: Create host interface for port Ethernet4
Apr 26 18:33:56.833381 str2---1 NOTICE swss0#orchagent: :- setHostIntfsOperStatus: Set operation status DOWN to host interface Ethernet4
Apr 26 18:33:56.833450 str2---1 NOTICE swss0#orchagent: :- initPort: Initialized port Ethernet4
Apr 26 18:33:56.897841 str2---1 NOTICE swss0#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet4 admin:1 oper:1 addr:40:7c:7d:bb:26:0b ifindex:22 master:0
Apr 26 18:33:56.898243 str2---1 NOTICE swss0#portsyncd: :- onMsg: Publish Ethernet4(ok:up) to state db
Apr 26 18:33:56.898260 str2---1 NOTICE swss0#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet4 admin:1 oper:1 addr:40:7c:7d:bb:26:0b ifindex:22 master:2
Apr 26 18:33:56.898310 str2---1 NOTICE swss0#portsyncd: message repeated 2 times: [ :- onMsg: nlmsg type:16 key:Ethernet4 admin:1 oper:1 addr:40:7c:7d:bb:26:0b ifindex:22 master:2]
Apr 26 18:33:56.900044 str2---1 NOTICE swss0#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet4 admin:1 oper:1 addr:40:7c:7d:bb:26:0b ifindex:22 master:2
Apr 26 18:33:56.901037 str2---1 INFO kernel: [  140.005295] PortChannel102: Port device Ethernet4 added
Apr 26 18:33:56.901375 str2---1 NOTICE teamd0#teammgrd: :- addLagMember: Add Ethernet4 to port channel PortChannel102
Apr 26 18:33:56.912638 str2---1 NOTICE swss0#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet4 admin:1 oper:1 addr:40:7c:7d:bb:26:0b ifindex:22 master:2

@liorghub could you share a bit more detail on when you observe this behavior? Is it always seen with LLDP? Is it seen for all port-channel member interfaces, or only for interfaces that were initially oper down and became oper up after a while as they became part of the port-channel?

Contributor Author

@liorghub liorghub Apr 28, 2022

@judyjoseph
Hi Judy,
The issue happens while the switch is booting.
Ethernet0 is part of a port-channel.

As the logs below show, portsyncd gets several netlink messages for Ethernet0.
The last message that arrives without a master (master:0) is at 07:19:15.359655, and it reports oper down.
Later we get more messages for Ethernet0 with oper up, but we ignore them because they carry a master.
Interfaces that have a master can be part of either a VLAN bridge or a port-channel.
We want to ignore only VLAN bridge members (confirmed with @zhenggen-xu).

Since the last message we handle for Ethernet0 reports oper down, state DB holds "netdev_oper_status" = "down", which causes wrong LLDP behaviour.
The issue is persistent and occurs after each reboot.

See the logs below:

 root@r-tigon-20:/home/admin# grep -e "nlmsg type" -e Publish  /var/log/syslog  | egrep "Ethernet0"
Apr 28 07:19:15.287582 r-tigon-20 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet0 admin:0 oper:0 addr:1c:34:da:c9:60:68 ifindex:77 master:0 type:sx_netdev
Apr 28 07:19:15.287898 r-tigon-20 NOTICE swss#portsyncd: :- onMsg: Publish Ethernet0(ok:down) to state db
Apr 28 07:19:15.291418 r-tigon-20 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet0 admin:0 oper:0 addr:1c:34:da:c9:60:00 ifindex:77 master:0 type:sx_netdev
Apr 28 07:19:15.291972 r-tigon-20 NOTICE swss#portsyncd: :- onMsg: Publish Ethernet0(ok:down) to state db
Apr 28 07:19:15.359292 r-tigon-20 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet0 admin:0 oper:0 addr:1c:34:da:c9:60:00 ifindex:77 master:0 type:sx_netdev
Apr 28 07:19:15.359510 r-tigon-20 NOTICE swss#portsyncd: :- onMsg: Publish Ethernet0(ok:down) to state db
Apr 28 07:19:15.359655 r-tigon-20 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet0 admin:1 oper:0 addr:1c:34:da:c9:60:00 ifindex:77 master:0 type:sx_netdev
Apr 28 07:19:15.359866 r-tigon-20 NOTICE swss#portsyncd: :- onMsg: Publish Ethernet0(ok:down) to state db
Apr 28 07:19:15.360309 r-tigon-20 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet0 admin:1 oper:0 addr:1c:34:da:c9:60:00 ifindex:77 master:4 type:sx_netdev
Apr 28 07:19:15.360352 r-tigon-20 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet0 admin:1 oper:0 addr:1c:34:da:c9:60:00 ifindex:77 master:4 type:sx_netdev
Apr 28 07:19:15.365219 r-tigon-20 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet0 admin:1 oper:0 addr:1c:34:da:c9:60:00 ifindex:77 master:4 type:sx_netdev
Apr 28 07:19:15.367925 r-tigon-20 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet0 admin:1 oper:0 addr:1c:34:da:c9:60:00 ifindex:77 master:4 type:sx_netdev
Apr 28 07:19:27.880041 r-tigon-20 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet0 admin:1 oper:1 addr:1c:34:da:c9:60:00 ifindex:77 master:4 type:sx_netdev
Apr 28 07:19:28.011930 r-tigon-20 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet0 admin:1 oper:1 addr:1c:34:da:c9:60:00 ifindex:77 master:4 type:sx_netdev
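
The effect of the two filtering rules on this sequence can be sketched with a small, self-contained C++ replay. This is an illustration only, not the portsyncd code: `LinkEvent`, `Filter`, `replay`, and `ethernet0BootEvents` are hypothetical names, the set of bridge ifindexes stands in for the `rtnl_link_is_bridge()` lookup on the master, and master:4 is assumed to be the port-channel.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of one netlink link event as logged by portsyncd.
struct LinkEvent
{
    std::string port;  // e.g. "Ethernet0"
    bool operUp;       // oper:1 / oper:0 in the log
    int master;        // ifindex of the master device, 0 if none
};

enum class Filter
{
    IgnoreAnyMaster,   // old behavior: drop every event that has a master
    IgnoreBridgeOnly   // new behavior: drop only events whose master is a bridge
};

// Replays the events and returns the last oper status published per port.
std::map<std::string, std::string> replay(const std::vector<LinkEvent> &events,
                                          Filter filter,
                                          const std::set<int> &bridgeIfindexes)
{
    std::map<std::string, std::string> stateDb;
    for (const auto &ev : events)
    {
        assert(!ev.port.empty());
        if (ev.master != 0)
        {
            bool masterIsBridge = bridgeIfindexes.count(ev.master) > 0;
            if (filter == Filter::IgnoreAnyMaster || masterIsBridge)
            {
                continue;  // event dropped; the previously published value stays
            }
        }
        stateDb[ev.port] = ev.operUp ? "up" : "down";
    }
    return stateDb;
}

// Condensed Ethernet0 boot sequence from the log above.
std::vector<LinkEvent> ethernet0BootEvents()
{
    return {
        {"Ethernet0", false, 0},  // 07:19:15.359655, last event without a master
        {"Ethernet0", false, 4},  // 07:19:15.360309, now enslaved, still oper down
        {"Ethernet0", true, 4},   // 07:19:27.880041, oper up with a master
    };
}
```

Replaying `ethernet0BootEvents()` with `Filter::IgnoreAnyMaster` leaves Ethernet0 at "down", while `Filter::IgnoreBridgeOnly` with an empty bridge set ends at "up". If ifindex 4 were a bridge instead, both filters would leave it at "down", which is the VLAN bridge case the change intends to keep ignoring.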

+        string masterName = linkCache.ifindexToName(master);
+        struct rtnl_link *masterLink = linkCache.getLinkByName(masterName.c_str());
+        bool isBridge = rtnl_link_is_bridge(masterLink);
+
+        if (isBridge)
+        {
+            return;
+        }
     }
 
     /* In the event of swss restart, it is possible to get netlink messages during bridge