
ICCP Bridgeport removal failure when MCLAG interface Bridgeport exist#2535

Open
prasanna-cls wants to merge 3 commits into sonic-net:master from prasanna-cls:icl_bp_delete_seq_issue

Conversation

@prasanna-cls
Contributor

What I did
While processing bridge port removal, the invocation order was changed
from:

  1. Remove the bridge port in SAI (sai_bridge_api->remove_bridge_port())
  2. Notify the Isolation group module (orchagent) about the bridge port deletion

to:

  1. Notify the Isolation group module (orchagent) about the bridge port deletion
  2. Remove the bridge port in SAI (sai_bridge_api->remove_bridge_port())

Why I did it
When bridge port removal is triggered before the isolation group reference is cleared, SAI returns an error stating that the bridge port reference count is still non-zero. Reversing the invocation order resolves the issue and avoids the SAI errors on bridge port deletion.
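The failure mode can be sketched with a toy reference counter. These are hypothetical stand-in types for illustration only, not the real SAI/swss API (the real call is sai_bridge_api->remove_bridge_port()): removal fails while the isolation group still holds a reference, and succeeds once the deletion notification lets the subscriber drop it first.

```cpp
#include <cassert>

// Hypothetical stand-ins for SAI status codes and the bridge-port object.
enum Status { SUCCESS, OBJECT_IN_USE };

struct BridgePort {
    int ref_count = 0;  // reference held by the MCLAG isolation group
};

// SAI-style removal: refuses while a dependent object still references us.
Status remove_bridge_port(BridgePort &bp) {
    return bp.ref_count != 0 ? OBJECT_IN_USE : SUCCESS;
}

// What the isolation-group module does when notified of the deletion.
void on_bridge_port_deleted(BridgePort &bp) {
    if (bp.ref_count > 0) --bp.ref_count;
}
```

With the old order, remove_bridge_port() runs while ref_count is still 1 and returns OBJECT_IN_USE; notifying first drops the count to 0, so the removal succeeds.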

How I verified it

  1. Deletion of the ICCP bridge port, followed by the MCLAG interface bridge port - PASS
  2. Deletion of the MCLAG interface bridge port, followed by the ICCP bridge port - PASS
  3. Recreating the MCLAG interface bridge port, followed by the ICCP interface bridge port - PASS
  4. Recreating the ICCP interface bridge port, followed by the MCLAG interface bridge port - PASS

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205

Description for the changelog

Fix for ICCP Bridgeport removal failure when MCLAG interface Bridgeport exists

In the MCLAG scenario, if the ICCP peer link bridge port is being deleted
(removal of the last VLAN membership) while the MCLAG interface bridge port instance still exists,
then SAI returns an error since the bridge port is still
referenced by the port isolation group object.
Hence, by clearing the ICCP bridge port object reference prior to deleting the
bridge port at the SAI level, this error can be avoided.
@prasanna-cls changed the title from "In the MCLAG scenario, if the ICCP peer link bridge port is being del…" to "ICCP Bridgeport removal failure when MCLAG interface Bridgeport exist" on Nov 18, 2022
@prasanna-cls
Contributor Author

Request to retest, please.

gFdbOrch->flushFDBEntries(port.m_bridge_port_id, SAI_NULL_OBJECT_ID);
SWSS_LOG_INFO("Flush FDB entries for port %s", port.m_alias.c_str());

/* Notify about bridgeport deletion */
Collaborator


This could cause race conditions. The idea of notifying other subscribers is to let them know that BridgePort deletion is complete; this flow doesn't honor that. Please propose a different solution.

Contributor Author


Hi,
From the SAI point of view, the ISOLATION_GROUP_TABLE:MCLAG_ISO_GRP object depends on the bridge port as a dependent object.
For this reason, when SWSS creates a bridge port, MCLAG is notified after the bridge port has been created in SAI.
Similarly, during bridge port deletion, MCLAG has to be notified (to clean up the ISOLATION_GROUP) before SAI deletes that bridge port.

MCLAG (for ISOLATION_GROUP_TABLE) is the only subscriber listening to these bridge port changes, and those changes are made as part of handling MCLAG in this pull request.

Do you suggest adding a different message type, something like "SUBJECT_TYPE_BRIDGE_PORT_REMOVAL_HANDLE", that can be issued before sai_bridge_api->remove_bridge_port(), while "SUBJECT_TYPE_BRIDGE_PORT_CHANGE" is retained and issued after the SAI bridge port removal?
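The two-subject idea floated above can be sketched as follows. This is a hypothetical illustration, not merged sonic-swss code; the enum values mirror the names in the discussion, and Subject here is a simplified stand-in for the real Orch observer machinery. Observers get a pre-removal callback before the SAI call and the existing post-removal callback after it.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical subject types mirroring the names discussed above.
enum SubjectType {
    SUBJECT_TYPE_BRIDGE_PORT_REMOVAL_HANDLE,  // proposed: fired before SAI remove
    SUBJECT_TYPE_BRIDGE_PORT_CHANGE           // existing: fired after SAI remove
};

// Simplified stand-in for the swss Subject/Observer machinery.
struct Subject {
    std::vector<std::function<void(SubjectType)>> observers;
    void notify(SubjectType t) {
        for (auto &obs : observers) obs(t);
    }
};

// Returns the event sequence so the ordering can be inspected.
std::vector<std::string> remove_with_two_phase_notify(Subject &subject) {
    std::vector<std::string> events;
    subject.observers.push_back([&events](SubjectType t) {
        events.push_back(t == SUBJECT_TYPE_BRIDGE_PORT_REMOVAL_HANDLE
                             ? "isolation group releases reference"
                             : "subscribers see completed removal");
    });
    subject.notify(SUBJECT_TYPE_BRIDGE_PORT_REMOVAL_HANDLE);
    events.push_back("sai remove_bridge_port succeeds");  // ref count now zero
    subject.notify(SUBJECT_TYPE_BRIDGE_PORT_CHANGE);
    subject.observers.clear();  // drop the local-capture observer before return
    return events;
}
```

This preserves the "deletion is complete" meaning of SUBJECT_TYPE_BRIDGE_PORT_CHANGE while still giving the isolation group a chance to release its reference first.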


Hi @prsunny, @prasanna-cls, can we fix this issue as follows?

This way there is no change to the notification, and the isolation group is also removed.

diff --git a/orchagent/portsorch.cpp b/orchagent/portsorch.cpp
index 376328f..804f356 100755
--- a/orchagent/portsorch.cpp
+++ b/orchagent/portsorch.cpp
@@ -4,6 +4,7 @@
 #include "neighorch.h"
 #include "gearboxutils.h"
 #include "vxlanorch.h"
+#include "isolationgrouporch.h"
 #include "directory.h"
 #include "subintf.h"

@@ -49,6 +50,7 @@ extern NeighOrch *gNeighOrch;
 extern CrmOrch *gCrmOrch;
 extern BufferOrch *gBufferOrch;
 extern FdbOrch *gFdbOrch;
+extern IsoGrpOrch *gIsoGrpOrch;
 extern Directory<Orch *> gDirectory;
 extern sai_system_port_api_t *sai_system_port_api;
 extern string gMySwitchType;
@@ -4984,6 +4986,9 @@ bool PortsOrch::removeBridgePort(Port &port)
     gFdbOrch->flushFDBEntries(port.m_bridge_port_id, SAI_NULL_OBJECT_ID);
     SWSS_LOG_INFO("Flush FDB entries for port %s", port.m_alias.c_str());

+    PortUpdate update = { port, false };
+    gIsoGrpOrch->update(SUBJECT_TYPE_BRIDGE_PORT_CHANGE, static_cast<void *>(&update));
+
     /* Remove bridge port */
     status = sai_bridge_api->remove_bridge_port(port.m_bridge_port_id);
     if (status != SAI_STATUS_SUCCESS)
@@ -5000,7 +5005,6 @@ bool PortsOrch::removeBridgePort(Port &port)
     port.m_bridge_port_id = SAI_NULL_OBJECT_ID;

     /* Remove bridge port */
-    PortUpdate update = { port, false };
     notify(SUBJECT_TYPE_BRIDGE_PORT_CHANGE, static_cast<void *>(&update));

     SWSS_LOG_NOTICE("Remove bridge port %s from default 1Q bridge", port.m_alias.c_str());
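The control flow this diff proposes can be summarized as a sketch (the strings are simplified stand-ins for the real PortsOrch::removeBridgePort() calls, not actual code): the isolation-group orch is updated directly before the SAI removal, while the general notify() to all subscribers stays where it was, after the removal completes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified trace of removeBridgePort() with the suggested diff applied;
// each string stands in for the real call at that point in the flow.
std::vector<std::string> remove_bridge_port_flow() {
    return {
        "flushFDBEntries",                          // unchanged
        "gIsoGrpOrch->update(BRIDGE_PORT_CHANGE)",  // added: drop iso-group ref
        "sai remove_bridge_port",                   // now succeeds (ref count 0)
        "notify(BRIDGE_PORT_CHANGE)"                // unchanged: after removal
    };
}
```

Only the dependent orch is called early; every other subscriber still observes the notification after the bridge port is actually gone, which addresses the race-condition concern raised earlier.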

Contributor Author


Yes @kselvaraj2, this looks OK to me. @prsunny, is this approach fine?

Contributor Author


@prsunny, can I update the review with the approach suggested by @kselvaraj2?
