Skip to content

vpp: support L2 VXLAN EVPN #1761

Open
sambenko wants to merge 13 commits intosonic-net:masterfrom
sambenko:feat/L2-vxlan-evpn
Open

vpp: support L2 VXLAN EVPN #1761
sambenko wants to merge 13 commits intosonic-net:masterfrom
sambenko:feat/L2-vxlan-evpn

Conversation

@sambenko
Copy link

@sambenko sambenko commented Feb 5, 2026

This PR aims to add support for L2 VXLAN with BGP EVPN to the vpp platform. Previously, only L3 VXLAN using SAI_OBJECT_TYPE_NEXT_HOP was supported. This change enables L2 bridging over VXLAN using BGP EVPN for MAC learning and VTEP discovery.

For now, TunnelVPPData is shared and only tunnel creation works.

Signed-off-by: Samuel Benko <samuel.benko@pantheon.tech>
@mssonicbld
Copy link
Collaborator

/azp run

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Feb 5, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: sambenko / name: Samuel Benko (15f3dd2)

@azure-pipelines
Copy link

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

Signed-off-by: Samuel Benko <samuel.benko@pantheon.tech>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

Copy link
Contributor

@yue-fred-gao yue-fred-gao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. This is an important feature. Can you clarify what tests you have run?

if (vni != 0 && vlan_id != 0) break;
}

if (vni == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the tunnel is a l3 tunnel and there is no tunnel mapper of SAI_TUNNEL_MAP_TYPE_VNI_TO_VLAN_ID? Will this cause l3 tunnel creation failed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I thought that l3 tunnel creation uses a different code path (SAI_OBJECT_TYPE_NEXT_HOP -> createNexthop -> create_tunnel_encap_nexthop -> tunnel_encap_nexthop_action), so I assumed that it won't even be hit.
I could change it to return SAI_STATUS_SUCCESS instead of failure

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 4b83fcf based on Copilot suggestion - now returns SAI_STATUS_SUCCESS to skip non-L2 tunnels gracefully.

@yue-fred-gao
Copy link
Contributor

sonic-vpp t1-lag sonic-mgmt test is in PR checker. You can trigger it like this PR: sonic-net/sonic-buildimage#25369

@sambenko
Copy link
Author

sambenko commented Feb 6, 2026

Thanks for the contribution. This is an important feature. Can you clarify what tests you have run?

Hey, I have ran tests on a local containerlab topology:

  • 2x sonic-vpp routers and 2x linux PCs (praqma/network-multitool:latest image)
  • bgp evpn peering between routers for vtep discovery and mac learning
  • PCs attached to access VLANs (untagged) on each router

With this setup I tested and confirmed:

  • vpp vxlan tunnel creation with 'vppctl show vxlan tunnel'
  • bridge domain membership with tunnel interface 'vppctl show bridge-domain'
  • verified sai objects in redis asic_db (tunnel, tunnel maps, tunnel map entries)
  • single vni vlan mapping

@sambenko
Copy link
Author

sambenko commented Feb 6, 2026

/azp run Azure.sonic-sairedis

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 1761 in repo sonic-net/sonic-sairedis

@yue-fred-gao
Copy link
Contributor

sonic-vpp t1-lag sonic-mgmt test is in PR checker. You can trigger it like this PR: sonic-net/sonic-buildimage#25369

Can you run sonic-vpp sonic-mgmt to make sure it doesn't break anything?

@yue-fred-gao
Copy link
Contributor

sonic-vpp t1-lag sonic-mgmt test is in PR checker. You can trigger it like this PR: sonic-net/sonic-buildimage#25369

Can you run sonic-vpp sonic-mgmt to make sure it doesn't break anything?

nvm. You already did :)

@lguohan
Copy link
Contributor

lguohan commented Feb 12, 2026

any test cases?

@yue-fred-gao
Copy link
Contributor

any test cases?

It seems there is no EVPN test case in sonic-mgmt. sonic-vpp doesn't have a UT infrastructure either. Max from Arista and his team is working on that.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds support for L2 VXLAN with BGP EVPN to the VPP platform in the sonic-sairedis virtual switch implementation. Previously, only L3 VXLAN using SAI_OBJECT_TYPE_NEXT_HOP was supported. The changes enable L2 bridging over VXLAN tunnels using BGP EVPN for MAC learning and VTEP discovery.

Changes:

  • Extended TunnelVPPData structure to support both L2 and L3 VXLAN tunnel configurations
  • Added create_l2_vxlan_tunnel method to create P2P VXLAN tunnels and associate them with bridge domains (VLANs)
  • Modified VLAN member creation to skip VPP operations for tunnel-type bridge ports

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
vslib/vpp/TunnelManager.h Extended TunnelVPPData with L2 VXLAN fields (VNI, VLAN ID, src/dst IPs) and added m_l2_tunnel_map storage with create_l2_vxlan_tunnel method declaration
vslib/vpp/TunnelManager.cpp Implemented create_l2_vxlan_tunnel to extract tunnel configuration from SAI objects, create VPP VXLAN tunnels, and add them to bridge domains
vslib/vpp/SwitchVppFdb.cpp Added early return in vpp_create_vlan_member to skip VPP operations for SAI_BRIDGE_PORT_TYPE_TUNNEL bridge ports
vslib/vpp/SwitchVpp.cpp Hooked SAI_OBJECT_TYPE_TUNNEL creation to call create_l2_vxlan_tunnel after internal object creation

Signed-off-by: Samuel Benko <samuel.benko@pantheon.tech>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Samuel Benko <samuel.benko@pantheon.tech>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Samuel Benko <samuel.benko@pantheon.tech>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

tahmed-dev added a commit to tahmed-dev/sonic-sairedis that referenced this pull request Feb 19, 2026
Cherry-picked from sonic-net/sonic-sairedis PR sonic-net#1761 (sambenko).
Fixes syncd segfault when EVPN creates P2P tunnel bridge ports:
- Handle SAI_BRIDGE_PORT_TYPE_TUNNEL in vpp_create_vlan_member
- Create/remove VPP VxLAN tunnels for L2 EVPN P2P tunnels
- Add tunnel interface to VLAN bridge domain in VPP

Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
tahmed-dev added a commit to tahmed-dev/sonic-sairedis that referenced this pull request Feb 20, 2026
vpp_remove_vlan_member() dereferences SAI_BRIDGE_PORT_ATTR_PORT_ID
without checking the bridge port type. When the bridge port is a
VxLAN tunnel (SAI_BRIDGE_PORT_TYPE_TUNNEL), this attribute is not
set, causing a null pointer dereference and SIGSEGV.

This is the same root cause as the create path fixed by PR sonic-net#1761,
but in the remove path which was missed.

Add a bridge port type check before the PORT_ID lookup, and skip
tunnel bridge ports gracefully.

Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
vpp_remove_vlan_member() dereferences SAI_BRIDGE_PORT_ATTR_PORT_ID
without checking the bridge port type. When the bridge port is a
VxLAN tunnel (SAI_BRIDGE_PORT_TYPE_TUNNEL), this attribute is not
set, causing a null pointer dereference and SIGSEGV.

This is the same root cause as the create path fixed by PR sonic-net#1761,
but in the remove path which was missed.

Add a bridge port type check before the PORT_ID lookup, and skip
tunnel bridge ports gracefully.

Signed-off-by: Tamer Ahmed <tamerahmed@microsoft.com>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yue-fred-gao
Copy link
Contributor

hi @tahmed-dev , can you share the issues you found in the PR? I saw you made a few changes.

@yue-fred-gao
Copy link
Contributor

hi @sambenko , can you please check the comments from copilot? Some of them make sense. BTW, the current test_vxlan_ecmp test is broken in the baseline. I will take a look. Once that is fixed, we can trigger the PR checker again for sanity check.

Signed-off-by: Samuel Benko <samuel.benko@pantheon.tech>
Signed-off-by: Samuel Benko <samuel.benko@pantheon.tech>
Signed-off-by: Samuel Benko <samuel.benko@pantheon.tech>
Signed-off-by: Samuel Benko <samuel.benko@pantheon.tech>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

When deleting an L2 VXLAN tunnel map entry (e.g. "config vxlan map del"),
the SAI orchestrator cascades removal of VLAN members, bridge ports, and
triggers FDB flushes. Several VPP-specific functions in SwitchVppFdb.cpp
unconditionally access SAI_BRIDGE_PORT_ATTR_PORT_ID on bridge port objects,
but tunnel bridge ports (SAI_BRIDGE_PORT_TYPE_TUNNEL) carry
SAI_BRIDGE_PORT_ATTR_TUNNEL_ID instead. Accessing the missing PORT_ID
attribute returns a null pointer, and dereferencing it causes a segfault
(crash at offset 0x18 from NULL).

The specific crash occurred in vpp_fdbentry_flush() during a
FLUSH_BY_INTERFACE operation (mode 5) on a tunnel bridge port.

Changes:
- Add is_tunnel_bridge_port() utility method to SwitchVpp that safely
  checks whether a bridge port is of type TUNNEL
- Guard vpp_create_vlan_member() using the new utility (replaces inline check)
- Guard vpp_remove_vlan_member() using the new utility (replaces inline check)
- Guard vpp_fdbentry_add() to skip VPP FDB operations for tunnel bridge ports
- Guard vpp_fdbentry_del() to skip VPP FDB operations for tunnel bridge ports
- Guard vpp_fdbentry_flush() FLUSH_BY_INTERFACE case to fall back to
  flush_all for tunnel bridge ports instead of dereferencing PORT_ID

Signed-off-by: Samuel Benko <samuel.benko@pantheon.tech>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sambenko
Copy link
Author

sambenko commented Feb 25, 2026

hey @yue-fred-gao and @lguohan I incorporated the fixes from Copilot suggestions as well as some additional sigsegv issues during vxlan tunnel removal and it works now. Some additional steps that need to be taken, apart from the test_vxlan_ecmp test being broken?

hi @sambenko , can you please check the comments from copilot? Some of them make sense. BTW, the current test_vxlan_ecmp test is broken in the baseline. I will take a look. Once that is fixed, we can trigger the PR checker again for sanity check.

@yue-fred-gao
Copy link
Contributor

hi @sambenko , I have fixed the failure in test_vxlan_ecmp with PR sonic-net/sonic-mgmt#22644. Please trigger the vpp t1-lag test again for sanity. thanks!

@sambenko
Copy link
Author

sambenko commented Mar 2, 2026

hi @sambenko , I have fixed the failure in test_vxlan_ecmp with PR sonic-net/sonic-mgmt#22644. Please trigger the vpp t1-lag test again for sanity. thanks!

Hey @yue-fred-gao thanks for the fix. However, when I try to rerun the failed jobs (t1-lag test), I get

TF400813: The user 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa' is not authorized to access this resource.

could you please rerun it for me so we don't have to trigger the entire pipeline for this? Thank you!
https://dev.azure.com/mssonic/build/_build/results?buildId=1045498&view=results

@yue-fred-gao
Copy link
Contributor

hi @sambenko , I have fixed the failure in test_vxlan_ecmp with PR sonic-net/sonic-mgmt#22644. Please trigger the vpp t1-lag test again for sanity. thanks!

Hey @yue-fred-gao thanks for the fix. However, when I try to rerun the failed jobs (t1-lag test), I get

TF400813: The user 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa' is not authorized to access this resource.

could you please rerun it for me so we don't have to trigger the entire pipeline for this? Thank you! https://dev.azure.com/mssonic/build/_build/results?buildId=1045498&view=results

I don't have the permission either. I think you will have to trigger the entire azure pipeline.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@yue-fred-gao yue-fred-gao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran test_vxlan_ecmp and it passed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants