Added Change for given Route ECMP to fallback on Default Route ECMP #3389

Merged
prsunny merged 27 commits into sonic-net:master from abdosi:default-route
Mar 25, 2025

@abdosi abdosi commented Nov 24, 2024

What I did:
Added a change that lets a route's ECMP group fall back to the default route's ECMP. When all members of a route are link down and the route is eligible for fallback to the default route, the ECMP members in the SAI Nexthop Group are updated to the default route's nexthop or nexthop group members.

This change does not handle the following scenarios:

  1. If a route has fallen back to the default route nexthops and one of its original nexthops becomes active again [link comes up], the route does not move back to the original path. The reason is that we expect this to be a transient case, since a route that has fallen back should get deleted once all its links are down.

  2. If the default route gets updated [BGP updates] or the default route nexthops go link down, we do not update the ECMP members of routes that have already fallen back to the default route. Again, the reason is that a route that has fallen back should get deleted once all its links are down, and receiving a default route update during this short window is a rare corner case. We can optimize if needed.
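
The fallback behavior and its two limitations can be sketched roughly as follows. This is a simplified model under stated assumptions, not the orchagent code: `NhgEntry`, `onNextHopDown`, and `onNextHopUp` are hypothetical names, next hops are plain strings, and only the flag names `eligible_for_default_route_nh_swap` and `is_default_route_nh_swap` are taken from the diff quoted in this review.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for a next hop group entry.
struct NhgEntry {
    std::set<std::string> original_members;   // next hops of the specific route
    std::set<std::string> installed_members;  // members currently programmed (simulated)
    bool eligible_for_default_route_nh_swap = false;
    bool is_default_route_nh_swap = false;
};

// Link-down handling for one next hop.
// Returns true only when the group swaps to the default route next hops.
bool onNextHopDown(NhgEntry& e, const std::string& nh,
                   const std::set<std::string>& active_default_nhops)
{
    // Limitation 2: a group that already fell back is not updated again.
    if (e.is_default_route_nh_swap) return false;

    e.installed_members.erase(nh);
    if (!e.installed_members.empty()) return false;            // still has live members
    if (!e.eligible_for_default_route_nh_swap) return false;   // route did not opt in
    if (active_default_nhops.empty()) return false;            // no default route: group stays empty

    e.installed_members = active_default_nhops;
    e.is_default_route_nh_swap = true;
    return true;
}

// Link-up handling for one next hop.
// Returns true when the next hop is re-added to the group.
bool onNextHopUp(NhgEntry& e, const std::string& nh)
{
    // Limitation 1: a group that fell back does not move back to its original path.
    if (e.is_default_route_nh_swap) return false;
    if (e.original_members.count(nh) == 0) return false;
    e.installed_members.insert(nh);
    return true;
}
```

In this model the swap happens only on the link-down event that removes the last live member, which matches the "all members are link down" condition in the description.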

Why I did:
For faster traffic convergence on routes where it is acceptable to send traffic over the default route when the more specific prefix/route has no valid nexthops, for the transient period before the more specific route gets deleted.

How I verified:
UT updated
Ixia-based traffic convergence.

Reference for the full context of these changes:
Swss_route_enhancemnts.docx

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
@abdosi abdosi marked this pull request as ready for review November 25, 2024 03:45
@abdosi abdosi requested a review from prsunny as a code owner November 25, 2024 03:45
@abdosi abdosi changed the title from "Changes for fallback to default route" to "Added Change for given Route ECMP to fallback on Default Route ECMP" Nov 25, 2024
@abdosi abdosi requested a review from arlakshm November 25, 2024 03:56

if (default_nhg_key.getSize() == 1)
{
current_default_route_nhops.insert(*default_nhg_key.getNextHops().begin());
@arlakshm arlakshm Nov 25, 2024

nit: indentation #Closed


if (nhopgroup->second.nh_member_install_count == 0 && nhopgroup->second.eligible_for_default_route_nh_swap && !nhopgroup->second.is_default_route_nh_swap)
{
if(nexthop.ip_address.isV4())
Contributor

If the default route from BGP is not present at this time, will v4_active_default_route_nhops have the drop port?

Contributor Author

@arlakshm : if there is no default route, then the existing behavior applies: the nexthop group will not have any members, which will cause a drop as expected.
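
As a rough illustration of this exchange, the candidate set for the swap can be picked by address family, with an empty set meaning that nothing is swapped in and the group stays empty. This is only a sketch: `DefaultRouteNhops`, `isV4`, and `fallbackCandidates` are hypothetical names, and the string-based family check is a shortcut for the example, not how the real `isV4()` works.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical holder for the per-family active default route next hops.
struct DefaultRouteNhops {
    std::set<std::string> v4;
    std::set<std::string> v6;
};

// Crude family check for this sketch only: any address containing ':' is treated as IPv6.
inline bool isV4(const std::string& ip)
{
    return ip.find(':') == std::string::npos;
}

// Returns the default route next hops matching the family of the failed next hop.
// An empty result means no default route is known for that family, so the
// group keeps no members and traffic is dropped, as before this change.
inline const std::set<std::string>& fallbackCandidates(const DefaultRouteNhops& d,
                                                       const std::string& failed_nh_ip)
{
    return isV4(failed_nh_ip) ? d.v4 : d.v6;
}
```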

Comment on lines +1132 to +1133
{
if (ip_prefix.isV4())
@arlakshm arlakshm Nov 25, 2024

nit: indentation #Resolved

ctx.protocol = fvValue(i);
}
if (fvField(i) == "fallback_to_default_route")
{
@arlakshm arlakshm Nov 25, 2024

fix indentation. mix of tabs and spaces #Closed

if (fvField(i) == "fallback_to_default_route")
{
fallback_to_default_route = fvValue(i) == "true";
}
@arlakshm arlakshm Nov 25, 2024

fix indentation. mix of tabs and spaces #Closed
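
For context, the parsing step under review reads an optional per-route flag from the field/value pairs of a route entry. A minimal sketch follows, assuming pairs of strings in place of the real field-value tuple type. `RouteAttrs` and `parseRouteAttrs` are hypothetical names, and the `"protocol"` field name is an assumption; only `"fallback_to_default_route"` and the comparison against `"true"` come from the diff.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a field/value tuple: first = field name, second = value.
using FieldValue = std::pair<std::string, std::string>;

// Hypothetical subset of the per-route attributes.
struct RouteAttrs {
    std::string protocol;
    bool fallback_to_default_route = false;  // defaults to off when the field is absent
};

RouteAttrs parseRouteAttrs(const std::vector<FieldValue>& fvs)
{
    RouteAttrs attrs;
    for (const auto& fv : fvs) {
        if (fv.first == "protocol") {
            attrs.protocol = fv.second;
        } else if (fv.first == "fallback_to_default_route") {
            // Any value other than the exact string "true" leaves the flag off.
            attrs.fallback_to_default_route = (fv.second == "true");
        }
    }
    return attrs;
}
```

Defaulting the flag to false keeps existing routes unaffected unless the producer explicitly opts them in.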

{
-    removeNextHopGroup(it_nhg.first);
+    // Pass the flag to indicate if the NextHop Group as Default Route NH Members as swapped.
+    removeNextHopGroup(it_nhg.first, m_syncdNextHopGroups[it_nhg.first].is_default_route_nh_swap);
@arlakshm arlakshm Nov 25, 2024

fix indentation #Resolved

updateDefaultRouteSwapSet(v4_default_nhg_key, v4_active_default_route_nhops);

if (v6_default_nhg_key.getSize())
updateDefaultRouteSwapSet(v6_default_nhg_key, v6_active_default_route_nhops);
@arlakshm arlakshm Nov 25, 2024

fix indentation #Resolved
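
One plausible shape for the set update called here is sketched below. The body of `updateDefaultRouteSwapSet` is not shown in this conversation, so everything in the sketch is an assumption: `updateSwapSet` is a hypothetical name, next hops are plain strings, and filtering out oper-down next hops in the multi-member case is a guess based on the "active" in the set names. Only the single next hop branch mirrors the `getSize() == 1` fragment quoted earlier in this review.

```cpp
#include <cassert>
#include <set>
#include <string>

using NextHopSet = std::set<std::string>;

// Hypothetical helper: recompute the usable default route next hops.
void updateSwapSet(const NextHopSet& default_nhg_key,
                   const NextHopSet& oper_down_nhops,
                   NextHopSet& active_default_route_nhops)
{
    active_default_route_nhops.clear();

    if (default_nhg_key.size() == 1) {
        // Single next hop: taken as-is, as in the quoted getSize() == 1 branch.
        active_default_route_nhops.insert(*default_nhg_key.begin());
        return;
    }

    // Assumption: for an ECMP default route, keep only next hops that are not oper down.
    for (const auto& nh : default_nhg_key) {
        if (oper_down_nhops.count(nh) == 0) {
            active_default_route_nhops.insert(nh);
        }
    }
}
```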

RouteBulkContext(const std::string& key, bool is_set)
-    : key(key), excp_intfs_flag(false), using_temp_nhg(false), is_set(is_set)
+    : key(key), excp_intfs_flag(false), using_temp_nhg(false), is_set(is_set),
+      fallback_to_default_route(false)
@arlakshm arlakshm Nov 25, 2024

mix of tabs and spaces #Closed

using_temp_nhg = false;
key.clear();
protocol.clear();
fallback_to_default_route = false;
@arlakshm arlakshm Nov 25, 2024

remove tabs #Closed

@arlakshm arlakshm left a comment

/Azp run Azure.sonic-swss

abdosi and others added 6 commits November 25, 2024 21:08
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
@arlakshm

/Azp run Azure.sonic-swss

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

arlakshm previously approved these changes Nov 28, 2024

@arlakshm arlakshm left a comment

lgtm

abdosi and others added 6 commits November 29, 2024 21:50
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
@mssonicbld

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
@mssonicbld

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

prsunny pushed a commit that referenced this pull request Feb 24, 2025
What I did:
Added a change to skip route programming if the nexthop is link/oper down. In scale testing with 60K+ routes, when we toggle all the interfaces [14+ interfaces back to back] as done here: https://github.com/sonic-net/sonic-mgmt/blob/master/tests/snappi_tests/multidut/bgp/test_bgp_outbound_uplink_multi_po_flap.py, FRR route APP_DB processing is slower than link notification handling. Link notification handling has already updated the Nexthop Group to point to the default route via #3389 [if eligible], so the slower FRR update can reprogram the route back to a nexthop that is link down.

This change is similar to #3394 which was done for Nexthop Group.
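
The check described in this commit message can be pictured as a small guard applied before a route update is programmed. This is only a sketch of the idea, not the code from the referenced commit: `shouldProgramRoute` is a hypothetical name and the set of oper-down next hops is assumed to be maintained elsewhere by link notification handling.

```cpp
#include <cassert>
#include <set>
#include <string>

// Returns true when a route update read from APP_DB should be programmed.
// A late update that still points at an oper-down next hop is skipped, so it
// cannot undo a fallback that link notification handling already applied.
bool shouldProgramRoute(const std::string& nexthop,
                        const std::set<std::string>& oper_down_nhops)
{
    return oper_down_nhops.count(nexthop) == 0;
}
```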
arlakshm previously approved these changes Feb 26, 2025

@prsunny prsunny left a comment

As discussed offline, please add code comments on the critical path.


# Let's give fpmsyncd a chance to connect to Zebra.
- time.sleep(5)
+ time.sleep(10)
Collaborator

Can you remove this sleep?

Contributor Author

@prsunny : updated


vector<sai_object_id_t> next_hop_ids;
- auto& nhgm = next_hop_group_entry->second.nhopgroup_members;
+ auto& nhgm = is_default_route_nh_swap ? next_hop_group_entry->second.default_route_nhopgroup_members : next_hop_group_entry->second.nhopgroup_members;
Collaborator

Please add a comment on where second.nhopgroup_members gets cleaned up.

Contributor Author

@prsunny comments added to major code points.
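
The member-map selection discussed in this thread can be illustrated with a small standalone sketch. The types are simplified stand-ins: `NextHopGroupEntry` here holds only the two maps named in the diff, keyed by strings, with `uint64_t` in place of `sai_object_id_t`, and `collectMemberIds` is a hypothetical helper rather than part of the change.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for the next hop group entry touched by this diff.
struct NextHopGroupEntry {
    std::map<std::string, uint64_t> nhopgroup_members;                // original members
    std::map<std::string, uint64_t> default_route_nhopgroup_members;  // members swapped in on fallback
};

// Collects the member object IDs that a removal would have to delete.
// When the group has fallen back, the installed members are the default
// route ones, so that map is the one to walk.
std::vector<uint64_t> collectMemberIds(const NextHopGroupEntry& entry,
                                       bool is_default_route_nh_swap)
{
    const auto& nhgm = is_default_route_nh_swap
                           ? entry.default_route_nhopgroup_members
                           : entry.nhopgroup_members;
    std::vector<uint64_t> ids;
    for (const auto& member : nhgm) {
        ids.push_back(member.second);
    }
    return ids;
}
```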

@mssonicbld

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@prsunny prsunny merged commit 596d88c into sonic-net:master Mar 25, 2025
15 checks passed
Janetxxx pushed a commit to Janetxxx/sonic-swss that referenced this pull request Nov 10, 2025
…-net#3520)

Janetxxx pushed a commit to Janetxxx/sonic-swss that referenced this pull request Nov 10, 2025
…onic-net#3389)

baorliu pushed a commit to baorliu/sonic-swss that referenced this pull request Feb 23, 2026
…-net#3520)


Signed-off-by: Baorong Liu <96146196+baorliu@users.noreply.github.com>
baorliu pushed a commit to baorliu/sonic-swss that referenced this pull request Feb 23, 2026
…onic-net#3389)


Signed-off-by: Baorong Liu <96146196+baorliu@users.noreply.github.com>