[fpmsyncd] Fpmsyncd Next Hop Table Enhancement #2919
prsunny merged 18 commits into sonic-net:master
Conversation
fpmsyncd/routesync.cpp
Outdated
| FieldValueTuple nhg("nexthop_group", nhg_id_key.c_str()); | ||
| fvVector.push_back(nhg); | ||
| updateNextHopGroup(nhg_id); | ||
| use_nhg = false; |
There was a problem hiding this comment.
This code was removed after the change to not use NHG for route with single nexthop.
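A minimal sketch of the behavior described in this reply, assuming a route with a single next hop carries its next hop inline and only a multi-member group is referenced through the nexthop_group field. NhgSketch, FieldValueSketch, joinSketch and buildRouteFields are hypothetical names, the "nexthop" and "ifname" field names are assumptions, and using the numeric ID as the group key is a simplification of nhg_id_key; this is not the merged fpmsyncd code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for swss::FieldValueTuple (field name, value).
using FieldValueSketch = std::pair<std::string, std::string>;

// Hypothetical, simplified next hop group: one entry per member.
struct NhgSketch
{
    uint32_t id;
    std::vector<std::string> nexthops;
    std::vector<std::string> ifnames;
};

// Join strings with a delimiter, e.g. {"a", "b"} -> "a,b".
static std::string joinSketch(const std::vector<std::string> &parts, char delim)
{
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i)
    {
        if (i != 0)
        {
            out += delim;
        }
        out += parts[i];
    }
    return out;
}

// A group with at most one member is written inline on the route;
// only a multi-member group is referenced by ID.
std::vector<FieldValueSketch> buildRouteFields(const NhgSketch &nhg)
{
    std::vector<FieldValueSketch> fv;
    if (nhg.nexthops.size() <= 1)
    {
        fv.emplace_back("nexthop", joinSketch(nhg.nexthops, ','));
        fv.emplace_back("ifname", joinSketch(nhg.ifnames, ','));
    }
    else
    {
        // Assumption: the group key is just the decimal ID.
        fv.emplace_back("nexthop_group", std::to_string(nhg.id));
    }
    return fv;
}
```

Keeping the single next hop case on the plain route fields means no group object has to exist before such a route can be programmed.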
fpmsyncd/routesync.h
Outdated
| /* nexthop group table */ | ||
| ProducerStateTable m_nexthop_groupTable; | ||
| map<uint32_t,NextHopGroup> m_nh_groups; | ||
| map<string,NextHopGroupRoute> m_nh_routes; |
There was a problem hiding this comment.
fpmsyncd should NOT cache the routes; doing so will consume too much memory in large-scale route scenarios.
Each route entry already exists in orchagent, sai meta, and syncd (sai/sdk). We cannot afford another copy :(
There was a problem hiding this comment.
Removed map<string,NextHopGroupRoute> m_nh_routes; based on the discussion in the Routing WG.
Please kindly check if this has resolved your concern.
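To illustrate the outcome of this thread, here is a rough sketch in which only the next hop group map is kept and a route is resolved against it when its message arrives, so memory grows with the number of groups rather than the number of routes. NextHopGroupSketch and NhgStateSketch are hypothetical simplifications; the real NextHopGroup type and RouteSync class hold more state.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified next hop group.
struct NextHopGroupSketch
{
    uint32_t id;
    std::vector<std::string> nexthops;
};

class NhgStateSketch
{
public:
    void onNextHopAdd(const NextHopGroupSketch &nhg) { m_nh_groups[nhg.id] = nhg; }
    void onNextHopDel(uint32_t id) { m_nh_groups.erase(id); }

    // Resolve a route's group at message time; nothing about the route is stored.
    const NextHopGroupSketch *resolve(uint32_t nhg_id) const
    {
        auto it = m_nh_groups.find(nhg_id);
        return it == m_nh_groups.end() ? nullptr : &it->second;
    }

    std::size_t groupCount() const { return m_nh_groups.size(); }

private:
    // The only cache: one entry per group, no per-route map.
    std::map<uint32_t, NextHopGroupSketch> m_nh_groups;
};
```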
|
/AzurePipelines run Azure.sonic-swss |
|
Commenter does not have sufficient privileges for PR 2919 in repo sonic-net/sonic-swss |
|
@zice312963205 @shuaishang |
|
@nakano-omw your branch needs to be updated, and you need to re-push your code first. @prsunny Hi Prince, can you please help merge this PR? Thanks. |
|
@ridahanif96 I have re-pushed. Thanks. |
|
Looks good |
|
reviewers, can you please help to review and approve this PR? Thanks. |
dgsudharsan
left a comment
There was a problem hiding this comment.
Please add UT to the changes. There is a requirement for 80% coverage.
fpmsyncd/routesync.h
Outdated
| #include <netlink/route/route.h> | ||
|
|
||
| #if (LINUX_VERSION_CODE > KERNEL_VERSION(5,3,0)) | ||
| #define HAVE_NEXTHOP_GROUP |
There was a problem hiding this comment.
Why do we need this macro HAVE_NEXTHOP_GROUP? Since we are submitting the code to master, the linux version is expected to be above 5,3,0. Please remove unnecessary ifdefs
There was a problem hiding this comment.
Thank you. I have removed the code.
fpmsyncd/routesync.cpp
Outdated
| else | ||
| #endif | ||
| { | ||
| onEvpnRouteMsg(h, len); |
There was a problem hiding this comment.
Have you tested if nexthop group works with EVPN in case of overlay nexthop?
fpmsyncd/routesync.cpp
Outdated
| char ifname_unknown[IFNAMSIZ] = "unknown"; | ||
|
|
||
| SWSS_LOG_INFO("type %d len %d", nlmsg_type, len); | ||
| if ((nlmsg_type != RTM_NEWNEXTHOP) |
There was a problem hiding this comment.
This is already checked in the calling function and hence redundant. Please remove here
https://github.com/sonic-net/sonic-swss/pull/2919/files#diff-0555c0a4f1e207c410ac8ab7d4a44f48a0925da2ed14c57499a4e9175223be57R625
There was a problem hiding this comment.
Thank you. I have removed the code.
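A small sketch of the point made in this thread: when the caller is the single place that filters on message type, the handler does not need to repeat the check. MsgTypeSketch and the three functions are hypothetical; the real code switches on the netlink message type (for example RTM_NEWNEXTHOP) rather than on this enum.

```cpp
#include <string>

// Hypothetical message types standing in for the netlink RTM_* values.
enum class MsgTypeSketch { NewRoute, DelRoute, NewNextHop, DelNextHop, Other };

// The handler trusts its caller: no second type check here.
std::string onNextHopMsgSketch(MsgTypeSketch type)
{
    return type == MsgTypeSketch::NewNextHop ? "nexthop-add" : "nexthop-del";
}

std::string onRouteMsgSketch(MsgTypeSketch type)
{
    return type == MsgTypeSketch::NewRoute ? "route-add" : "route-del";
}

// The caller is the only place that filters on message type.
std::string dispatchSketch(MsgTypeSketch type)
{
    switch (type)
    {
    case MsgTypeSketch::NewNextHop:
    case MsgTypeSketch::DelNextHop:
        return onNextHopMsgSketch(type);
    case MsgTypeSketch::NewRoute:
    case MsgTypeSketch::DelRoute:
        return onRouteMsgSketch(type);
    default:
        return "ignored";
    }
}
```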
fpmsyncd/routesync.cpp
Outdated
|
|
||
| #define ETHER_ADDR_STRLEN (3*ETH_ALEN) | ||
|
|
||
| #define MULTIPATH_NUM 256 //Same value used for FRR in SONiC |
There was a problem hiding this comment.
Can we rename as MAX_MULTIPATH_NUM for better readability?
There was a problem hiding this comment.
I have modified it to MAX_MULTIPATH_NUM.
fpmsyncd/routesync.cpp
Outdated
| auto itr = m_nh_groups.find(id); | ||
| if(itr == m_nh_groups.end()) | ||
| { | ||
| SWSS_LOG_INFO("NextHop group is incomplete: %d", nhg.id); |
There was a problem hiding this comment.
Shouldn't this be a warn or error log?
There was a problem hiding this comment.
I have corrected it to SWSS_LOG_ERROR.
fpmsyncd/routesync.cpp
Outdated
| auto git = m_nh_groups.find(nh_id); | ||
| if(git == m_nh_groups.end()) | ||
| { | ||
| SWSS_LOG_INFO("Nexthop not found: %d", nh_id); |
There was a problem hiding this comment.
Shouldn't this be a warn or error message?
There was a problem hiding this comment.
I have corrected it to SWSS_LOG_ERROR.
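The two logging threads above follow the same pattern, sketched here under assumptions: a failed lookup is reported at error level and signalled to the caller instead of being logged as informational and ignored. SKETCH_LOG_ERROR is a hypothetical stand-in for SWSS_LOG_ERROR that writes to stderr, and NhgEntrySketch and findNextHopSketch are illustrative names.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical stand-in for SWSS_LOG_ERROR; always called with one argument here.
#define SKETCH_LOG_ERROR(fmt, ...) std::fprintf(stderr, "ERROR: " fmt "\n", __VA_ARGS__)

// Hypothetical, simplified next hop entry.
struct NhgEntrySketch
{
    std::string nexthop;
};

// Returns false and logs an error when the next hop ID is unknown.
bool findNextHopSketch(const std::map<uint32_t, NhgEntrySketch> &groups,
                       uint32_t nh_id,
                       NhgEntrySketch &out)
{
    auto it = groups.find(nh_id);
    if (it == groups.end())
    {
        SKETCH_LOG_ERROR("Nexthop not found: %u", static_cast<unsigned>(nh_id));
        return false;
    }
    out = it->second;
    return true;
}
```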
eddieruan-alibaba
left a comment
There was a problem hiding this comment.
Cherry-picked into the Phoenix Wing fork and validated it.
fpmsyncd/fpmlink.cpp
Outdated
| /* EVPN Type5 Add route processing */ | ||
| processRawMsg(nl_hdr); | ||
| } | ||
| #ifdef HAVE_NEXTHOP_GROUP |
There was a problem hiding this comment.
Why do we need this macro check? Can we do a dynamic check like
DEVICE_METADATA['localhost']['nexthop_group']
This is done as part of PR - https://github.com/sonic-net/sonic-buildimage/pull/16762/files
Could you pls check this?
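A hedged sketch of what a dynamic check could look like in place of the compile-time macro. MetadataRowSketch is a hypothetical stand-in for the fields of DEVICE_METADATA|localhost as read from CONFIG_DB, and the "enabled" value is an assumption; the actual field values are defined by the sonic-buildimage PR referenced above.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the fields of DEVICE_METADATA|localhost.
using MetadataRowSketch = std::map<std::string, std::string>;

// Assumption: the feature is on when the field exists and equals "enabled".
bool isNexthopGroupEnabledSketch(const MetadataRowSketch &localhost)
{
    auto it = localhost.find("nexthop_group");
    return it != localhost.end() && it->second == "enabled";
}
```

With a runtime flag of this kind, the same binary can run with or without next hop groups, which an #ifdef cannot offer.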
fpmsyncd/routesync.cpp
Outdated
|
|
||
| vector<string> alsv = tokenize(intf_list, NHG_DELIMITER); | ||
| for (auto alias : alsv) | ||
| #ifdef HAVE_NEXTHOP_GROUP |
There was a problem hiding this comment.
Please check the previous comment and we could use device_metadata dynamic check.
| * up/down events. Skipping routes to eth0 or docker0 to avoid such behavior | ||
| */ | ||
| if (alias == "eth0" || alias == "docker0") | ||
| const auto itg = m_nh_groups.find(nhg_id); |
There was a problem hiding this comment.
Could you add a couple of sonic-swss tests with NHGs?
|
@ntt-omw Please check the compiler failure: routesync.cpp:800:23: error: 'rtnl_route_get_nh_id' was not declared in this scope; did you mean 'rtnl_route_get_iif'? |
|
@ntt-omw can you rebase your branch and trigger recompile? You need #3105 's changes to fix the compile issue @kperumalbfn pointed out. |
fpmsyncd/routesync.cpp
Outdated
| // In this case since we do not want the route with next hop on eth0/docker0, we return. | ||
| // But still we need to clear the route from the APPL_DB. Otherwise the APPL_DB and data | ||
| // path will be left with stale route entry | ||
| if(alsv.size() == 1) |
There was a problem hiding this comment.
Can't this test be moved outside the loop ? If the list is single entry then there is no reason for the loop itself.
There was a problem hiding this comment.
The purpose of this loop is not to show skipped routes, but to skip routes to specific interfaces (eth0 or docker0) and do the associated processing.
fpmsyncd/routesync.cpp
Outdated
| string weights = getNextHopWt(route_obj); | ||
|
|
||
| vector<string> alsv = tokenize(intf_list, NHG_DELIMITER); | ||
| for (auto alias : alsv) |
There was a problem hiding this comment.
What is the purpose of this loop? Is it to print the skipped routes ? Because the only logic there is when the list is of size 1.
There was a problem hiding this comment.
Thanks, I moved if(alsv.size() == 1) outside the loop.
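One way to read the outcome of the two threads above, as a sketch: the single-entry case is decided once before iterating, and the loop only answers whether any interface in the list is eth0 or docker0. The RouteActionSketch enum, the helper names, and the ',' delimiter (standing in for NHG_DELIMITER) are assumptions for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical outcome of inspecting a route's interface list.
enum class RouteActionSketch { Program, DeleteAndSkip, Skip };

// Split on a delimiter; an empty input yields an empty vector.
static std::vector<std::string> tokenizeSketch(const std::string &s, char delim)
{
    std::vector<std::string> out;
    std::istringstream in(s);
    std::string tok;
    while (std::getline(in, tok, delim))
    {
        out.push_back(tok);
    }
    return out;
}

static bool isSkippedIfSketch(const std::string &alias)
{
    return alias == "eth0" || alias == "docker0";
}

RouteActionSketch classifyRouteSketch(const std::string &intf_list)
{
    const std::vector<std::string> alsv = tokenizeSketch(intf_list, ',');

    // Decided once, outside the loop: a lone eth0/docker0 next hop means the
    // stale APPL_DB entry must be deleted before returning.
    if (alsv.size() == 1 && isSkippedIfSketch(alsv[0]))
    {
        return RouteActionSketch::DeleteAndSkip;
    }

    // Multi-path case: skip the route if any member is eth0/docker0.
    for (const auto &alias : alsv)
    {
        if (isSkippedIfSketch(alias))
        {
            return RouteActionSketch::Skip;
        }
    }
    return RouteActionSketch::Program;
}
```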
|
@dgsudharsan @kperumalbfn |
@ntt-omw can you help to get swss sanity check passed? Failed: 3 (0.35%) test_rebind_eni_route_group Might be related to your changes. |
|
@nakano-omw libnl 3.10 will have support for getting/setting the nexthop ID attribute, but the API is a little bit different. See thom311/libnl@3e08063 for details. It looks like in the version of code that has been committed, it's For ease of upgrades, it would be good if the same API syntax is used. Would you be able to rework this PR to use that new API instead? |
|
@ntt-omw @nakano-omw can you rebase your branch to latest master? You have "This branch is out-of-date with the base branch" |
|
We are fixing this issue.
|
|
We are rebasing our branch now.
|
|
The following lines are missing test coverage. The coverage threshold is 80%. |
| { | ||
| if(nhg.group.size() == 0) | ||
| { | ||
| if(!nhg.nexthop.empty()) |
There was a problem hiding this comment.
This can be replaced with the following:
nexthops = nhg.nexthop.empty() ? (af == AF_INET ? "0.0.0.0" : "::") : nhg.nexthop;
Similar logic is implemented in the non-empty nhg.
There was a problem hiding this comment.
same logic with one line
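The suggested one-liner can be wrapped in a small function to make its behavior easy to test. This is a sketch: effectiveNextHopSketch is a hypothetical name, and the expression inside it is the one proposed in the review comment.

```cpp
#include <string>
#include <sys/socket.h>  // AF_INET, AF_INET6

// An empty next hop falls back to the unspecified address of the family.
std::string effectiveNextHopSketch(const std::string &nexthop, int af)
{
    return nexthop.empty() ? (af == AF_INET ? "0.0.0.0" : "::") : nexthop;
}
```

For example, effectiveNextHopSketch("", AF_INET6) yields "::", while a non-empty next hop is returned unchanged.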
fpmsyncd/routesync.cpp
Outdated
| //Using route-table only for single next-hop | ||
| string nexthops, ifnames, weights; | ||
|
|
||
| getNextHopGroupFields(nhg, nexthops, ifnames, weights, rtnl_route_get_family(route_obj)); |
There was a problem hiding this comment.
Do you want to handle a case where the nhg is not based on the ID?
If there is a failure in getNextHopGroupFields(), you would be pushing empty strings.
There was a problem hiding this comment.
We need FRR to provide NHG ID. Otherwise, the NHG key would be changed during path changes.
There was a problem hiding this comment.
As Eddie says, FRR needs to provide NHG IDs. There is no case where the NHG is not based on an ID.
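The reviewer's concern about pushing empty strings could be handled along these lines, as a sketch only: the field lookup reports failure, and the caller writes nothing when the group ID is unknown. getFieldsSketch is a hypothetical variant of getNextHopGroupFields() that returns a bool, and the "nexthop", "ifname" and "weight" field names are assumptions.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified per-group fields.
struct NhgFieldsSketch
{
    std::string nexthops, ifnames, weights;
};

using FieldValueSketch = std::pair<std::string, std::string>;

// Hypothetical variant that reports failure instead of leaving outputs empty.
bool getFieldsSketch(const std::map<uint32_t, NhgFieldsSketch> &groups,
                     uint32_t nhg_id,
                     NhgFieldsSketch &out)
{
    auto it = groups.find(nhg_id);
    if (it == groups.end())
    {
        return false;
    }
    out = it->second;
    return true;
}

// Returns false without touching fv when the group ID is unknown.
bool buildSingleNextHopRouteSketch(const std::map<uint32_t, NhgFieldsSketch> &groups,
                                   uint32_t nhg_id,
                                   std::vector<FieldValueSketch> &fv)
{
    NhgFieldsSketch f;
    if (!getFieldsSketch(groups, nhg_id, f))
    {
        return false;
    }
    fv.emplace_back("nexthop", f.nexthops);
    fv.emplace_back("ifname", f.ifnames);
    if (!f.weights.empty())
    {
        fv.emplace_back("weight", f.weights);
    }
    return true;
}
```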
Signed-off-by: Kanji Nakano <[email protected]>
15b8a6d to
f017e8c
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@dgsudharsan The code has been corrected. |
|
@dgsudharsan can you help to review it again? @nakano-omw has already addressed your comments. @nakano-omw can you change Sudharsan's review status to request a review? Currently, it is still "requested changes". |
fpmsyncd/routesync.cpp
Outdated
| } | ||
| } | ||
| } | ||
| if (grp_count > 0) |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@eddieruan-alibaba Still one of my comments is not addressed. Left a comment on how to address that |
|
Thanks @dgsudharsan . @nakano-omw do you want to take a look again? |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
efb6b50 to
a748926
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Kanji Nakano <[email protected]>
55553fb to
1534ffd
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
What I did Implementing code changes for sonic-net/SONiC#1425 Why I did it add nexthop group feature to fpmsyncd. How I verified it enable/disable nexthop group feature Klish will call REST API to configure feature next-hop-group enable. FEATURE|nexthop_group will be created in CONFIG_DB. Template zebra.conf.j2 will generate zebra.conf with fpm use-next-hop-groups if FEATURE|nexthop_group exists in CONFIG_DB. Else, it will generate zebra.conf with no fpm use-next-hop-groups (default behavior). Do config save command and write to /etc/sonic/config_db.json. Restart SONiC: virsh reboot sonic-nhg. /etc/frr/zebra.conf has fpm use-next-hop-groups instead of no fpm use-next-hop-groups. Signed-off-by: Baorong Liu <[email protected]>

What I did
Implementing code changes for sonic-net/SONiC#1425
Why I did it
add nexthop group feature to fpmsyncd.
How I verified it
enable/disable nexthop group feature
Klish will call the REST API to configure feature next-hop-group enable.
FEATURE|nexthop_group will be created in CONFIG_DB.
Template zebra.conf.j2 will generate zebra.conf with fpm use-next-hop-groups if FEATURE|nexthop_group exists in CONFIG_DB. Else, it will generate zebra.conf with no fpm use-next-hop-groups (default behavior).
Run the config save command to write to /etc/sonic/config_db.json.
Restart SONiC: virsh reboot sonic-nhg
/etc/frr/zebra.conf has fpm use-next-hop-groups instead of no fpm use-next-hop-groups.
Klish CLI for feature nexthop_group
Enable
sonic(config)# feature next-hop-group enable
Disable
sonic(config)# no feature next-hop-group
Details if related
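The template decision described in the verification steps can be restated as a short sketch. It is written in C++ only to stay consistent with the rest of the PR; the real logic lives in the Jinja template zebra.conf.j2, and fpmNhgConfigLineSketch and the set-of-keys view of CONFIG_DB are hypothetical.

```cpp
#include <set>
#include <string>

// Returns the zebra.conf line implied by the presence of FEATURE|nexthop_group.
std::string fpmNhgConfigLineSketch(const std::set<std::string> &config_db_keys)
{
    const bool enabled = config_db_keys.count("FEATURE|nexthop_group") > 0;
    return enabled ? "fpm use-next-hop-groups" : "no fpm use-next-hop-groups";
}
```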