Orchagent changes in sonic-swss submodule to support NAT feature.#1125
Orchagent changes in sonic-swss submodule to support NAT feature.#1125rlhui merged 14 commits intosonic-net:masterfrom
Conversation
Signed-off-by: Akhilesh Samineni <[email protected]>
orchagent/natorch.cpp
Outdated
| sai_attribute_t attr; | ||
| memset(&attr, 0, sizeof(attr)); | ||
| attr.id = SAI_SWITCH_ATTR_AVAILABLE_SNAT_ENTRY; | ||
| maxAllowedNatEntries = 0; |
There was a problem hiding this comment.
please rename this variable as we are using this to check the number SNAT entries
There was a problem hiding this comment.
renamed this variable to maxAllowedSNatEntries.
| attr.id = SAI_SWITCH_ATTR_AVAILABLE_SNAT_ENTRY; | ||
| maxAllowedNatEntries = 0; | ||
|
|
||
| status = sai_switch_api->get_switch_attribute(gSwitchId, 1, &attr); |
There was a problem hiding this comment.
please move this SAI call to when NAT is enabled from CLI
There was a problem hiding this comment.
Get operation on a unimplemented SAI attribute doesn't result in crash, so retaining in current changes as discussed.
| if (m_natEntries.find(ip_address) != m_natEntries.end()) | ||
| { | ||
| SWSS_LOG_INFO("Duplicate %s %s NAT entry with ip %s and it's translated ip %s, do nothing", | ||
| entry.entry_type.c_str(), entry.nat_type.c_str(), ip_address.to_stri |
There was a problem hiding this comment.
May be can do this check at the beginning of the function
There was a problem hiding this comment.
NAT entries have to be cached irrespective of NAT is enabled or not.
| if (m_natEntries.find(ip_address) != m_natEntries.end()) | ||
| { | ||
| SWSS_LOG_INFO("Duplicate %s %s NAT entry with ip %s and it's translated ip %s, do nothing", | ||
| entry.entry_type.c_str(), entry.nat_type.c_str(), ip_address.to_stri |
There was a problem hiding this comment.
Why are counters being updated here ?
There was a problem hiding this comment.
The existing dynamic entry is replaced by static entry as Static entry has higher priority, so decremented the dynamic and incremented the static counters.
| if (m_natEntries.find(ip_address) != m_natEntries.end()) | ||
| { | ||
| SWSS_LOG_INFO("Duplicate %s %s NAT entry with ip %s and it's translated ip %s, do nothing", | ||
| entry.entry_type.c_str(), entry.nat_type.c_str(), ip_address.to_stri |
There was a problem hiding this comment.
do we need to clean up the entries which are added to HW as well ?
There was a problem hiding this comment.
Yes, it will also delete the entries from hardware.
| if (m_natEntries.find(ip_address) != m_natEntries.end()) | ||
| { | ||
| SWSS_LOG_INFO("Duplicate %s %s NAT entry with ip %s and it's translated ip %s, do nothing", | ||
| entry.entry_type.c_str(), entry.nat_type.c_str(), ip_address.to_stri |
There was a problem hiding this comment.
As I understand, this notification comes when a L3 interface is added to NAT ZONE, so every time a interface is added to a NAT_ZONE. I have 2 questions:
- since all the conntrack entries are flushed, the entries in asic will also be removed right ? What will happen to the traffic, they will be dropped ?
- When we need to flush of conntrack entries from natorch can this be done from natmgr instead ?
There was a problem hiding this comment.
This notification comes when do "sonic-clear nat translations" is executed. Yes all the entries in the asic are removed. Current traffic flows will see a drop and relearned again and added to asic.
OA is the component that tries to keep the Kernel conntrack entries and the hardware entries in sync. So, it is doing the flush entries.
| if (m_natEntries.find(ip_address) != m_natEntries.end()) | ||
| { | ||
| SWSS_LOG_INFO("Duplicate %s %s NAT entry with ip %s and it's translated ip %s, do nothing", | ||
| entry.entry_type.c_str(), entry.nat_type.c_str(), ip_address.to_stri |
There was a problem hiding this comment.
In the current SONiC architecture, the kernel entries are not updated/added/deleted from orchagent. Can you move this functionality to natmgr.
There was a problem hiding this comment.
OA is responsible for checking the hardware activity/hitbit of each of the entry in its cache. Since the cache is in OA code, it is the appropriate place to update the timeouts for the corresponding entries in the kernel.
There was a problem hiding this comment.
I understand OA is checking the hitbit, but in the current SONiC architecture the kernel entries are no update from OA. The mgr modules normally updates the kernel entries
There was a problem hiding this comment.
The natmgr has no idea about the hardware activity of nat entries, so it cannot update the entries timeouts in kernel, if natmgr has to update the timeouts, OA has to inform the natmgr through separate communication channel which is an overhead and has it issues with scaling.
There was a problem hiding this comment.
The NAT entries are not updated real time, polling for hit-bit happens every 30 seconds, so adding another hop will not have a big impact on the performance.
There was a problem hiding this comment.
As discussed offline, lets keep the current code for now. Let's open a issue to track this change. We can make the code changes in the next release.
| {"udld", SAI_HOSTIF_TRAP_TYPE_UDLD}, | ||
| {"bfd", SAI_HOSTIF_TRAP_TYPE_BFD}, | ||
| {"bfdv6", SAI_HOSTIF_TRAP_TYPE_BFDV6} | ||
| {"src_nat_miss", SAI_HOSTIF_TRAP_TYPE_SNAT_MISS}, |
There was a problem hiding this comment.
Can we make sure that these trap ids are created only on platform which support NAT
There was a problem hiding this comment.
Done based on the feature check.
orchagent/Makefile.am
Outdated
| orchagent_SOURCES += flex_counter/flex_counter_manager.cpp flex_counter/flex_counter_stat_manager.cpp | ||
| orchagent_SOURCES += debug_counter/debug_counter.cpp debug_counter/drop_counter.cpp | ||
|
|
||
|
|
There was a problem hiding this comment.
Please remove this extra line
orchagent/copporch.h
Outdated
| protected: | ||
| object_map m_trap_group_map; | ||
| bool enable_sflow_trap; | ||
| bool isNatSupported; |
There was a problem hiding this comment.
Please follow the same style for variables as in file
There was a problem hiding this comment.
Removed this variable and using 'gIsNatSupported' global variable.
orchagent/intfsorch.h
Outdated
|
|
||
| std::set<IpPrefix> getSubnetRoutes(); | ||
| std::map<string, uint32_t> m_nat_zone; | ||
| bool isNatSupported; |
There was a problem hiding this comment.
Removed this variable and using 'gIsNatSupported' global variable.
orchagent/intfsorch.cpp
Outdated
| } | ||
| } | ||
|
|
||
| void IntfsOrch::getNatSupportedInfo() |
There was a problem hiding this comment.
I see that you are repeating this same function in multiple places. Can you do this query one time, like in switchorch, and use a single API to check if NAT is supported.
There was a problem hiding this comment.
Ok. Moved this functionality to main.cpp file and updating the global variable 'gIsNatSupported'.
orchagent/intfsorch.cpp
Outdated
| m_nat_zone[alias] = nat_zone_id; | ||
| if (isNatSupported) | ||
| { | ||
| setRouterIntfsNatZoneId(port, nat_zone_id); |
There was a problem hiding this comment.
I don't understand this call. The router interface is created only in setIntf. So this API is always going to return true after its first check. You are actually setting the zone in as part of Line 808. Can you please check?
There was a problem hiding this comment.
User configures the new 'nat_zone' even after rif is created, this call is used to set the new 'nat_zone'.
There was a problem hiding this comment.
Got it but the set of RIF attribute is currently handled in setIntf. It is done for SUB_PORT currently but I think can be modified to set NAT zone for any RIF type. Another follow up is, why do we need m_nat_zone map seperate? Can this nat zone be part of the port structure, like m_rif_id?
There was a problem hiding this comment.
Setting NAT zone for Physical, Vlan and Lag interfaces only, so not handled in setIntf.
Yes, nat zone can be part of port structure, made changes to address it.
orchagent/intfsorch.cpp
Outdated
| attr.value.u32 = port.m_mtu; | ||
| attrs.push_back(attr); | ||
|
|
||
| if (isNatSupported) |
There was a problem hiding this comment.
Set operation on a unimplemented SAI attribute (SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID as of SAI 1.5) results in crash, to avoid it added this check here.
orchagent/intfsorch.cpp
Outdated
| attr.id = SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID; | ||
| if (m_nat_zone.find(port.m_alias) == m_nat_zone.end()) | ||
| { | ||
| attr.value.u32 = DEFAULT_NAT_ZONE_ID; |
There was a problem hiding this comment.
This is not required, its the SAI default
There was a problem hiding this comment.
Yes, removed it.
orchagent/intfsorch.cpp
Outdated
| else | ||
| { | ||
| attr.value.u32 = m_nat_zone[port.m_alias]; | ||
| } |
There was a problem hiding this comment.
You can revisit the check as if natzone is present in configuration, set the attribute.
There was a problem hiding this comment.
yes, modified the code to set the attribute if it is present in cache.
| sai_samplepacket_api_t* sai_samplepacket_api; | ||
| sai_debug_counter_api_t* sai_debug_counter_api; | ||
|
|
||
|
|
There was a problem hiding this comment.
Please remove this extra line
|
The current nexthop tracking in OA works on different SAI implementations (those which track and those which do not track the nexthop changes) though it will be sub-optimal on the SAI that tracks too. We can proceed as-is with the current OA changes till the SAI implementations converge. |
|
Retest this please. |
Signed-off-by: Akhilesh Samineni <[email protected]>
- Added OA and zone related changes.
Signed-off-by: Akhilesh Samineni <[email protected]>
Signed-off-by: Akhilesh Samineni <[email protected]>
Signed-off-by: Akhilesh Samineni <[email protected]>
Signed-off-by: Akhilesh Samineni <[email protected]>
Signed-off-by: Akhilesh Samineni <[email protected]>
Signed-off-by: Akhilesh Samineni <[email protected]>
Signed-off-by: Akhilesh Samineni <[email protected]>
Signed-off-by: Akhilesh Samineni <[email protected]>
Signed-off-by: Akhilesh Samineni <[email protected]>
Signed-off-by: Akhilesh Samineni <[email protected]>
Signed-off-by: Akhilesh Samineni <[email protected]>
Signed-off-by: Akhilesh Samineni <[email protected]>
e5acb9b to
286ca21
Compare
Signed-off-by: Akhilesh Samineni <[email protected]>
|
@AkhileshSamineni, @kirankella, |
5ad47af
@arlakshm Resolved merge conflicts. |
|
retest this please. |
|
Retest this please. |
2 similar comments
|
Retest this please. |
|
Retest this please. |
|
retest vs please |
|
retest vs please |
) Signed-off-by: Akhilesh Samineni [email protected]
) Signed-off-by: Akhilesh Samineni [email protected]
Add more dependencies to setup.py. Now that we're building as a wheel, we can add all dependent packages to make building and installation more robust, as there will no longer be a need to install them explicitly. Move sonic-config-engine from `install_requires` to `tests_require`, as it is only needed by the unit tests. Remove code to install fastentrypoints package if not installed. When building a Python 3 wheel using python3 setup.py bdist_wheel, we would receive a permissions error when calling easy_install.main(['fastentrypoints']), as with Python 3 it appears elevated permissions are required to install the package. This appears to be a behavior change between the Python 2 and Python 3 versions of setuptools. This needs to be explicitly installed before building the package. We are already installing in the SONiC slave container. Will update the sonic-utilities README to specify, also.
…nic-net#1125) Signed-off-by: Akhilesh Samineni [email protected]
Added orchagent and zone related changes.
Link to NAT HLD : https://github.com/Azure/SONiC/blob/master/doc/nat/nat_design_spec.md
Depends on:
sonic-swss : #1059 and #1126
sonic-swss-common : sonic-net/sonic-swss-common#304
sonic-linux-kernel : sonic-net/sonic-linux-kernel#100
sonic-sairedis : sonic-net/sonic-sairedis#519 and sonic-net/sonic-sairedis#546
Signed-off-by: Akhilesh Samineni [email protected]