diff --git a/doc/pic/bgp_pic_edge.md b/doc/pic/bgp_pic_edge.md
new file mode 100644
index 00000000000..84350a9c226
--- /dev/null
+++ b/doc/pic/bgp_pic_edge.md
@@ -0,0 +1,478 @@
+
+# BGP PIC HLD
+
+### Revision
+| Rev | Date | Author | Change Description |
+|:---:|:-----------:|:----------------------:|-----------------------------------|
+| 0.1 | Oct 8 2023 | Eddie Ruan / Lingyu Zhang | Initial Draft |
+
+
+## Table of Contents
+- [Goal and Scope](#goal-and-scope)
+ - [BGP PIC Edge at high level](#bgp-pic-edge-at-high-level)
+ - [Current Linux Kernel Forwarding behavior](#current-linux-kernel-forwarding-behavior)
+- [High Level Design](#high-level-design)
+- [Zebra's Data Structure Modifications](#zebras-data-structure-modifications)
+  - [Existing Struct nexthop](#existing-struct-nexthop)
+ - [Updated data structure with BGP PIC changes](#updated-data-structure-with-bgp-pic-changes)
+ - [struct nhg\_hash\_entry](#struct-nhg_hash_entry)
+ - [struct dplane\_route\_info](#struct-dplane_route_info)
+ - [struct dplane\_neigh\_info](#struct-dplane_neigh_info)
+- [Zebra Modifications](#zebra-modifications)
+ - [BGP\_PIC enable flag](#bgp_pic-enable-flag)
+ - [Create pic\_nhe](#create-pic_nhe)
+ - [Handles kernel forwarding objects](#handles-kernel-forwarding-objects)
+ - [Handles FPM forwarding objects](#handles-fpm-forwarding-objects)
+ - [Map Zebra objects to APP\_DB via FPM](#map-zebra-objects-to-app_db-via-fpm)
+ - [How would pic\_nhg improve BGP convergence](#how-would-pic_nhg-improve-bgp-convergence)
+ - [SRv6 VPN SAI Objects](#srv6-vpn-sai-objects)
+ - [Map APP\_DB to SAI objects](#map-app_db-to-sai-objects)
+ - [Orchagent Modifications](#orchagent-modifications)
+- [Zebra handles NHG member down events](#zebra-handles-nhg-member-down-events)
+ - [Local link down events](#local-link-down-events)
+ - [BGP NH down events](#bgp-nh-down-events)
+- [Unit Test](#unit-test)
+ - [FRR Topotest](#frr-topotest)
+ - [SONiC mgmt test](#sonic-mgmt-test)
+ - [BGP\_PIC Traffic Verification Test](#bgp_pic-traffic-verification-test)
+ - [Test Bed for BGP PIC Test](#test-bed-for-bgp-pic-test)
+ - [Traffic Verification Test](#traffic-verification-test)
+ - [Test result without PIC](#test-result-without-pic)
+ - [Test result with PIC](#test-result-with-pic)
+ - [Recursive Traffic Verification Test](#recursive-traffic-verification-test)
+ - [Test Bed for Recursive Test](#test-bed-for-recursive-test)
+ - [Traffic Verification Test](#traffic-verification-test-1)
+- [References](#references)
+
+## Goal and Scope
+BGP PIC, as described in the IETF draft at https://datatracker.ietf.org/doc/draft-ietf-rtgwg-bgp-pic/, addresses the enhancement of BGP route convergence. This document outlines a way to arrange forwarding structures that leads to improved BGP route convergence. At the architecture level, the same approach can enhance prefix convergence for all VPN cases: MPLS VPN, SRv6 VPN, and EVPN.
+
+The above draft offers two primary enhancements:
+
+- It prevents BGP load balancing updates from being triggered by IGP load balancing updates. This is essentially recursive VPN route support, a.k.a. the PIC core case. This issue, which is discussed in the SONiC Routing Working Group (https://lists.sonicfoundation.dev/g/sonic-wg-routing/files/SRv6%20use%20case%20-%20Routing%20WG.pptx), can be effectively resolved. Recursive route support will be detailed in a separate HLD.
+
+
+
+ Figure 1. Alibaba issue Underlay routes flap affecting Overlay SRv6 routes
+
+
+- We aim to achieve fast convergence in the event of a hardware forwarding failure caused by a remote BGP PE becoming unreachable. Convergence in the slow path forwarding mode is not a priority. This is the main benefit of the BGP PIC edge case, which is addressed in this HLD.
+
+Detailed information about PIC core and PIC edge can be found in the SONiC PIC architecture document: https://github.com/sonic-net/SONiC/blob/master/doc/pic/bgp_pic_arch_doc.md
+
+### BGP PIC Edge at high level
+
+
+ Figure 2. BGP PIC for improving remote BGP PE down event handling
+
+
+ - The provided graph illustrates that the BGP route 1.1.1.1 is advertised by both PE2 and PE3 to PE1. Each BGP route update message not only conveys the BGP next-hop information but also includes VPN context details.
+
+ - Consequently, when forming BGP Equal-Cost Multipath (ECMP) data structures, it is natural to retain both the BGP next-hop data and context information for each path. The VPN context could be specific to each prefix (a.k.a per prefix), individual customer edge (a.k.a per CE), or Virtual Routing and Forwarding (per VRF) type. This often leads to situations where BGP ECMP data structures cannot be effectively shared, as indicated in the lower-left section of the diagram. When a remote PE goes offline, PE1 must update all relevant BGP ECMP data structures, which can involve handling prefixes of varying lengths, resulting in an operation with a time complexity of O(N).
+
+ - The concept of the Prefix Independent Convergence's (PIC) proposal is to restructure this information by segregating the BGP next-hop information from the VPN context. The BGP next-hop-only information will constitute a new BGP ECMP structure that can be shared among all associated BGP VPN routes, as depicted in the lower-right part of the diagram. This approach allows for more efficient updates when the Interior Gateway Protocol (IGP) detects a BGP next-hop failure, resulting in an operation with a time complexity of O(1). This strategy aims to minimize traffic disruption in the hardware. The VPN context will be updated once BGP routes have reconverged.
+
+### Current Linux Kernel Forwarding behavior
+The NHG in the Linux kernel is flat: BGP and IGP ECMP have to be collapsed into the same NHG. Currently, MPLS VPN and BGP EVPN are supported in the Linux kernel; SRv6 VPN is not yet supported.
+
+There are two points of view on adding routes in Linux.
+1. FRR is used as the routing stack for SONiC on white box switches / routers. For these devices, VPN routes may not need to be added to the kernel, since these devices normally act as middle men for VPN traffic. For the SRv6 VPN case, we add a route map to skip Linux kernel programming for SRv6 VPN routes and keep only underlay routes in the kernel for routing protocols.
+2. For NFV types of services, Linux kernel forwarding is the main forwarding resource.
+
+The current thought is to find a balanced way to support both kinds of requirements.
+
+| Features | Linux Kernel |
+|:---:|:-----------:|
+| MPLS VPN | Flat |
+| EVPN | Hierarchy |
+| SRv6 VPN | No support |
+
+## High Level Design
+One of the challenges in implementing PIC within FRR is the absence of PIC support in the Linux kernel. To minimize alterations in FRR while enabling PIC on platforms that do not require Linux kernel support for this feature, we are primarily focused on two key modifications:
+1. In the 'zebra' component:
+ - Introduce a new Next Hop Group (PIC-NHG) specifically for the FORWARDING function. This NHG will serve as the shareable NHG in hardware.
+ - When a BGP next hop becomes unavailable, zebra will first update the new FORWARDING-ONLY NHG before BGP convergence takes place.
+ - When IGP updates, zebra will check associated BGP NHs' reachabilities. If the reachability of each individual member within the BGP NHG is not changed, there is no need to update the BGP NHG.
+ - Zebra will transmit two new forwarding objects, BGP PIC context, and NHG, to orchagent via FPM. The handling of NHG is outlined in https://github.com/sonic-net/SONiC/pull/1425.
+ - Zebra will continue to update kernel routes in the same manner as before, as the kernel does not support BGP PIC.
+2. In the orchagent component:
+ - Orchagent will be responsible for managing the two new forwarding objects, BGP PIC context, and NHG.
+
+## Zebra's Data Structure Modifications
+### Existing Struct nexthop
+The existing zebra nexthop structure encompasses both FORWARDING details and certain route CONTEXT information, such as VNI (Virtual Network Identifier) and srv6_nh, for various VPN (Virtual Private Network) functionalities. Given that route CONTEXT may vary among different VPN routes, it is not feasible for VPN routes to share the current VPN nexthop group generated by the zebra nexthop structure. When a remote BGP peer becomes inactive, zebra is required to update all linked VPN nexthop groups, potentially involving a significant number of VPN routes.
+
+    struct nexthop {
+
+ struct nexthop *next;
+ struct nexthop *prev;
+ /*
+ * What vrf is this nexthop associated with?
+ */
+ vrf_id_t vrf_id;
+ /* Interface index. */
+ ifindex_t ifindex;
+
+ enum nexthop_types_t type;
+
+ uint16_t flags;
+
+ /* Nexthop address */
+ union {
+ union g_addr gate;
+ enum blackhole_type bh_type;
+ };
+ union g_addr src;
+
+ ...
+
+ /* Encapsulation information. */
+ enum nh_encap_type nh_encap_type;
+ union {
+ vni_t vni;
+ } nh_encap;
+ /* EVPN router's MAC.
+ * Don't support multiple RMAC from the same VTEP yet, so it's not
+ * included in hash key.
+ */
+ struct ethaddr rmac;
+ /* SR-TE color used for matching SR-TE policies */
+ uint32_t srte_color;
+ /* SRv6 information */
+ struct nexthop_srv6 *nh_srv6;
+    };
+
+The forwarding objects in zebra are organized in the following manner. Each struct nexthop contains a forwarding-only part and a route context part. Because the route context parts are route attributes, they may differ between routes. Therefore, struct nexthop_group may not be sharable.
+
+
+ Figure 3. Existing Zebra forwarding objects
+
+
+### Updated data structure with BGP PIC changes
+Instead of dividing the current 'struct nexthop' into two separate structures, we have opted to utilize the same nexthop structure to store both the route context part (also known as PIC CONTEXT) and the forwarding part.
+
+Within the 'struct nhg_hash_entry,' we introduce a new field, 'struct nhg_hash_entry *pic_nhe.' This 'pic_nhe' is created when the original NHG pertains to BGP PIC. 'pic_nhe' points to an NHG that exclusively contains the original nexthop's forwarding part. The original nexthop retains both the PIC CONTEXT part and the forwarding part.
+
+This approach allows us to achieve the following objectives:
+- Utilize existing code for managing nexthop dependencies.
+- Maintain a consistent approach for dplane to handle updates to the Linux kernel.
+
+The new forwarding chain will be organized as follows.
+
+
+ Figure 4. Zebra forwarding objects after enabling BGP PIC
+
+
+### struct nhg_hash_entry
+As described in the previous section, we will add a new field, struct nhg_hash_entry *pic_nhe, to struct nhg_hash_entry in zebra_nhg.h:
+
+ struct nhg_connected_tree_head nhg_depends, nhg_dependents;
+ struct nhg_hash_entry *pic_nhe;
+
+If PIC NHE is not used, pic_nhe would be set to NULL.
+
+### struct dplane_route_info
+struct dplane_route_info is embedded in struct zebra_dplane_ctx.
+
+We will add two new fields, zd_pic_nhg_id and zd_pic_ng: zd_pic_nhg_id holds the pic_nhg's NHG ID, and zd_pic_ng stores the pic_nhg itself. These two new fields are collected via dplane_ctx_route_init().
+
+ /* Nexthops */
+ uint32_t zd_nhg_id;
+ struct nexthop_group zd_ng;
+ /* PIC Nexthops */
+ uint32_t zd_pic_nhg_id;
+ struct nexthop_group zd_pic_ng;
+
+These fields would be used in the following manner.
+
+| Cases | Linux Kernel Update (slow path) | FPM (fast path) |
+|:-----:|:------------------------------------:|:-----------------------------:|
+| No BGP PIC enabled | zd_ng is used as NHG | zd_ng is used as NHG |
+| BGP PIC enabled | zd_ng is used as NHG | zd_ng is used for PIC_CONTEXT, zd_pic_ng is used for NHG |
+
+### struct dplane_neigh_info
+This structure is initialized via dplane_ctx_nexthop_init(), which is used to trigger NHG events. We don't need to make changes to this structure.
+
+## Zebra Modifications
+### BGP_PIC enable flag
+The fpm_pic_nexthop flag is set based on zebra's command line arguments, and only on platforms where the Linux kernel supports NHG, i.e. kernel_nexthops_supported() returns true.
+
+### Create pic_nhe
+In dplane_nexthop_add(), when a normal NHG is created, we try to create the PIC NHG as well. Currently, the PIC NHG is created for both BGP and IGP NHGs in all cases once PIC is enabled, i.e. the fpm_pic_nexthop flag is set. We could skip handling IGP NHGs, since the benefit there is small, even though it is described in the draft.
+
+ if (fpm_pic_nexthop && created && !pic) {
+ zebra_pic_nhe_find(&pic_nhe, *nhe, afi, from_dplane);
+ if (pic_nhe && pic_nhe != *nhe && ((*nhe)->pic_nhe) == NULL) {
+ (*nhe)->pic_nhe = pic_nhe;
+ zebra_nhg_increment_ref(pic_nhe);
+ SET_FLAG(pic_nhe->flags, NEXTHOP_GROUP_PIC_NHT);
+ }
+ }
+
+zebra_nhe_find() is used to create or find an NHE. In the create case, when the NHE is for BGP PIC and BGP_PIC is enabled, we use a similar API, zebra_pic_nhe_find(), to create a pic_nhe, i.e. a nexthop with FORWARDING information only. The created pic_nhe is stored in the newly added field struct nhg_hash_entry *pic_nhe. The following is sample code that would be added to zebra_nhg.c.
+
+
+ bool zebra_pic_nhe_find(struct nhg_hash_entry **pic_nhe, /* return value */
+ struct nhg_hash_entry *nhe,
+ afi_t afi, bool from_dplane)
+ {
+ bool created = false;
+ struct nhg_hash_entry *picnhe;
+ struct nexthop *nh = NULL;
+ struct nhg_hash_entry pic_nh_lookup = {};
+        struct nexthop *pic_nexthop_tmp;
+        bool ret = false;
+
+ if (nhe->pic_nhe) {
+ *pic_nhe = nhe->pic_nhe;
+ return false;
+ }
+ /* Use a temporary nhe to find pic nh */
+ pic_nh_lookup.type = nhe->type ? nhe->type : ZEBRA_ROUTE_NHG;
+ pic_nh_lookup.vrf_id = nhe->vrf_id;
+ /* the nhg.nexthop is sorted */
+ for (nh = nhe->nhg.nexthop; nh; nh = nh->next) {
+ if (nh->type == NEXTHOP_TYPE_IFINDEX)
+ continue;
+ pic_nexthop_tmp = nexthop_dup_no_context(nh, NULL);
+ ret = nexthop_group_add_sorted_nodup(&pic_nh_lookup.nhg, pic_nexthop_tmp);
+ if (!ret)
+ nexthop_free(pic_nexthop_tmp);
+ }
+ if (pic_nh_lookup.nhg.nexthop == NULL) {
+ *pic_nhe = NULL;
+ return false;
+ }
+
+ if (!zebra_nhg_dependents_is_empty(nhe) || pic_nh_lookup.nhg.nexthop->next) {
+ /* Groups can have all vrfs and AF's in them */
+ pic_nh_lookup.afi = AFI_UNSPEC;
+ } else {
+ switch (pic_nh_lookup.nhg.nexthop->type) {
+ case (NEXTHOP_TYPE_IFINDEX):
+ case (NEXTHOP_TYPE_BLACKHOLE):
+ /*
+ * This switch case handles setting the afi different
+ * for ipv4/v6 routes. Ifindex/blackhole nexthop
+ * objects cannot be ambiguous, they must be Address
+ * Family specific. If we get here, we will either use
+ * the AF of the route, or the one we got passed from
+ * here from the kernel.
+ */
+ pic_nh_lookup.afi = afi;
+ break;
+ case (NEXTHOP_TYPE_IPV4_IFINDEX):
+ case (NEXTHOP_TYPE_IPV4):
+ pic_nh_lookup.afi = AFI_IP;
+ break;
+ case (NEXTHOP_TYPE_IPV6_IFINDEX):
+ case (NEXTHOP_TYPE_IPV6):
+ pic_nh_lookup.afi = AFI_IP6;
+ break;
+ }
+ }
+
+ created = zebra_nhe_find(&picnhe, &pic_nh_lookup, NULL, afi, from_dplane, true);
+ *pic_nhe = picnhe;
+ if (pic_nh_lookup.nhg.nexthop)
+ nexthops_free(pic_nh_lookup.nhg.nexthop);
+ if (IS_ZEBRA_DEBUG_NHG_DETAIL)
+ zlog_debug("%s: create PIC nhe id %d for nhe %d",
+ __func__, picnhe->id, nhe->id);
+ return created;
+
+    }
+
+This function is called by zebra_nhe_find() when pic_nhe is needed but has not yet been created.
+
+ ...
+ done:
+ /* create pic_nexthop */
+ if (fpm_pic_nexthop && created && !pic) {
+ zebra_pic_nhe_find(&pic_nhe, *nhe, afi, from_dplane);
+ if (pic_nhe && pic_nhe != *nhe && ((*nhe)->pic_nhe) == NULL) {
+ (*nhe)->pic_nhe = pic_nhe;
+ zebra_nhg_increment_ref(pic_nhe);
+ SET_FLAG(pic_nhe->flags, NEXTHOP_GROUP_PIC_NHT);
+ }
+ }
+
+ /* Reset time since last update */
+ (*nhe)->uptime = monotime(NULL);
+ ...
+
+### Handles kernel forwarding objects
+There is no change in how zebra handles kernel forwarding objects. Only zd_ng is used for NHG programming in the kernel.
+
+### Handles FPM forwarding objects
+#### Map Zebra objects to APP_DB via FPM
+When BGP_PIC is enabled, the nhe's NHG maps to the PIC_LIST, and the pic_nhe's NHG maps to the forwarding NHG.
+The route object uses the nhe's ID as the context ID and the pic_nhe's ID as the NHG ID.
+
+
+
+ Figure 5. Zebra maps forwarding objects to APP DB Objs when BGP PIC enables.
+
+
+The following table compares the number of forwarding objects created with and without PIC enabled. N is the number of VPN routes; to keep the discussion simple, assume all N VPN routes share the same forwarding-only part.
+| Forwarding Objects | No BGP PIC enabled | BGP PIC enabled |
+|:-----:|:------------------------------------:|:-----------------------------:|
+| Route | N | N |
+| NHG | N | 1 |
+| CONTEXT | n/a | N |
+
+Here is an example of the "show ip route" output, which shows two IDs: one is the NHG ID, and the other is the PIC CONTEXT ID.
+
+
+ Figure 6. The output for show ip route.
+
+
+The following graph shows the pic_nhe, i.e. the NHE with the forwarding-only part. In hardware forwarding, it is associated with the NHG ID shown in "show ip route".
+
+
+ Figure 7. PIC NHG
+
+
+The following graph shows the normal NHE, which contains both forwarding information and context. In hardware forwarding, it is associated with the PIC CONTEXT ID shown in "show ip route".
+
+
+ Figure 8. Normal NHG.
+
+
+#### How would pic_nhg improve BGP convergence
+When IGP detects that a BGP nexthop is down, IGP informs zebra of the route delete event. Zebra then performs the following steps.
+1. Find all associated forwarding-only nexthops resolved via this route. The nexthop lookup logic is similar to what is done in zebra_nhg_proto_add().
+2. Trigger a back walk from each impacted nexthop to all associated PIC NHGs and re-resolve each PIC NHG.
+3. Update each PIC NHG in hardware. Since a PIC NHG is shared by VPN routes, this quickly stops traffic loss before BGP reconvergence.
+4. The BGP nexthop down event leads to BGP reconvergence, which updates the CONTEXT properly later.
+
+#### SRv6 VPN SAI Objects
+The current use case is SRv6 VPN. Therefore, we explicitly call out how to map FPM objects to SRv6 VPN SAI objects. For the MPLS VPN and EVPN use cases, the FPM-to-SAI object mapping will be added later once we have solid use cases in SONiC.
+
+The following diagram shows the SAI objects related to SRv6. Detailed information can be found at
+https://github.com/opencomputeproject/SAI/blob/master/doc/SAI-IPv6-Segment-Routing-VPN.md
+
+
+
+ Figure 9. SRv6 VPN SAI Objects
+
+
+#### Map APP_DB to SAI objects
+
+
+ Figure 10. APP DB to SAI OBJs mapping
+
+
+### Orchagent Modifications
+Handle two new forwarding objects from APP_DB, NEXTHOP_TABLE and PIC_CONTEXT_TABLE. Orchagent would map proper objects to the proper SAI objects.
+
+## Zebra handles NHG member down events
+### Local link down events
+https://github.com/sonic-net/SONiC/pull/1425 brings in NHG support, but it does not cover how to trigger NHG updates for various events. Recursive route support will be documented in a separate HLD, since it is independent of the BGP PIC approach.
+
+### BGP NH down events
+A BGP NHG becoming unreachable can be triggered by eBGP events or local link down events. We want zebra to back-walk all related BGP PIC NHGs and update these NHGs directly.
+1. After a routing update occurs, update the processing result of this route in rib_process_result and call zebra_rib_evaluate_rn_nexthops.
+
+2. In the zebra_rib_evaluate_rn_nexthops function, construct a pic_nhg_hash_entry based on rn->p and find the corresponding pic_nhg. Based on the dependents list stored in pic_nhg, find all other nexthop groups associated with the current nhg, and then remove the nexthop members in these nexthop groups.
+
+3. Trigger a refresh of pic_nhg to fpm.
+
+4. To ensure that nhg refresh messages can be triggered first, add prectxqueue in fpm_nl_ctx as a higher-priority downstream queue for fnc. When triggering nhg updates, attach the nhg's ctx to prectxqueue, and when refreshing fpm, prioritize getting ctx from prectxqueue for downstream.
+
+As shown in the following image:
+
+
+
+ Figure 11. BGP NH down event Handling
+
+
+When the route 2033::178, marked in blue, is deleted, find its corresponding nhg(68) based on 2033::178. Then, iterate through the Dependents list of nhg(68) and find the dependent nhg(67). Remove the nexthop member(2033::178) from nhg(67). After completing this action, trigger a refresh of nhg(67) to fpm.
+
+Similarly, when the route 1000::178, marked in brown, is deleted, find its corresponding nhg(66). Based on the dependents list of nhg(66), find nhg(95) and remove the nexthop member(1000::178) from nhg(95). After completing this action, trigger a refresh of nhg(95) to fpm.
+
+Note: this will share the same back-walk infrastructure used for recursive route handling.
+
+## Unit Test
+### FRR Topotest
+Add a new SRv6 VPN topotest test topology, and use fpm simulator to check fpm output with the following scenarios.
+1. Normal set up
+2. Simulate IGP NH down event via shutting down an IGP link
+3. Simulate BGP NH down event via shutting down remote BGP session.
+
+### SONiC mgmt test
+Add a new SRv6 VPN test in sonic_mgmt.
+
+### BGP_PIC Traffic Verification Test
+#### Test Bed for BGP PIC Test
+We utilize two physical devices to establish a logically configured 3 PE testbed. One physical device corresponds to a single PE (PE1), while the other physical device represents two PEs (PE2 and PE3) solely from a BGP perspective. PE1 is interconnected with both PE2 and PE3. IXIA serves as CEs, with one CE connected to PE1 and the other CE dual-homed to PE2 and PE3.
+
+The loopback address on PE2 is 1000::178, disseminated through an eBGP session between PE1 and PE2. Similarly, the loopback address on PE3 is 2000::178, published via an eBGP session between PE1 and PE3. PE1 acquires knowledge of CE2's SRv6 VPN routes through PE2 and PE3. Consequently, these VPN routes form an Equal-Cost Multipath (ECMP) on PE1, accessible via both PE2 and PE3.
+
+CE2 published 400K routes to PE1, while CE1 (IXIA) generates traffic with a destination address traversing through these 400K routes. The traffic is evenly distributed through PE2 and PE3 to reach CE2 (IXIA).
+
+
+ Figure 12. PIC Testbed via two physical devices
+
+
+#### Traffic Verification Test
+Our objective is to deactivate PE2 and assess the speed at which all traffic can be redirected to PE3 on PE1.
+
+
+ Figure 13. PIC Testbed when shut down PE2
+
+
+#### Test result without PIC
+Initially, we conduct this test without activating PIC. The traffic rate is expected to decrease by nearly half due to the absence of PE2. As PE1 sequentially updates 400K routes to eliminate the PE2 path, the traffic rate gradually recovers. The packet loss persists for approximately 1 minute during this process.
+
+
+ Figure 14. Packet loss lasts for about 1 minute
+
+
+#### Test result with PIC
+Subsequently, we activate PIC and replicate the identical test. Remarkably, there is virtually no continuous packet loss, and the overall count of lost packets stands at 4,516 (with a packet transmission rate of 1750MB/s). This translates to an optimized actual packet loss duration of approximately 2 milliseconds.
+
+
+ Figure 15. Packet loss for about 2ms
+
+
+| Cases | Traffic loss window |
+|:-----:|:------------------------------------:|
+| without PIC | About 60 seconds |
+| with PIC | About 2 ms |
+
+### Recursive Traffic Verification Test
+#### Test Bed for Recursive Test
+Different from the previous test bed, we utilize two physical devices to establish a logically configured 2 PE testbed. Each physical device is one PE. There are two links connecting these two PEs (PE1 and PE3). IXIA serves as the CEs, with CE1 connected to PE1 and CE2 connected to PE3.
+
+The loopback address on PE3 is 2000::178, disseminated through eBGP sessions between PE1 and PE3 via two links, a.k.a 2000::178 takes two IGP paths from PE1 to PE3. CE2 published 400K routes to PE1, while CE1 (IXIA) generates traffic with a destination address traversing through these 400K routes. The traffic is evenly distributed between two IGP paths between PE1 and PE3.
+
+
+
+ Figure 16. Recursive Testbed
+
+
+#### Traffic Verification Test
+Our objective is to deactivate one link between PE1 and PE3, and assess the speed at which all traffic should take the remaining link from PE1 to PE3.
+
+
+ Figure 17. Recursive Test : shut one IGP path
+
+
+By incorporating recursive routes support, we can streamline the transition from two paths to one path in the IGP NHG when a link-down notification is received. As the reachability of BGP NH remains unchanged, there is no need to update the BGP NHG or BGP routes. The transition period does not result in noticeable traffic loss.
+
+
+
+    Figure 18. Recursive Test : no noticeable traffic loss
+
+
+**Note:** we may not be able to upstream this part as Cisco Silicon one's dataplane simulator has not been upstreamed to vSONiC yet.
+
+## References
+- https://datatracker.ietf.org/doc/draft-ietf-rtgwg-bgp-pic/
+- https://github.com/sonic-net/SONiC/blob/master/doc/pic/bgp_pic_arch_doc.md
+- https://github.com/opencomputeproject/SAI/blob/master/doc/SAI-IPv6-Segment-Routing-VPN.md
+- https://github.com/sonic-net/SONiC/pull/1425
+
+
diff --git a/doc/pic/images/BGP_NH_update.png b/doc/pic/images/BGP_NH_update.png
new file mode 100644
index 00000000000..22f969871cc
Binary files /dev/null and b/doc/pic/images/BGP_NH_update.png differ
diff --git a/doc/pic/images/app_db_to_sai.png b/doc/pic/images/app_db_to_sai.png
new file mode 100644
index 00000000000..586b4769805
Binary files /dev/null and b/doc/pic/images/app_db_to_sai.png differ
diff --git a/doc/pic/images/nhg_normal.png b/doc/pic/images/nhg_normal.png
new file mode 100644
index 00000000000..5b45e1216ae
Binary files /dev/null and b/doc/pic/images/nhg_normal.png differ
diff --git a/doc/pic/images/nhg_pic.png b/doc/pic/images/nhg_pic.png
new file mode 100644
index 00000000000..5e80195e573
Binary files /dev/null and b/doc/pic/images/nhg_pic.png differ
diff --git a/doc/pic/images/pic.png b/doc/pic/images/pic.png
new file mode 100644
index 00000000000..a484b5f40bb
Binary files /dev/null and b/doc/pic/images/pic.png differ
diff --git a/doc/pic/images/pic_testbed1.png b/doc/pic/images/pic_testbed1.png
new file mode 100644
index 00000000000..51a2d51aa43
Binary files /dev/null and b/doc/pic/images/pic_testbed1.png differ
diff --git a/doc/pic/images/pic_testbed2.png b/doc/pic/images/pic_testbed2.png
new file mode 100644
index 00000000000..427688034d8
Binary files /dev/null and b/doc/pic/images/pic_testbed2.png differ
diff --git a/doc/pic/images/picafter.png b/doc/pic/images/picafter.png
new file mode 100644
index 00000000000..4b96ef5252e
Binary files /dev/null and b/doc/pic/images/picafter.png differ
diff --git a/doc/pic/images/picbefore.png b/doc/pic/images/picbefore.png
new file mode 100644
index 00000000000..ec48543337a
Binary files /dev/null and b/doc/pic/images/picbefore.png differ
diff --git a/doc/pic/images/recursive1.png b/doc/pic/images/recursive1.png
new file mode 100644
index 00000000000..3d01be4b11b
Binary files /dev/null and b/doc/pic/images/recursive1.png differ
diff --git a/doc/pic/images/recursive2.png b/doc/pic/images/recursive2.png
new file mode 100644
index 00000000000..d51421b7f36
Binary files /dev/null and b/doc/pic/images/recursive2.png differ
diff --git a/doc/pic/images/recursive_result.png b/doc/pic/images/recursive_result.png
new file mode 100644
index 00000000000..288ecd0c27b
Binary files /dev/null and b/doc/pic/images/recursive_result.png differ
diff --git a/doc/pic/images/show_ip_route.png b/doc/pic/images/show_ip_route.png
new file mode 100644
index 00000000000..fa33e1e8f25
Binary files /dev/null and b/doc/pic/images/show_ip_route.png differ
diff --git a/doc/pic/images/srv6_igp2bgp.jpg b/doc/pic/images/srv6_igp2bgp.jpg
new file mode 100644
index 00000000000..38f8b916b31
Binary files /dev/null and b/doc/pic/images/srv6_igp2bgp.jpg differ
diff --git a/doc/pic/images/srv6_sai_objs.png b/doc/pic/images/srv6_sai_objs.png
new file mode 100644
index 00000000000..5e81790e3b4
Binary files /dev/null and b/doc/pic/images/srv6_sai_objs.png differ
diff --git a/doc/pic/images/zebra_fwding_obj_no_share.jpg b/doc/pic/images/zebra_fwding_obj_no_share.jpg
new file mode 100644
index 00000000000..c9490e213b4
Binary files /dev/null and b/doc/pic/images/zebra_fwding_obj_no_share.jpg differ
diff --git a/doc/pic/images/zebra_fwding_obj_sharing.jpg b/doc/pic/images/zebra_fwding_obj_sharing.jpg
new file mode 100644
index 00000000000..664c4b4940c
Binary files /dev/null and b/doc/pic/images/zebra_fwding_obj_sharing.jpg differ
diff --git a/doc/pic/images/zebra_map_to_fpm_objs.jpg b/doc/pic/images/zebra_map_to_fpm_objs.jpg
new file mode 100644
index 00000000000..c7ec12e9068
Binary files /dev/null and b/doc/pic/images/zebra_map_to_fpm_objs.jpg differ
diff --git a/doc/pic/images_recursive/alibaba_issue.png b/doc/pic/images_recursive/alibaba_issue.png
new file mode 100644
index 00000000000..a7e6cfe46c1
Binary files /dev/null and b/doc/pic/images_recursive/alibaba_issue.png differ
diff --git a/doc/pic/images_recursive/find_nhg_by_rnh.png b/doc/pic/images_recursive/find_nhg_by_rnh.png
new file mode 100644
index 00000000000..cc114194249
Binary files /dev/null and b/doc/pic/images_recursive/find_nhg_by_rnh.png differ
diff --git a/doc/pic/images_recursive/msft_issue.png b/doc/pic/images_recursive/msft_issue.png
new file mode 100644
index 00000000000..a337162f7ab
Binary files /dev/null and b/doc/pic/images_recursive/msft_issue.png differ
diff --git a/doc/pic/images_recursive/nhg_depend_update.png b/doc/pic/images_recursive/nhg_depend_update.png
new file mode 100644
index 00000000000..8e91613dfcc
Binary files /dev/null and b/doc/pic/images_recursive/nhg_depend_update.png differ
diff --git a/doc/pic/images_recursive/nhg_for_dataplane.png b/doc/pic/images_recursive/nhg_for_dataplane.png
new file mode 100644
index 00000000000..1c7c8899816
Binary files /dev/null and b/doc/pic/images_recursive/nhg_for_dataplane.png differ
diff --git a/doc/pic/images_recursive/nhg_initial_state.png b/doc/pic/images_recursive/nhg_initial_state.png
new file mode 100644
index 00000000000..18fa876120a
Binary files /dev/null and b/doc/pic/images_recursive/nhg_initial_state.png differ
diff --git a/doc/pic/images_recursive/nhg_removed_state.png b/doc/pic/images_recursive/nhg_removed_state.png
new file mode 100644
index 00000000000..09176233fb5
Binary files /dev/null and b/doc/pic/images_recursive/nhg_removed_state.png differ
diff --git a/doc/pic/images_recursive/nhg_status.png b/doc/pic/images_recursive/nhg_status.png
new file mode 100644
index 00000000000..e7f5b0b729b
Binary files /dev/null and b/doc/pic/images_recursive/nhg_status.png differ
diff --git a/doc/pic/images_recursive/parent_nhg_for_dataplane.png b/doc/pic/images_recursive/parent_nhg_for_dataplane.png
new file mode 100644
index 00000000000..5ee6b1c4434
Binary files /dev/null and b/doc/pic/images_recursive/parent_nhg_for_dataplane.png differ
diff --git a/doc/pic/images_recursive/path_remove.png b/doc/pic/images_recursive/path_remove.png
new file mode 100644
index 00000000000..a171e859bdd
Binary files /dev/null and b/doc/pic/images_recursive/path_remove.png differ
diff --git a/doc/pic/images_recursive/path_remove_pic.png b/doc/pic/images_recursive/path_remove_pic.png
new file mode 100644
index 00000000000..185bd19dff2
Binary files /dev/null and b/doc/pic/images_recursive/path_remove_pic.png differ
diff --git a/doc/pic/images_recursive/route_converge_original.png b/doc/pic/images_recursive/route_converge_original.png
new file mode 100644
index 00000000000..ffb65e6f7cd
Binary files /dev/null and b/doc/pic/images_recursive/route_converge_original.png differ
diff --git a/doc/pic/images_recursive/srv6_igp2bgp.png b/doc/pic/images_recursive/srv6_igp2bgp.png
new file mode 100644
index 00000000000..f2f4fa73108
Binary files /dev/null and b/doc/pic/images_recursive/srv6_igp2bgp.png differ
diff --git a/doc/pic/images_recursive/testcase1.png b/doc/pic/images_recursive/testcase1.png
new file mode 100644
index 00000000000..1a16bc8e1f5
Binary files /dev/null and b/doc/pic/images_recursive/testcase1.png differ
diff --git a/doc/pic/images_recursive/testcase2.png b/doc/pic/images_recursive/testcase2.png
new file mode 100644
index 00000000000..836c42e30e1
Binary files /dev/null and b/doc/pic/images_recursive/testcase2.png differ
diff --git a/doc/pic/images_recursive/testcase3.png b/doc/pic/images_recursive/testcase3.png
new file mode 100644
index 00000000000..49c66b0420a
Binary files /dev/null and b/doc/pic/images_recursive/testcase3.png differ
diff --git a/doc/pic/images_recursive/testcase4.png b/doc/pic/images_recursive/testcase4.png
new file mode 100644
index 00000000000..16d9662c2f8
Binary files /dev/null and b/doc/pic/images_recursive/testcase4.png differ
diff --git a/doc/pic/images_recursive/testcase5.png b/doc/pic/images_recursive/testcase5.png
new file mode 100644
index 00000000000..5ff1c89e2d5
Binary files /dev/null and b/doc/pic/images_recursive/testcase5.png differ
diff --git a/doc/pic/images_recursive/zebra_rnh_fixup_depends.png b/doc/pic/images_recursive/zebra_rnh_fixup_depends.png
new file mode 100644
index 00000000000..21930919a27
Binary files /dev/null and b/doc/pic/images_recursive/zebra_rnh_fixup_depends.png differ
diff --git a/doc/pic/recursive_route.md b/doc/pic/recursive_route.md
new file mode 100644
index 00000000000..c71489251f8
--- /dev/null
+++ b/doc/pic/recursive_route.md
@@ -0,0 +1,411 @@
+
+# Recursive Route Handling HLD
+
+## Revision
+| Rev | Date | Author | Change Description |
+|:---:|:-----------:|:----------------------------:|-----------------------------------|
+| 0.1 | Oct 2023 | Lingyu Zhang (Alibaba) / Yongxin Cao (Accton) | Initial Draft |
+
+
+## Table of Contents
+- [Objective](#objective)
+- [Requirements Overview](#requirements-overview)
+- [Zebra Current Approach for Recursive Routes](#zebra-current-approach-for-recursive-routes)
+ - [Data Structure for Recursive Handling](#data-structure-for-recursive-handling)
+ - [Nexthop Hash Entry's Dependency Tree](#nexthop-hash-entrys-dependency-tree)
+ - [NHT List from Route Entry](#nht-list-from-route-entry)
+ - [Existing Recursive Route Handling](#existing-recursive-route-handling)
+- [High Level Design](#high-level-design)
+ - [Network Outage Events for Recursive Handling](#network-outage-events-for-recursive-handling)
+ - [Nexthop Fixup Handling](#nexthop-fixup-handling)
+ - [NHG ID](#nhg-id)
+ - [zebra\_rnh\_fixup\_depends()](#zebra_rnh_fixup_depends)
+ - [Underlay NHG handling](#underlay-nhg-handling)
+ - [Overlay NHG handling](#overlay-nhg-handling)
+ - [Throttle protocol client's route update events](#throttle-protocol-clients-route-update-events)
+ - [FPM and Orchagent Changes](#fpm-and-orchagent-changes)
+- [Unit Test](#unit-test)
+ - [Normal Case's Forwarding Chain Information](#normal-cases-forwarding-chain-information)
+ - [Test Case 1: local link failure](#test-case-1-local-link-failure)
+ - [Test Case 2: IGP remote link/node failure](#test-case-2-igp-remote-linknode-failure)
+ - [Test Case 3: IGP remote PE failure](#test-case-3-igp-remote-pe-failure)
+ - [Test Case 4: BGP remote PE node failure](#test-case-4-bgp-remote-pe-node-failure)
+ - [Test Case 5: Remote PE-CE link failure](#test-case-5-remote-pe-ce-link-failure)
+- [References](#references)
+
+## Objective
+The objective of this document is to minimize traffic loss windows on SONiC devices during network outages. Since SONiC does not support MPLS VPN in master, testing focuses on EVPN and SRv6 VPN only.
+
+We follow these design principles:
+1. Reuse existing FRR route convergence logic and make minimal changes for optimizations.
+2. Use existing zebra information to make quick fixups that minimize traffic loss windows.
+
+## Requirements Overview
+Because the Linux kernel lacks support for recursive routes, FRR zebra flattens the nexthop information when transferring recursive nexthop group information to dataplanes. Presently, when a path becomes unavailable, zebra notifies the corresponding protocol processes and lets them reissue route update events, which in turn update the forwarding chain of routes in the dataplane. This method contributes to the issue under discussion within the SONiC Routing Working Group.
+
+
+
+ Figure 1. Alibaba issue: underlay route flaps affecting overlay SRv6 routes
+
+
+To solve this issue, we need to introduce Prefix Independent Convergence (PIC) to FRR/SONiC. The PIC concept is described in the IETF draft https://datatracker.ietf.org/doc/draft-ietf-rtgwg-bgp-pic/. It is not a BGP feature, but a RIB/FIB feature on the device. PIC has two basic concepts: PIC core and PIC edge. We use the following two HLDs to achieve PIC support in FRR/SONiC.
+
+1. The HLD at https://github.com/eddieruan-alibaba/SONiC/blob/eruan-pic/doc/pic/bgp_pic_edge.md focuses on the PIC edge enhancement.
+2. This HLD outlines an approach that minimizes traffic loss windows on SONiC devices during network outages, a.k.a. the PIC core approach for recursive VPN route support.
+
+## Zebra Current Approach for Recursive Routes
+### Data Structure for Recursive Handling
+#### Nexthop Hash Entry's Dependency Tree
+nhg_hash_entry uses two fields, *nhg_depends and *nhg_dependents, to track an NHG's dependencies. The usage is explained in the comments below. We plan to use nhg_dependents to perform the needed backwalks.
+
+``` c
+/*
+ * Hashtables containing nhg entries is in `zebra_router`.
+ */
+struct nhg_hash_entry {
+ uint32_t id;
+ afi_t afi;
+ vrf_id_t vrf_id;
+
+ ...
+
+ /* Dependency trees for other entries.
+ * For instance a group with two
+ * nexthops will have two dependencies
+ * pointing to those nhg_hash_entries.
+ *
+ * Using a rb tree here to make lookups
+ * faster with ID's.
+ *
+ * nhg_depends the RB tree of entries that this
+ * group contains.
+ *
+ * nhg_dependents the RB tree of entries that
+ * this group is being used by
+ *
+ * NHG id 3 with nexthops id 1/2
+ * nhg(3)->nhg_depends has 1 and 2 in the tree
+ * nhg(3)->nhg_dependents is empty
+ *
+ * nhg(1)->nhg_depends is empty
+ * nhg(1)->nhg_dependents is 3 in the tree
+ *
+ * nhg(2)->nhg_depends is empty
+ * nhg(2)->nhg_dependents is 3 in the tree
+ */
+ struct nhg_connected_tree_head nhg_depends, nhg_dependents;
+
+ struct event *timer;
+	...
+};
+```
+
+#### NHT List from Route Entry
+Each route entry (struct rib_dest_t) contains an nht field, which stores all nexthop addresses that get resolved via this route entry.
+
+``` c
+ /*
+ * The list of nht prefixes that have ended up
+ * depending on this route node.
+ * After route processing is returned from
+ * the data plane we will run evaluate_rnh
+ * on these prefixes.
+ */
+ struct rnh_list_head nht;
+```
+
+Each rnh entry maintains a list of protocol clients in struct list *client_list, which records the clients interested in this nexthop's state change events. zebra_rnh_notify_protocol_clients() uses this list to inform registered clients of nexthop change events.
+
+``` c
+/* Nexthop structure. */
+struct rnh {
+ uint8_t flags;
+
+#define ZEBRA_NHT_CONNECTED 0x1
+#define ZEBRA_NHT_DELETED 0x2
+#define ZEBRA_NHT_RESOLVE_VIA_DEFAULT 0x4
+
+ ...
+
+ struct route_entry *state;
+ struct prefix resolved_route;
+ struct list *client_list;
+	...
+};
+```
+
+
+### Existing Recursive Route Handling
+The following diagram provides a brief description of zebra's current recursive convergence process.
+
+
+
+ Figure 2. FRR current route convergence processing work flow
+
+
+The handling of recursive routes occurs within the process of managing route updates. The function zebra_rib_evaluate_rn_nexthops() serves as the starting point for this process. Zebra begins by retrieving the NHT list from the targeted route entry. Subsequently, it iterates through each nexthop in the NHT list and calls zebra_evaluate_rnh() to assess the reachability of the nexthop. If the state of the nexthop changes, zebra utilizes zebra_rnh_notify_protocol_clients() to notify all clients to reissue corresponding routes to zebra. This results in a period of traffic loss until the routes are rebound with updated Next Hop Groups (NHGs). The duration of the traffic loss window increases proportionally with the number of routes.
+
+## High Level Design
+
+### Network Outage Events for Recursive Handling
+Here is a list of network outage events that we want to handle via recursive route handling. The goal is to minimize the traffic loss window in these cases.
+
+| Types | Network Outage Events | Possible handling |
+|:---|:-----------|:----------------------|
+| Case 1: IGP local failure | A local link goes down | Currently Orchagent handles the local link down event and triggers a quick fixup which removes the failed path from the HW ECMP. Later zebra will be triggered from connected_down() handling. BGP may be informed to install a backup path if needed. This is a special PIC core case, a.k.a. PIC local. |
+| Case 2: IGP remote link/node failure | A remote link/node goes down; the IGP leaf's reachability is not changed on the given PE, only IGP paths are updated. | IGP gets route withdraw events from the IGP peer. The protocol client informs zebra with updated paths. There are two possible triggers. One is that zebra is triggered from zread_route_add() with an updated path list. The other is that zebra is informed via zread_route_del(). In the second case, the impacted nexthop may be able to resolve its corresponding RNH via a less specific prefix. This is the PIC core handling case. |
+| Case 3: IGP remote PE failure | A remote PE node is unreachable in the IGP domain. | IGP triggers an IGP leaf delete event. Zebra will be triggered via zread_route_del() and cannot resolve the corresponding BGP NH via a less specific prefix. This is the PIC edge handling case. |
+| Case 4: BGP remote PE node failure | BGP remote node down | It should be detected as an IGP remote node down first, before BGP reacts, i.e. the same as Case 3 above. This is the PIC edge handling case. |
+| Case 5: Remote PE-CE link failure | This is remote PE's PIC local case. | Remote PE will trigger PIC local handling for quick traffic fix up. Local PE will be updated after BGP gets informed. |
+
+### Nexthop Fixup Handling
+To streamline the discussion and ensure generality, we employ the following recursive routes as an illustration to demonstrate the workflow of the new fixup and its potential to reduce the traffic loss window.
+
+ B> 2.2.2.2/32 [200/0] (70) via 100.0.0.1 (recursive), weight 1, 00:11:28
+ * via 10.1.1.11, Ethernet1, weight 1, 00:11:28
+ * via 10.2.2.11, Ethernet2, weight 1, 00:11:28
+ * via 10.3.3.11, Ethernet3, weight 1, 00:11:28
+ via 200.0.0.1 (recursive), weight 1, 00:11:28
+ * via 10.4.4.12, Ethernet4, weight 1, 00:11:28
+ * via 10.5.5.12, Ethernet5, weight 1, 00:11:28
+ * via 10.6.6.12, Ethernet6, weight 1, 00:11:28
+ B> 3.3.3.3/32 [200/0] (70) via 100.0.0.1 (recursive), weight 1, 00:11:28
+ * via 10.1.1.11, Ethernet1, weight 1, 00:11:28
+ * via 10.2.2.11, Ethernet2, weight 1, 00:11:28
+ * via 10.3.3.11, Ethernet3, weight 1, 00:11:28
+ via 200.0.0.1 (recursive), weight 1, 00:11:28
+ * via 10.4.4.12, Ethernet4, weight 1, 00:11:28
+ * via 10.5.5.12, Ethernet5, weight 1, 00:11:28
+ * via 10.6.6.12, Ethernet6, weight 1, 00:11:28
+ B>* 100.0.0.0/24 [200/0] (51) via 10.1.1.11, Ethernet1, weight 1, 00:11:28
+ * via 10.2.2.11, Ethernet2, weight 1, 00:11:28
+ * via 10.3.3.11, Ethernet3, weight 1, 00:11:28
+ B>* 200.0.0.0/24 [200/0] (61) via 10.4.4.12, Ethernet4, weight 1, 00:11:28
+ * via 10.5.5.12, Ethernet5, weight 1, 00:11:28
+ * via 10.6.6.12, Ethernet6, weight 1, 00:11:28
+
+If one of the paths (10.6.6.12) for prefix 200.0.0.0/24 is removed, zebra will actively update routes during the recursive convergence handling, facilitated by the BGP client. One route update pertains to 200.0.0.0/24, while the others concern 2.2.2.2/32 and 3.3.3.3/32. In this scenario, route 200.0.0.0/24 loses one path, while the reachability of routes 2.2.2.2/32 and 3.3.3.3/32 remains unaffected. To minimize the traffic loss window, it is essential to promptly fix the affected nexthops in the dataplane before zebra completes its route convergence process.
+
+
+
+ Figure 3. The starting point for a path removal
+
+
+To achieve this quick fixup, we need the following changes:
+1. Change the NHG ID hash method.
+2. Add a new function zebra_rnh_fixup_depends() to make a quick fixup on the involved NHGs in dataplanes.
+3. Use pic_nh for backwalking from underlay events to overlay NHGs.
+4. Throttle protocol clients' route update events if needed.
+
+We will describe each change in detail in the following sections.
+
+#### NHG ID
+When zebra generates a hash entry for a recursive Next Hop Group (NHG), it presently utilizes both the nexthop addresses and their resolved nexthop addresses, along with additional information. The hash key is computed by nexthop_group_hash(), shown below. Consequently, if the underlying paths change, the recursive NHG must obtain a new NHG ID because its key changes. This change in NHG ID forces all routes to rebind to the new NHG, even if the recursive nexthops remain unchanged.
+
+
+``` c
+uint32_t nexthop_group_hash(const struct nexthop_group *nhg)
+{
+ struct nexthop *nh;
+ uint32_t key = 0;
+
+ for (ALL_NEXTHOPS_PTR(nhg, nh))
+ key = jhash_1word(nexthop_hash(nh), key);
+
+ return key;
+}
+```
+
+We want to apply the hash function below to all NHGs. This change allows us to reuse a recursive NHG when the recursive nexthops are not changed. One benefit of this change is that the protocol client can decide not to reissue all routes, and the recursive NHG can be reused.
+
+``` c
+uint32_t nexthop_group_hash_no_recurse(const struct nexthop_group *nhg)
+{
+ struct nexthop *nh;
+ uint32_t key = 0;
+
+ /*
+ * We are not interested in hashing over any recursively
+ * resolved nexthops
+ */
+ for (nh = nhg->nexthop; nh; nh = nh->next)
+ key = jhash_1word(nexthop_hash(nh), key);
+
+ return key;
+}
+```
+
+#### zebra_rnh_fixup_depends()
+
+This newly added function is inserted into the existing route convergence workflow, which enables zebra to make a quick fixup on the involved nexthop groups before notifying the protocol client for route updates. This quick fixup is based on the information currently available in zebra. The goal is to apply a quick bandage fix when a path is known to be broken. Protocol clients may provide different paths later, after routes are converged.
+
+
+
+ Figure 4. zebra_rnh_fixup_depends()
+
+
+The function marked in blue serves the quick fixup purpose. It gets triggered before the protocol clients are notified of route updates. zebra_rnh_fixup_depends() is called from zebra_rnh_eval_nexthop_entry() after zebra decides that the rnh's state has changed.
+
+``` c
+static void zebra_rnh_eval_nexthop_entry(struct zebra_vrf *zvrf, afi_t afi,
+ int force, struct route_node *nrn,
+ struct rnh *rnh,
+ struct route_node *prn,
+ struct route_entry *re)
+{
+ ...
+ zebra_rnh_remove_from_routing_table(rnh);
+ if (!prefix_same(&rnh->resolved_route, prn ? &prn->p : NULL)) {
+ if (prn)
+ prefix_copy(&rnh->resolved_route, &prn->p);
+ else {
+ int family = rnh->resolved_route.family;
+
+ memset(&rnh->resolved_route, 0, sizeof(struct prefix));
+ rnh->resolved_route.family = family;
+ }
+
+ copy_state(rnh, re, nrn);
+ state_changed = 1;
+ } else if (compare_state(re, rnh->state)) {
+ copy_state(rnh, re, nrn);
+ state_changed = 1;
+ }
+ zebra_rnh_store_in_routing_table(rnh);
+
+ if (state_changed || force) {
+ /*
+ * New added for dataplane quick fixup
+ */
+ zebra_rnh_fixup_depends(rnh);
+
+ zebra_rnh_notify_protocol_clients(zvrf, afi, nrn, rnh, prn,
+ rnh->state);
+ zebra_rnh_process_pseudowires(zvrf->vrf->vrf_id, rnh);
+ }
+}
+```
+The main workflow of **zebra_rnh_fixup_depends()** is the following:
+1. Find the nexthop hash entry for this rnh.
+2. Walk through this nexthop hash entry's nhg_dependents list and update each involved NHG in the dataplane.
+
+Note: this function only walks one level up the NHG tree. Any further levels are handled via the protocol clients' subsequent route updates.
+
+#### Underlay NHG handling
+
+Assume the initial state of the EVPN underlay routes is the following:
+
+
+
+ Figure 5. initial state of the routes
+
+
+After BGP learns that 200.0.0.0/24's path 10.6.6.12 is withdrawn, i.e. Case 2 (IGP remote link/node failure), BGP sends a route update for 200.0.0.0/24 to zebra with the two remaining paths. After zebra updates this route, it reaches the state shown in Figure 6.
+
+
+
+ Figure 6. one path is removed for route 200.0.0.0/24
+
+
+Zebra updates the route with the new NHG 242, which has two paths, and sends the route update to the dataplanes. This is the current approach, which recovers all traffic via route 200.0.0.0/24, as shown in Figure 7.
+
+
+
+ Figure 7. Zebra updates the route with the new NHG 242, which has two paths
+
+
+Then zebra walks through the nht list of the route entry 200.0.0.0/24 and handles each rnh in the list via zebra_rnh_eval_nexthop_entry().
+
+
+ Figure 8. Zebra walks through the nht list of the route entry 200.0.0.0/24
+
+
+zebra_rnh_fixup_depends() is triggered by zebra_rnh_eval_nexthop_entry() if the rnh's state has changed. This function uses 200.0.0.1 to find its corresponding nhg_hash_entry (NHG 71 in this example). From NHG 71, we back walk to all its dependent NHGs via NHG 71's *nhg_dependents list. At each dependent NHG (NHG 70 in this example), zebra performs a quick fixup to the dataplanes via **zebra_rnh_fixup_depends(rnh)**. In this example, since the rnh is resolved via 200.0.0.0/24, which has been updated to NHG 242, NHG 70 updates the dataplanes with five paths. This quick fixup helps stop traffic loss via these dependent NHGs and is independent of the number of routes pointing to them.
+
+
+
+ Figure 9. Zebra performs a quick fixup to the dataplanes
+
+
+After **zebra_rnh_fixup_depends()** is done, zebra continues its original processing: calling zebra_rnh_notify_protocol_clients() to inform BGP that the nexthop 200.0.0.1 has changed.
+BGP then triggers updates for 2.2.2.2 and the other routes that resolve via 200.0.0.1. During 2.2.2.2's route handling, zebra walks 2.2.2.2's rnh list if it is not empty.
+
+Notes:
+1. Although this illustration uses the IGP remote link/node failure case, a similar workflow applies to the local link failure case as well.
+2. The same logic and workflow apply to adding paths to an NHG, i.e. **zebra_rnh_fixup_depends()** is generic logic.
+
+#### Overlay NHG handling
+For SRv6 and EVPN routes, their nexthops store specific VPN context information, such as vpn-sid and vni/label. These VPN contexts, combined with the nexthop addresses, collectively compose the key for the NHG. This type of nexthop group is referred to as an overlay nexthop group.
+
+One issue with the underlay NHG handling described above is the inability to locate the overlay NHE solely based on the route address. To address this issue, a separate additional PIC nexthop is now always created for the nexthops of SRv6 and EVPN route types. This pic_nh exclusively contains the specific nexthop address, with dependent relationships established for it similar to those for other underlay recursive-type nexthops. Concurrently, an association is established between the pic_nh and the original overlay nexthop (referred to as the PIC CONTEXT NHG).
+
+
+By employing this approach, we can utilize the same zebra_rnh_fixup_depends() function to promptly rectify overlay NHGs when underlay events affect the reachability of overlay nexthops.
+
+The new forwarding chain is depicted in the following graph.
+
+
+ Figure 10. The forwarding chain with overlay NHG
+
+
+Regardless of whether it is a PIC edge or PIC core scenario, NHG updates are always initiated for PIC_NHGs first, then for their corresponding PIC_CONTEXT_NHGs. The different NHG objects are managed distinctly, as outlined in the table below.
+
+| Cases | SRv6 VPN | EVPN |
+|:------|:-----------|:----------------------|
+| PIC_NHG | Trigger a dataplane update. This update stops overlay SRv6 VPN traffic loss. | Trigger a dataplane update. It only stops traffic loss for underlay routes, if any. |
+| PIC_CONTEXT_NHG | Ignore the changes in fpm; no update needed. | Trigger a dataplane update. This update stops overlay EVPN traffic loss. |
+
+
+#### Throttle protocol client's route update events
+Zebra always informs protocol clients that a nexthop has changed. The protocol client can decide whether to throttle the corresponding route update events if there are no changes in reachability or metrics. For SONiC, we only consider BGP's handling.
+
+| Cases | Handling | Comments |
+|:---|:-----------|:----------------------|
+| Nexthop and routes are in the same global table | BGP always reissues route downloads. | This can trigger fixup handling if there are further recursive layers. |
+| Nexthop and routes are in different tables, and the nexthops' reachabilities have changed. | BGP always reissues route downloads. | This is the PIC edge case, for updating the VPN context properly. |
+| Nexthop and routes are in different tables, and neither the nexthops' reachabilities nor their metrics have changed. | **BGP skips reissuing route downloads.** | This is the PIC core case, in which we can throttle route updates. |
+| Other cases | BGP always reissues route downloads. | Safety net. |
+
+
+### FPM and Orchagent Changes
+This approach relies on the following two changes for updating NHGs in the dataplane.
+1. FPM needs to add a new schema that takes each member as a nexthop group ID and updates APP DB. (Relies on BRCM's and NTT's changes.)
+2. Orchagent picks up the event from APP DB and triggers nexthop group programming. Neighorch needs to handle this new schema without changing too much existing code. (Relies on BRCM's and NTT's changes.)
+
+## Unit Test
+### Normal Case's Forwarding Chain Information
+### Test Case 1: local link failure
+
+
+ Figure 11. Local link failure
+
+
+### Test Case 2: IGP remote link/node failure
+
+
+ Figure 12. IGP remote link/node failure
+
+
+
+### Test Case 3: IGP remote PE failure
+
+
+ Figure 13. IGP remote PE failure
+
+
+
+### Test Case 4: BGP remote PE node failure
+
+
+ Figure 14. BGP remote PE node failure
+
+
+
+### Test Case 5: Remote PE-CE link failure
+
+
+ Figure 15. Remote PE-CE link failure