Add BGP Suppress FIB Pending test plan #7475
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,173 @@ | ||
| # BGP Suppress FIB Pending Test Plan | ||
|
|
||
| ## Related documents | ||
|
|
||
| | **Document Name** | **Link** | | ||
| |-------------------|----------| | ||
| | BGP Suppress FIB Pending HLD | [https://github.com/stepanblyschak/SONiC/blob/bgp-suppress-fib-pending/doc/BGP/BGP-supress-fib-pending.md]| | ||
|
|
||
| ## Overview | ||
|
|
||
| As of today, SONiC BGP advertises learnt prefixes regardless of whether they were successfully programmed into the ASIC. | ||
| A route programming failure is followed by an orchagent crash and a restart of all services, but even for successfully created routes there is a short period during which the peer will be black-holing traffic. | ||
| Also, in the following scenario, a credit loop occurs: | ||
| ###### Figure 1. Use case scenario | ||
|
|
||
|  | ||
|
|
||
| The problem with BGP programming occurs after the T1 switch is rebooted: | ||
| 1. First, the T1 FRR learns a default route from at least one T2 | ||
| 2. The T0 advertises its prefixes to T1 | ||
| 3. FRR advertises the prefixes to T2 without waiting for them to be programmed in the ASIC | ||
| 4. T2 starts forwarding traffic for prefixes that are not yet programmed. Following its routing table, T1 sends that traffic back along the default route to the same T2 | ||
|
|
||
| To avoid credit loops, route programming has to be synchronous down to the ASIC. | ||
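The loop in the scenario above comes from longest-prefix matching against an incomplete forwarding table. The following is a minimal sketch of that behaviour, not SONiC code: the `lookup` helper, the prefix `1.1.1.0/24`, and the next-hop labels `T0` and `T2` are illustrative assumptions.

```python
import ipaddress


def lookup(fib, dst):
    """Return the next hop of the longest matching prefix in fib, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, nexthop in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, nexthop)
    return best[1] if best else None


# Hypothetical T1 state right after reboot: only the default route is programmed.
fib_before = {"0.0.0.0/0": "T2"}
# Hypothetical T1 state once the T0 prefix has reached the ASIC.
fib_after = {"0.0.0.0/0": "T2", "1.1.1.0/24": "T0"}

print(lookup(fib_before, "1.1.1.10"))  # T2: traffic is sent back to where it came from
print(lookup(fib_after, "1.1.1.10"))   # T0: the more specific route wins
```

If T1 has already advertised `1.1.1.0/24` to T2 while its own table still looks like `fib_before`, T2 and T1 bounce the packet between each other, which is the situation the feature is meant to prevent.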
|
|
||
| ### Scope | ||
|
|
||
| The test verifies the mechanism that keeps BGP from advertising routes that have not yet been installed into the ASIC. | ||
|
|
||
| ### Scale / Performance | ||
|
|
||
| No scale/performance test is involved in this test plan. | ||
|
|
||
| ### Related **DUT** CLI commands | ||
| Command to enable the feature: | ||
| ``` | ||
| admin@sonic:~$ sudo config suppress-fib-pending enabled | ||
| ``` | ||
| Command to disable the feature: | ||
| ``` | ||
| admin@sonic:~$ sudo config suppress-fib-pending disabled | ||
| ``` | ||
|
|
||
| ### Supported Topology | ||
| The tests are supported on the t1 topology. | ||
|
|
||
| ###### Figure 2. Logic Topology | ||
|  | ||
|
|
||
| ## Test cases | ||
| ### Test case # 1 - BGPv4 route suppress test | ||
| 1. Enable BGP suppress-fib-pending function at DUT | ||
| 2. Save configuration and execute one action randomly chosen from (__reboot__/__config reload__/__fast-reboot__/__warm-reboot__) | ||
| 3. Suspend orchagent process to simulate a delay | ||
| ``` | ||
| kill -SIGSTOP $(pidof orchagent) | ||
| ``` | ||
| 4. Announce BGP IPv4 prefixes to DUT from T0 VM by exabgp | ||
| 5. Make sure announced BGP routes are in __queued__ state in the DUT routing table | ||
| 6. Verify the routes are not announced to T2 VM peer | ||
| 7. Send traffic matching the prefixes and verify packets are not forwarded to T0 VM | ||
| 8. Restore orchagent process | ||
| ``` | ||
| kill -SIGCONT $(pidof orchagent) | ||
| ``` | ||
| 9. Make sure announced BGP routes are not in __queued__ state in the DUT routing table | ||
| 10. Make sure the routes are programmed in FIB by checking __offloaded__ flag in the DUT routing table | ||
| ``` | ||
| show ip route 1.1.1.0/24 json | ||
|
|
||
| { | ||
| "1.1.1.0/24": [ | ||
| { | ||
| "destSelected": true, | ||
| "distance": 20, | ||
| "installed": true, | ||
| "installedNexthopGroupId": 277, | ||
| "internalFlags": 264, | ||
| "internalNextHopActiveNum": 1, | ||
| "internalNextHopNum": 1, | ||
| "internalStatus": 80, | ||
| "metric": 0, | ||
| "nexthopGroupId": 277, | ||
| "nexthops": [ | ||
| { | ||
| "active": true, | ||
| "afi": "ipv4", | ||
| "fib": true, | ||
| "flags": 3, | ||
| "interfaceIndex": 17, | ||
| "interfaceName": "PortChannel1021", | ||
| "ip": "10.0.0.65", | ||
| "weight": 1 | ||
| } | ||
| ], | ||
| "offloaded": true, | ||
| "prefix": "1.1.1.0/24", | ||
| "prefixLen": 24, | ||
| "protocol": "bgp", | ||
| "selected": true, | ||
| "table": 254, | ||
| "uptime": "00:08:08", | ||
| "vrfId": 0, | ||
| "vrfName": "default" | ||
| } | ||
| ] | ||
| } | ||
| ``` | ||
| 11. Verify the routes are announced to T2 peer | ||
| 12. Send traffic matching the prefixes and verify packets are forwarded to T0 VM | ||
| 13. This test should cover __default__ and __user defined vrf__ | ||
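The __queued__ and __offloaded__ checks in steps 5, 9 and 10 can be automated by parsing the JSON output shown in step 10. The sketch below is one way to do that and is not the test's actual implementation. The `route_flags` helper is hypothetical, and it assumes that a pending route carries a `queued` key in the same place where the sample output carries `offloaded`.

```python
import json


def route_flags(show_route_json, prefix):
    """Return (queued, offloaded) for the first BGP entry of prefix, or None if absent."""
    data = json.loads(show_route_json)
    for entry in data.get(prefix, []):
        if entry.get("protocol") == "bgp":
            # Assumption: a missing key means the flag is not set.
            return bool(entry.get("queued", False)), bool(entry.get("offloaded", False))
    return None


# Trimmed version of the output from step 10.
sample = """
{
  "1.1.1.0/24": [
    {"prefix": "1.1.1.0/24", "protocol": "bgp", "installed": true, "offloaded": true}
  ]
}
"""
queued, offloaded = route_flags(sample, "1.1.1.0/24")
print(queued, offloaded)
```

Returning `None` for a missing prefix keeps "route not present" distinct from "route present but not offloaded", which matters when the check runs before the announcement has arrived.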
|
|
||
| ### Test case # 2 - BGPv6 route suppress test | ||
| 1. Enable BGP suppress-fib-pending function at DUT | ||
| 2. Save configuration and execute one action randomly chosen from (__reboot__/__config reload__/__fast-reboot__/__warm-reboot__) | ||
| 3. Suspend orchagent process to simulate a delay | ||
| 4. Announce BGP IPv6 prefixes to DUT from T0 VM by exabgp | ||
| 5. Make sure announced BGP routes are in __queued__ state in the DUT routing table | ||
| 6. Verify the routes are not announced to T2 VM peer | ||
| 7. Send traffic matching the prefixes and verify packets are not forwarded to T0 VM | ||
| 8. Restore orchagent process | ||
| 9. Make sure announced BGP routes are not in __queued__ state in the DUT routing table | ||
| 10. Make sure the routes are programmed in FIB by checking __offloaded__ flag in the DUT routing table | ||
| 11. Verify the routes are announced to T2 peer | ||
| 12. Send traffic matching the prefixes and verify packets are forwarded to T0 VM | ||
| 13. This test should cover __default__ and __user defined vrf__ | ||
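Every test case that suspends orchagent must also restore it, including when an assertion fails in between. A context manager is one way to guarantee that. In this sketch, `run` is a hypothetical callable that executes a shell command on the DUT; the two commands are the ones given in test case 1.

```python
from contextlib import contextmanager


@contextmanager
def orchagent_suspended(run):
    """Pause orchagent for the duration of the block and always resume it.

    `run` is a hypothetical callable that executes a shell command on the DUT.
    """
    run("kill -SIGSTOP $(pidof orchagent)")
    try:
        yield
    finally:
        # Runs even if the body raises, so the DUT is never left frozen.
        run("kill -SIGCONT $(pidof orchagent)")


# Usage with a stand-in runner that only records the commands.
commands = []
with orchagent_suspended(commands.append):
    pass  # announce prefixes and check the queued state here
print(commands)
```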
|
|
||
| ### Test case # 3 - Test BGP route without suppress | ||
| 1. No BGP suppress-fib-pending function configured at DUT | ||
| 2. Suspend orchagent process to simulate a delay | ||
| 3. Announce BGP prefixes to DUT from T0 VM by exabgp | ||
| 4. Make sure announced BGP routes are not in __queued__ state in the DUT routing table | ||
| 5. Verify the BGP routes are announced to T2 peer | ||
| 6. Restore orchagent process | ||
| 7. Make sure the routes are programmed in FIB by checking __offloaded__ flag in the DUT routing table | ||
|
||
| 8. Send traffic matching the prefixes and verify packets are forwarded to T0 VM | ||
|
|
||
| ### Test case # 4 - Test that BGP tests work with suppress | ||
| 1. No BGP suppress-fib-pending function configured at DUT | ||
| 2. Run BGP test suite | ||
| 3. Make sure BGP tests are not affected | ||
| 4. Enable BGP suppress-fib-pending function at DUT | ||
| 5. Run BGP test suite | ||
| 6. Make sure BGP tests are not affected | ||
|
|
||
| ### Test case # 5 - Test BGP route suppress under negative operation | ||
| 1. Enable BGP suppress-fib-pending function at DUT | ||
| 2. Suspend orchagent process to simulate a delay | ||
| 3. Announce BGP prefixes to DUT from T0 VM by exabgp | ||
| 4. Execute __BGP session restart__ | ||
| 5. Verify BGP neighborships are established | ||
| 6. Make sure announced BGP routes are in __queued__ state in the DUT routing table | ||
| 7. Configure __static routes__ then redistribute to BGP | ||
| 8. Verify the __redistributed routes__ are in the DUT routing table | ||
| 9. Verify the routes are announced to T2 VM peer | ||
| 10. Send traffic matching the prefixes and verify packets are not forwarded to T0 VM | ||
| 11. Restore orchagent process | ||
| 12. Make sure announced BGP routes are not in __queued__ state in the DUT routing table | ||
| 13. Make sure the routes are programmed in FIB by checking __offloaded__ flag in the DUT routing table | ||
| 14. Verify the BGP routes are announced to T2 peer | ||
| 15. Send traffic matching the prefixes and verify packets are forwarded to T0 VM | ||
|
|
||
| ### Test case # 6 - Test BGP route suppress in credit loops scenario | ||
| 1. No BGP suppress-fib-pending function configured at DUT | ||
| 2. Suspend orchagent process to simulate a delay | ||
| 3. Announce a default route to DUT from T2 VM | ||
| 4. Announce BGP prefixes to DUT from T0 VM by exabgp | ||
| 5. Verify the BGP routes are announced to T2 VM peer | ||
| 6. Send traffic matching the prefixes and verify packets are forwarded __back to T2 VM__ | ||
| 7. Enable BGP suppress-fib-pending function at DUT | ||
| 8. Restore orchagent process | ||
| 9. Make sure the routes are programmed in FIB by checking __offloaded__ flag in the DUT routing table | ||
| 10. Send traffic matching the prefixes and verify packets are forwarded to __T0 VM__ | ||
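After orchagent is restored, routes do not become __offloaded__ instantly, so the checks that follow the restore step in each test case are naturally written as a poll with a timeout. The helper below is a generic sketch. Its name, the default timeout and the default interval are assumptions, since the test plan does not specify them.

```python
import time


def wait_until(condition, timeout=60.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll condition() until it returns True or timeout seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage with a stand-in condition that succeeds on the third poll.
attempts = {"count": 0}


def offloaded_after_three_polls():
    attempts["count"] += 1
    return attempts["count"] >= 3


result = wait_until(offloaded_after_three_polls, timeout=5.0, interval=0.0)
print(result, attempts["count"])
```

In a real test the condition would query the DUT routing table for the __offloaded__ flag instead of counting calls.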
could we add stress and performance test?
For the stress test, case 4 enables the bgp-suppress-fib-pending function and runs all the community BGP cases, which include stress and consistency tests. For the performance test, another team will cover it.
Hi @StormLiangMS, I checked the test_bgp_update_timer.py script. It is the existing test for validating DUT performance when handling a single BGP update (a single route at a time, plus a loop test over 5 routes).
I have two concerns on using exabgp and tcpdump to implement performance test:
Please share your insights.
@echuawu I think it is OK to have a separate PR for the performance and stress tests. Could you open an issue to track this ask?
@StormLiangMS , thank you, and the issue has been created to track the new PR: https://github.com/sonic-net/sonic-buildimage/issues/15194
@echuawu could you add this new PR info to sonic-net/SONiC#1103 to make sure it gets well tracked?
Hi @StormLiangMS , Sure done.