[test plan] Test plan for BGP scale test #15702
…gp-high-scale-test-plan
|
/azp run |
|
Azure Pipelines could not run because the pipeline triggers exclude this branch/path. |
4c1c003 to b191e21
docs/testplan/BGP-Scale-Test.md
Outdated
| # Setup Configuration | ||
| The count of routes from BGP peers is vital. We will leverage exabgp to advertise routes to all BGP peers, and those routes will finally be advertised to the device under test. | ||
|
|
||
| When the DUT is a T0, we will use exabgp to first advertise 511 routes with prefix length 120 to all peer T1 devices to simulate downstream routes (VLAN IPv6 addresses of T0s), and then advertise 15 routes with prefix length 64 to all peer T1 devices to simulate upstream routes (aggregated IPv6 addresses of T0s' VLANs on T2s). The DUT T0 will then receive those routes from its BGP peers. |
It might be better to say: for each neighbor, we will advertise 1k routes in total: 512 /120 and 512 /128.
We will skip the T2 ones here. They won't make a difference but can cause a lot of confusion.
Because we have one /120 and one /128 on the T0 DUT, I think the route count is 511 /120 plus 511 /128, right?
| The detailed route scale is described in the table below: | ||
| | Topology Type | BGP Routes Count | BGP Nexthop Group Count | BGP Nexthop Group Members Count | | ||
| | ------------------------------------------ | --------------------- | ----------------------- | ------------------------------- | | ||
| | t0-isolated-d2u254s1, t0-isolated-d2u254s2 | 254 * ( 511 + 511 ) | 254 | 254 | |
The huge next hop count is not what the topology provides by default, but it is what the mgmt test cases would create. We should move those numbers down to the mgmt test, and provide the default numbers here.
Or we can make a new table showing them as the requirement of the Nexthop Group Member Scale Test.
When we deploy the testbed, the script sets up routes by default, and there are parameters in the topo, such as podset_number, tor_number, and tor_subnet_number, that control the route scale. So the routes in this table are the defaults for each topology.
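As a quick sanity check on the `254 * ( 511 + 511 )` entry in the table row under discussion, the default count is the number of neighbors times the routes advertised per neighbor. A minimal sketch follows; the function and parameter names are illustrative only and are not taken from sonic-mgmt.

```python
def default_bgp_route_count(neighbors=254, vlan_routes=511, loopback_routes=511):
    """Total BGP routes received by the DUT, counted per neighbor.

    Hypothetical helper: each of `neighbors` peers is assumed to advertise
    `vlan_routes` /120 prefixes plus `loopback_routes` /128 prefixes.
    """
    return neighbors * (vlan_routes + loopback_routes)


total = default_bgp_route_count()
print(total)  # 259588
```

Note that this counts routes per neighbor. If every neighbor advertises the same set of prefixes, the number of distinct prefixes on the DUT is only `511 + 511 = 1022`.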
| # Route Configuration Setup | ||
| The count of routes from BGP peers is vital. We will leverage exabgp to advertise routes to all BGP peers, and those routes will finally be advertised to the device under test. | ||
|
|
||
| When the DUT is a T0, we will use exabgp to advertise 511 routes with prefix length 120 and 511 routes with prefix length 128 to each neighbor T1 device. The prefixes with length 120 mock the VLAN addresses on downstream T0s, and the prefixes with length 128 mock the loopback addresses on downstream T0s. |
Just to clarify my understanding of the text here:
when the DUT is a T0, the expectation is that all of the (emulated) T1s reflect the same collection of /120 and /128 prefix announcements, for a resulting prefix count on the T0 DUT of ~1022 prefixes spread over 256/512 NHs. Correct?
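To make the 511 + 511 figure concrete, here is a minimal sketch of how such a prefix set could be enumerated with Python's standard `ipaddress` module. The base networks (picked from the 2001:db8::/32 documentation range) and the function name are assumptions for illustration; the test plan does not state the actual address ranges.

```python
import ipaddress
from itertools import islice


def mock_t0_prefixes(count=511,
                     vlan_base="2001:db8:1::/64",      # assumed base for mocked VLAN prefixes
                     loopback_base="2001:db8:2::/64"):  # assumed base for mocked loopbacks
    """Return (vlan_prefixes, loopback_prefixes) as lists of IPv6Network.

    The /120 prefixes stand in for VLAN addresses on downstream T0s and
    the /128 prefixes for their loopback addresses.
    """
    vlan_net = ipaddress.ip_network(vlan_base)
    loopback_net = ipaddress.ip_network(loopback_base)
    # subnets() is a generator, so islice takes only the first `count` entries.
    vlan = list(islice(vlan_net.subnets(new_prefix=120), count))
    loopback = list(islice(loopback_net.subnets(new_prefix=128), count))
    return vlan, loopback


vlan, loopback = mock_t0_prefixes()
print(len(vlan) + len(loopback))  # 1022
```

If every emulated T1 announces this same set, the DUT would hold 1022 distinct prefixes, each reachable through as many next hops as there are neighbors.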
docs/testplan/BGP-Scale-Test.md
Outdated
| ### Steps | ||
| 1. Shut down all ports on the device (the T1 session ports on a T0 DUT, or the T0 session ports on a T1 DUT). | ||
| 1. Wait for the routes to become stable. | ||
| 1. Start and keep sending packets for all routes to all ports via ptf. |
|
Cherry-pick PR to msft-202412: Azure/sonic-mgmt.msft#259 |
…or sessions flapping, unisolation, nexthop group member change scenarios (sonic-net#258)

### Description of PR
Summary:
Fixes # (issue)
Implement test plan sonic-net#15702. Add test cases to test whether the control/data plane can handle the initialization/flapping of numerous BGP sessions holding a lot of routes, and estimate the impact.

### Type of change
- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [x] Test case(new/improvement)

### Back port request
- [ ] 202012
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405

### Approach
#### What is the motivation for this PR?
With numerous BGP sessions holding a lot of routes, any flapping of BGP sessions or routes could put more overhead on the device. We need test cases to verify the functionality and estimate convergence time, so we publish this test plan.
#### How did you do it?
Implement the sessions flapping test, the unisolation test, and the nexthop group member scale test.
#### How did you verify/test it?
#### Any platform specific information?
#### Supported testbed topology if it's a new test case?

### Documentation
What is the motivation for this PR? With numerous BGP sessions holding a lot of routes, any flapping of BGP sessions or routes could put more overhead on the device. To verify the functionality and estimate convergence time, we publish this test plan. How did you do it? Describe three test scenarios and introduce how we measure time in the test. Signed-off-by: opcoder0 <110003254+opcoder0@users.noreply.github.com>
What is the motivation for this PR? With numerous BGP sessions holding a lot of routes, any flapping of BGP sessions or routes could put more overhead on the device. To verify the functionality and estimate convergence time, we publish this test plan. How did you do it? Describe three test scenarios and introduce how we measure time in the test. Signed-off-by: Aharon Malkin <amalkin@nvidia.com>
What is the motivation for this PR? With numerous BGP sessions holding a lot of routes, any flapping of BGP sessions or routes could put more overhead on the device. To verify the functionality and estimate convergence time, we publish this test plan. How did you do it? Describe three test scenarios and introduce how we measure time in the test. Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
Description of PR
Summary:
Fixes # (issue)
This test plan tests whether the control/data plane can handle the initialization/flapping of numerous BGP sessions holding a lot of routes, and estimates the impact.
Related PRs:
Type of change
Back port request
Approach
What is the motivation for this PR?
With numerous BGP sessions holding a lot of routes, any flapping of BGP sessions or routes could put more overhead on the device. To verify the functionality and estimate convergence time, we publish this test plan.
How did you do it?
Describe three test scenarios and introduce how we measure time in the test.
How did you verify/test it?
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation