SONiC switch High-scale IPv6 BGP test plan #16759

Closed
sm-xu wants to merge 24 commits into sonic-net:master from sm-xu:bgp-test-doc

Conversation

@sm-xu
Contributor

@sm-xu sm-xu commented Feb 3, 2025

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405
  • 202411

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@sm-xu sm-xu requested review from wangxin and yxieca as code owners February 3, 2025 00:53
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@sm-xu sm-xu changed the title 2-tier network IPv6 BGP test plan Multi-tier network IPv6 BGP test plan Feb 6, 2025
@@ -0,0 +1,32 @@
# Test Objective
This test aims to verify the scalability and stability of 256 BGP sessions and 10K IPv6 routes in a 2-tier network. It evaluates the DUT’s ability to establish and maintain BGP sessions, ensures proper route learning, and measures BGP update convergence time under various conditions.
Collaborator

The setup is 1 BGP session per port, so we are not limited to 256 BGP sessions.

@@ -0,0 +1,32 @@
# Test Objective
This test aims to verify the scalability and stability of 256 BGP sessions and 10K IPv6 routes in a 2-tier network. It evaluates the DUT’s ability to establish and maintain BGP sessions, ensures proper route learning, and measures BGP update convergence time under various conditions.
Collaborator

A multi-tier network should still work.

This test aims to verify the scalability and stability of 256 BGP sessions and 10K IPv6 routes in a 2-tier network. It evaluates the DUT’s ability to establish and maintain BGP sessions, ensures proper route learning, and measures BGP update convergence time under various conditions.

# Test Setup
![Test Setup](./2TierNetwork.png)
Collaborator

nit: lint the markdown.

@@ -0,0 +1,32 @@
# Test Objective
Collaborator

BGP + route scale

Collaborator

Actually, it is better to split the route scale into another test. For the BGP session test, we can use 4 x number of ports as the route scale.

Collaborator

For the route scale test, we can do 40 x number of ports, but in a separate test. In the route scale test, we need to check the latency very carefully.
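As a worked example of the two multipliers suggested in this thread, the sketch below computes the route count for each test. The function and constant names are illustrative and not part of the test plan; the 256-port figure is taken from the example in the plan.

```python
# Hypothetical helper: route counts derived from the per-port multipliers
# suggested in the review (4 routes/port for the BGP session test,
# 40 routes/port for the separate route scale test).
BGP_SESSION_TEST_ROUTES_PER_PORT = 4
ROUTE_SCALE_TEST_ROUTES_PER_PORT = 40


def route_scale(num_ports: int, routes_per_port: int) -> int:
    """Return the total number of routes to advertise for a DUT."""
    if num_ports <= 0 or routes_per_port <= 0:
        raise ValueError("num_ports and routes_per_port must be positive")
    return num_ports * routes_per_port


if __name__ == "__main__":
    ports = 256  # example DUT size used in the test plan
    print(route_scale(ports, BGP_SESSION_TEST_ROUTES_PER_PORT))  # 1024
    print(route_scale(ports, ROUTE_SCALE_TEST_ROUTES_PER_PORT))  # 10240
```

With 256 ports, the 40x multiplier gives 10240 routes, which lines up with the "10K IPv6 routes" figure in the original objective.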

# Test Setup
![Test Setup](./2TierNetwork.png)

1. The testbed consists of four IXIA traffic generators (synchronized using a time-sync metronome) and five SONiC switches, where the BT1 switch is the Device Under Test (DUT).
Collaborator

the test should work with any number of ixias.

5. The routing configuration of the BT0 switches should ensure that all data traffic goes through the DUT.

# Test Steps
1. Assign a unique AS number to each of the five switches.
Collaborator

We should consider automating this using add-topo or deploy-mg.

all switches can share the same setup, and ask the IXIA to advertise the routes.

@sm-xu sm-xu changed the title Multi-tier network IPv6 BGP test plan SONiC switch High-scale IPv6 BGP test plan Feb 11, 2025

2. Between each of the four neighboring switches and the DUT: Configure X/Y BGP sessions. Each BGP session should have a dedicated pair of Ethernet ports (one on the DUT and the other on the neighboring device) whose IPv6 addresses are on the same subnet. Set up the BGP neighbors, device neighbors, and port IPv6 addresses for each BGP session.

3. Monitor the BGP session establishment on the DUT using command “show ipv6 bgp summary”. Ensure all X BGP sessions are established without errors.
Collaborator

The command should be quoted with backticks (`).


3. Monitor the BGP session establishment on the DUT using command “show ipv6 bgp summary”. Ensure all X BGP sessions are established without errors.

4. In each neighboring switch: Configure a vlan, assign 2500 IPv6 addresses with the specified prefix length and add all the Ethernet ports connected to IXIA to the vlan.
Collaborator

The 2500 should be the total number of ports x 10 (considering 1 port, 1 VLAN route).

Contributor Author

Done


In the above example, the DUT has 256 logical Ethernet ports and is connected to 4 neighboring switches, so we will establish 64 BGP sessions between each neighbor and the DUT.
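The arithmetic in this example can be sketched as follows, reading X as the total number of BGP sessions (one per DUT port) and Y as the number of neighboring switches. That reading and the function name are assumptions made for illustration.

```python
# Hypothetical helper for the X/Y split described in the test plan.
def sessions_per_neighbor(total_sessions: int, num_neighbors: int) -> int:
    """Return X/Y, the number of BGP sessions between the DUT and each neighbor."""
    if total_sessions <= 0 or num_neighbors <= 0:
        raise ValueError("total_sessions and num_neighbors must be positive")
    if total_sessions % num_neighbors != 0:
        # The plan assumes an even split; flag topologies where that fails.
        raise ValueError("sessions cannot be split evenly across neighbors")
    return total_sessions // num_neighbors


if __name__ == "__main__":
    print(sessions_per_neighbor(256, 4))  # 64
```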

## Test Steps
Collaborator

we should have 3 test cases:

  1. All session shutdown/enable time
  2. 1 session shutdown/enable time
  3. Fragmented failure links / Nexthop Group Member Scale Test

Contributor Author

The 1st and 2nd cases are done. Working on the 3rd case...

Contributor Author

Re-wrote the 3rd case.

1. In one of the T0 switches, run `show ipv6 bgp network <ipv6>/<prefix>` and find the number of nexthops that can be used to reach `<ipv6>/<prefix>`.
2. Randomly pick half of the next hops and remove them. Run the show command again and record the convergence time.
3. Restore the removed nexthops and record the convergence time again.
4. Repeat this process and calculate the average convergence time of this scenario.
Collaborator

missing the metrics definition here.

Contributor Author

Added. Thank you!
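The four nexthop steps quoted in this thread could be driven by a loop like the one below. This is a minimal sketch: `remove`, `restore`, and `wait_converged` are hypothetical callables supplied by the test, standing in for whatever mechanism removes nexthops and re-runs the show command. None of them is an existing sonic-mgmt API.

```python
import random
import statistics
import time
from typing import Callable, Optional, Sequence, Tuple


def measure_nexthop_flap(
    nexthops: Sequence[str],
    remove: Callable[[Sequence[str]], None],      # hypothetical: withdraw nexthops
    restore: Callable[[Sequence[str]], None],     # hypothetical: bring them back
    wait_converged: Callable[[int], None],        # hypothetical: block until N nexthops seen
    iterations: int = 3,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Return (mean removal convergence, mean restore convergence) in seconds."""
    rng = rng or random.Random()
    down_times, up_times = [], []
    for _ in range(iterations):
        # Step 2: randomly pick half of the nexthops and remove them.
        victims = rng.sample(list(nexthops), len(nexthops) // 2)
        start = time.monotonic()
        remove(victims)
        wait_converged(len(nexthops) - len(victims))
        down_times.append(time.monotonic() - start)

        # Step 3: restore the removed nexthops and time the recovery.
        start = time.monotonic()
        restore(victims)
        wait_converged(len(nexthops))
        up_times.append(time.monotonic() - start)
    # Step 4: average over all repetitions.
    return statistics.mean(down_times), statistics.mean(up_times)
```

Passing a seeded `random.Random` makes the choice of removed nexthops reproducible between runs, which helps when comparing convergence numbers.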


## Test Steps

1. Assign a unique AS number to each of the five switches.
Collaborator

The ASN should come from the topology. We should mention this in the test setup section as part of the topology introduction.

Contributor Author

Moved to test setup section.


1. Assign a unique AS number to each of the five switches.

2. Between each of the neighboring switches and the DUT: Configure X/Y BGP sessions. Each BGP session should have a dedicated pair of Ethernet ports (one on the DUT and the other on the neighboring device) whose IPv6 addresses are on the same subnet. Set up the BGP neighbors, device neighbors, and port IPv6 addresses for each BGP session.
Collaborator

with our current setup of multi-tier topology, we should not have any special setup for this one.

however, in the test setup, we can mention the approach we use to stress the BGP session on a device.

Collaborator

e.g. with multiple T0 and a single T1 being used.

Contributor Author

I revised both test plans. Please review.
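For the requirement in step 2 that each BGP session uses a dedicated port pair whose IPv6 addresses share a subnet, an address plan can be generated with the standard `ipaddress` module. The sketch below is an illustration only: the base prefix (from the 2001:db8::/32 documentation range), the /126 link prefix length, and the function name are assumptions, not values from the test plan.

```python
import ipaddress
from itertools import islice
from typing import List, Tuple


def plan_session_addresses(
    base_prefix: str, num_sessions: int, link_prefixlen: int = 126
) -> List[Tuple[str, str, str]]:
    """Return (dut_ip, neighbor_ip, subnet) for each BGP session.

    Each session gets its own point-to-point subnet carved from base_prefix.
    """
    base = ipaddress.ip_network(base_prefix)
    plan = []
    for subnet in islice(base.subnets(new_prefix=link_prefixlen), num_sessions):
        # Index 1 and 2 are the first two usable addresses in the subnet.
        plan.append((str(subnet[1]), str(subnet[2]), str(subnet)))
    if len(plan) < num_sessions:
        raise ValueError("base prefix is too small for the requested sessions")
    return plan


if __name__ == "__main__":
    for dut_ip, nbr_ip, subnet in plan_session_addresses("2001:db8::/64", 3):
        print(subnet, dut_ip, nbr_ip)
```

Given the reviewer's note that the multi-tier topology should not need special setup, a plan like this would more likely come from the topology definition than be computed by the test itself.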


2. Between each of the neighboring switches and the DUT: Configure X/Y BGP sessions. Each BGP session should have a dedicated pair of Ethernet ports (one on the DUT and the other on the neighboring device) whose IPv6 addresses are on the same subnet. Set up the BGP neighbors, device neighbors, and port IPv6 addresses for each BGP session.

3. Monitor the BGP session establishment on the DUT using command `show ipv6 bgp summary`. Ensure all X BGP sessions are established without errors.
Collaborator

this step should be the first pretest step.

Contributor Author

Done
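The pretest check on `show ipv6 bgp summary` could be automated by counting neighbors whose `State/PfxRcd` column holds a prefix count rather than a state name such as `Active` or `Idle`. The sketch below assumes the usual FRR-style table layout; the sample text is made up for illustration and is not captured from a real DUT, so the column handling should be verified against actual output.

```python
import ipaddress
from typing import Tuple

STATE_COLUMN = "State/PfxRcd"

# Hypothetical sample output, for illustration only.
SAMPLE = """\
Neighbhor    V    AS     MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  NeighborName
fc00::2      4    65001  120      118      0       0    0     00:10:02  40            ARISTA01T0
fc00::6      4    65002  0        0        0       0    0     never     Active        ARISTA02T0

Total number of neighbors 2
"""


def count_established(summary_text: str) -> Tuple[int, int]:
    """Return (established_sessions, total_sessions) from the summary text."""
    lines = summary_text.splitlines()
    header = next((l for l in lines if STATE_COLUMN in l.split()), None)
    if header is None:
        raise ValueError("no header containing %r found" % STATE_COLUMN)
    state_idx = header.split().index(STATE_COLUMN)

    established = total = 0
    for line in lines[lines.index(header) + 1:]:
        fields = line.split()
        if len(fields) <= state_idx:
            continue
        try:
            ipaddress.IPv6Address(fields[0])  # only count IPv6 neighbor rows
        except ValueError:
            continue
        total += 1
        # A numeric value means the session is up and reports received prefixes.
        if fields[state_idx].isdigit():
            established += 1
    return established, total


if __name__ == "__main__":
    print(count_established(SAMPLE))
```

A test would then assert that both numbers equal the expected session count X before moving on to the timed cases.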


3. Monitor the BGP session establishment on the DUT using command `show ipv6 bgp summary`. Ensure all X BGP sessions are established without errors.

4. In each neighboring switch: Configure a vlan, assign `10*X` IPv6 addresses with the specified prefix length and add all the Ethernet ports connected to IXIA to the vlan.
Collaborator

might need to update this step with ixia advertising the routes?

Contributor Author

Revised.


## Key Test Cases

### One BGP Session Flap
Collaborator

### Case 1: ...

Contributor Author

Done


### One BGP Session Flap

1. One session down and up: Shut down one interface on the DUT. Wait till all routes advertised by the impacted BGP session are removed.
Collaborator

Port shutdown
Step 1: Start traffic
Step 2: Shutdown port
Step 3: Evaluate data path reaction time
Step 4: Evaluate route update time

repeat reverse for port startup

Collaborator

We need to be precise about how to evaluate, which command to run, and so on.

Contributor Author

Please check my revised version.
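One common way to make "evaluate data path reaction time" precise with a traffic generator is to derive the outage duration from frame loss at a constant transmit rate. The thread does not say which method the revised plan uses, so the sketch below is an assumption, and the function name is hypothetical.

```python
# Assumed method: with traffic sent at a constant rate, frames lost during the
# port shutdown divided by the transmit rate approximates the outage duration.
def dp_response_time_ms(tx_frames: int, rx_frames: int, tx_rate_fps: float) -> float:
    """Estimate data path reaction time in milliseconds from frame loss."""
    if tx_rate_fps <= 0:
        raise ValueError("tx_rate_fps must be positive")
    if rx_frames > tx_frames:
        raise ValueError("rx_frames cannot exceed tx_frames")
    lost_frames = tx_frames - rx_frames
    return lost_frames * 1000.0 / tx_rate_fps


if __name__ == "__main__":
    # 15,000 frames lost at 100,000 frames/s corresponds to a 150 ms outage.
    print(dp_response_time_ms(1_000_000, 985_000, 100_000))
```

The estimate only holds if the loss is caused by the event under test and the transmit rate stays constant for the whole measurement window.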


### All BGP Sessions Down and Up Test

1. Stop the BGP container on the DUT. Wait till all BGP routes are removed.
Collaborator

Same as above. This needs to be improved, and we need to focus more on the data plane reaction time, since we have IXIA in our testbed.

Contributor Author

Please check out my Teams message.
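"Wait till all BGP routes are removed" implies a polling loop with a timeout, and the elapsed time of that loop is the control plane measurement. Below is a minimal sketch of such a helper. `time_until` is a hypothetical name, and the `condition` callable stands in for whatever route check the test performs.

```python
import time
from typing import Callable


def time_until(
    condition: Callable[[], bool], timeout_s: float, interval_s: float = 0.5
) -> float:
    """Poll condition() until it returns True; return elapsed seconds.

    Raises TimeoutError if the condition is still False after timeout_s.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while True:
        if condition():
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout_s)
        time.sleep(interval_s)
```

The polling interval bounds the measurement resolution, so a result from this helper cannot be more precise than `interval_s` plus the cost of one `condition()` call. That is one reason the reviewer's emphasis on data plane measurements from IXIA matters for small convergence times.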


| Metric Name | Example Value |
| ----------------------------------------------- | ------------------- |
| `METRIC_NAME_BGP_CONVERGENCE_PORT_RESTART` | 15 |
Collaborator

test.bgp_scale.one_port_down.route_convergence_time_ms
test.bgp_scale.one_port_down.dp_response_time_ms
test.bgp_scale.all_port_down.route_convergence_time_ms
....

Contributor Author

Convergence time is measured in seconds, not milliseconds. Using a suffix like `*_s` might cause confusion about what it means.
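Whichever unit is chosen, building metric names from a fixed pattern keeps the unit explicit and consistent across test cases. The sketch below follows the `test.bgp_scale.<case>.<metric>_<unit>` pattern from the reviewer's examples. The helper names and the allowed unit set are illustrative assumptions.

```python
ALLOWED_UNITS = {"s", "ms"}  # assumed set of unit suffixes


def metric_name(case: str, metric: str, unit: str) -> str:
    """Build a dotted metric name with an explicit unit suffix."""
    if unit not in ALLOWED_UNITS:
        raise ValueError("unsupported unit: %r" % unit)
    return "test.bgp_scale.%s.%s_%s" % (case, metric, unit)


def seconds_to_ms(seconds: float) -> float:
    """Convert a duration measured in seconds to milliseconds."""
    return seconds * 1000.0


if __name__ == "__main__":
    name = metric_name("one_port_down", "route_convergence_time", "ms")
    print(name, seconds_to_ms(15))
```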

@r12f
Collaborator

r12f commented Jul 13, 2025

Close in favor of #19564

@r12f r12f closed this Jul 13, 2025