
[202503] add batch_mode support for bind_fp_ports and unbind_fp_ports (#18790)#417

Merged: bingwang-ms merged 1 commit into Azure:202503 from auspham:austinpham/18790-cherry-pick on Jun 19, 2025

@auspham auspham commented Jun 19, 2025

Cherry-pick sonic-net/sonic-mgmt#18790

Description of PR

Summary: This PR adds an option to use batch_mode for bind_fp_ports, which improves speed by about 50% when tested with 128 VM neighbors.

Fixes # (issue) 32654908

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505

Approach

What is the motivation for this PR?

When creating ovs flows, we launch a subprocess for each command and wait for its result before continuing with the next call. This is very inefficient, even with the aid of multi-threading.

How did you do it?

This PR changes the multi-threading behavior in the following way:

  1. Batch all the ovs flow creation commands that need to be executed into a single file.
  2. Launch one process that runs ovs-ofctl add-flows on the file, put the process in a queue to be waited on later, and free the thread so that it can launch a different batch.
  3. At the end, the main thread waits for all the launched batch processes to finish.

This PR also provides an option to opt in to this feature.
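The batching steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names (`write_flow_file`, `launch_batch`, `wait_all`) and the `build_cmd` hook are hypothetical, and the only real CLI call assumed is `ovs-ofctl add-flows <bridge> <file>`.

```python
import os
import subprocess
import tempfile
from collections import deque


def write_flow_file(flows):
    """Write one flow spec per line to a temp file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".flows", delete=False) as f:
        f.write("\n".join(flows) + "\n")
        return f.name


def launch_batch(bridge, flows, pending, build_cmd=None):
    """Start ONE process for the whole batch and queue it without waiting."""
    path = write_flow_file(flows)
    if build_cmd is None:
        # Default: load every flow in the file with a single ovs-ofctl call.
        cmd = ["ovs-ofctl", "add-flows", bridge, path]
    else:
        # Hypothetical hook so the sketch can be exercised without Open vSwitch.
        cmd = build_cmd(bridge, path)
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    pending.append((proc, path))


def wait_all(pending):
    """Wait for every queued batch; return (returncode, stdout, stderr) tuples."""
    results = []
    while pending:
        proc, path = pending.popleft()
        out, err = proc.communicate()  # blocks until this batch has finished
        os.remove(path)
        results.append((proc.returncode, out, err))
    return results
```

A caller would create `pending = deque()`, call `launch_batch("br-example", ["in_port=1,actions=output:2"], pending)` once per batch (the bridge name and flow spec here are illustrative), and call `wait_all(pending)` at the end. Because `launch_batch` returns as soon as the process has started, the calling thread is free to prepare the next batch instead of blocking on each flow.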

How did you verify/test it?

Verified on a physical testbed with 128 VMs. With the same setting of 8 threads, the average run time dropped from 1 hour 30 minutes to 45 minutes.

The following is a sample run with the same settings and the same number of threads, with batch_mode enabled for renumber topology and unbind topology.

Most of the benefit shows up in Renumber topology, where bind_fp_ports is batched.

Before

Wednesday 04 June 2025  03:12:40 +0000 (0:00:00.055)       1:43:10.602 ********
===============================================================================
vm_set : Renumber topology lt2-o128 to VMs. base vm = VM73166 -------- 4998.28s
vm_set : Unbind topology lt2-o128 to VMs. base vm = VM73166 ----------- 557.45s
vm_set : Kill exabgp and ptf_nn_agent processes in PTF container ------ 206.43s
vm_set : Setup vlan port for vlan tunnel ------------------------------- 92.35s
vm_set : Verify that exabgp processes for IPv4 are started ------------- 45.90s
vm_set : Verify that exabgp processes for IPv6 are started ------------- 45.79s
vm_set : Configure exabgp processes for IPv4 on PTF -------------------- 27.62s
vm_set : configure exabgp processes for IPv6 on PTF -------------------- 26.66s
vm_set : Stop ptf container ptf_vms73-2 -------------------------------- 16.49s
vm_set : Run the "apt-get update" as a separate and retryable step ----- 14.26s
vm_set : Create ptf container ptf_vms73-2 ------------------------------ 14.25s
vm_set : Try to login into docker registry ----------------------------- 12.00s
vm_set : Remove ptf container ptf_vms73-2 ------------------------------ 11.54s
vm_set : Set ipv6 route max size of ptf_vms73-2 ------------------------ 11.13s
vm_set : Enable ipv6 for docker container ptf_vms73-2 ------------------ 11.03s
vm_set : Install necessary packages ------------------------------------ 10.58s
vm_set : Announce routes ------------------------------------------------ 9.99s
vm_set : Install necessary packages ------------------------------------- 9.09s
vm_set : Stop PTF portchannel ------------------------------------------- 4.60s
vm_set : Change PTF interface MAC addresses ----------------------------- 4.35s

After

Wednesday 04 June 2025  07:30:07 +0000 (0:00:00.069)       0:52:51.148 ********
===============================================================================
vm_set : Renumber topology lt2-o128 to VMs. base vm = VM73166 -------- 1980.45s
vm_set : Unbind topology lt2-o128 to VMs. base vm = VM73166 ----------- 552.96s
vm_set : Kill exabgp and ptf_nn_agent processes in PTF container ------ 206.56s
vm_set : Setup vlan port for vlan tunnel ------------------------------ 108.52s
vm_set : Verify that exabgp processes for IPv4 are started ------------- 45.76s
vm_set : Verify that exabgp processes for IPv6 are started ------------- 45.45s
vm_set : Configure exabgp processes for IPv4 on PTF -------------------- 27.49s
vm_set : configure exabgp processes for IPv6 on PTF -------------------- 25.96s
vm_set : Stop ptf container ptf_vms73-2 -------------------------------- 16.17s
vm_set : Create ptf container ptf_vms73-2 ------------------------------ 15.02s
vm_set : Remove ptf container ptf_vms73-2 ------------------------------ 12.38s
vm_set : Set ipv6 route max size of ptf_vms73-2 ------------------------ 11.56s
vm_set : Try to login into docker registry ----------------------------- 11.13s
vm_set : Install necessary packages ------------------------------------ 10.50s
vm_set : Install necessary packages ------------------------------------- 8.84s
vm_set : Announce routes ------------------------------------------------ 7.81s
vm_set : Run the "apt-get update" as a separate and retryable step ------ 6.45s
vm_set : Add exabgpv6 supervisor config and start related processes ----- 4.74s
vm_set : Change PTF interface MAC addresses ----------------------------- 4.67s
vm_set : Stop PTF portchannel ------------------------------------------- 4.51s
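As a quick check of the numbers in the two samples above, the relative reduction can be computed directly from the logged timings. The helper names below are illustrative only; the figures are copied from the Before and After output.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS.mmm' duration string to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def reduction_pct(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


# Per-task timings (seconds) taken from the Before / After samples.
renumber = reduction_pct(4998.28, 1980.45)
unbind = reduction_pct(557.45, 552.96)
# Total elapsed time reported in the header line of each sample.
total = reduction_pct(to_seconds("1:43:10.602"), to_seconds("0:52:51.148"))

print("renumber: %.1f%%, unbind: %.1f%%, total: %.1f%%" % (renumber, unbind, total))
```

For this sample, Renumber topology is about 60% faster, Unbind topology is essentially unchanged (under 1%), and the total run time drops by roughly 49%.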

Other topology

The only affected functionalities are renumber topology and unbind topology.

topology       no batch   batch
t0             image      image
t1-64-lag      image      image
dualtor-120    image      image

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Signed-off-by: Austin Pham <austinpham@microsoft.com>

adjust logic

Signed-off-by: Austin Pham <austinpham@microsoft.com>

chore: set batchmode

Signed-off-by: Austin Pham <austinpham@microsoft.com>

add support for python2

Signed-off-by: Austin Pham <austinpham@microsoft.com>

fix python2

Signed-off-by: Austin Pham <austinpham@microsoft.com>
@auspham auspham changed the title [Cherry-pick 18790] add batch_mode support for bind_fp_ports and unbind_fp_ports [202503] add batch_mode support for bind_fp_ports and unbind_fp_ports (#18790) Jun 19, 2025
@bingwang-ms bingwang-ms merged commit eda0df2 into Azure:202503 Jun 19, 2025
3 checks passed