feat: add batch_mode support for bind_fp_ports and unbind_fp_ports #18790
Merged
bingwang-ms merged 1 commit into sonic-net:master on Jun 18, 2025
Conversation
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Contributor
Author
Hi @lolyu, could you please help review this one? Thank you.
lolyu reviewed Jun 6, 2025
Collaborator
lolyu left a comment
This is nice!
Could you please help triage on t0/t1/dualtor?
0da2f51 to b4a8be8
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
b4a8be8 to abd1383
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
abd1383 to a88952b
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
a88952b to 586b829
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
586b829 to ce342e7
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Contributor
Author
Hi @lolyu, could you help review again? I've added
Signed-off-by: Austin Pham <austinpham@microsoft.com> adjust logic Signed-off-by: Austin Pham <austinpham@microsoft.com> chore: set batchmode Signed-off-by: Austin Pham <austinpham@microsoft.com> add support for python2 Signed-off-by: Austin Pham <austinpham@microsoft.com> fix python2 Signed-off-by: Austin Pham <austinpham@microsoft.com>
ce342e7 to e05b4b8
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
mssonicbld
pushed a commit
that referenced
this pull request
Jun 18, 2025
adjust logic chore: set batchmode add support for python2 fix python2 Signed-off-by: Austin Pham <austinpham@microsoft.com>
11 tasks
Contributor
Author
cherry-pick Azure/sonic-mgmt.msft#417 202503
bingwang-ms
added a commit
to Azure/sonic-mgmt.msft
that referenced
this pull request
Jun 19, 2025
… (#18790) (#417)

Cherry-pick sonic-net/sonic-mgmt#18790

### Description of PR

Summary:
This PR adds an option to use batch_mode for bind_fp_ports, which improves speed by about 50% when tested with 128 VM neighbors.

Fixes # (issue) 32654908

### Type of change

- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
- [ ] Skipped for non-supported platforms
- [ ] Test case improvement

### Back port request

- [ ] 202205
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505

### Approach

#### What is the motivation for this PR?

When creating OVS flows, we launch a subprocess for each call and wait for its result before continuing with the next call. This is very inefficient, even with the aid of multi-threading.

#### How did you do it?

This PR changes the multi-threading behavior in the following way:

1. Batch all the OVS flow creation commands that need to be executed into a single file.
2. Launch one process to call `ovs-ofctl` on the file using add-flows, put the process into a queue to be waited on later, and free the thread so that the same thread can be used to launch a different batch.
3. At the end, the main thread waits for all the launched batch processes to finish.

This PR also provides an option to opt in to this feature.

#### How did you verify/test it?

Verified on a physical testbed with 128 VMs. With the same setting of 8 threads, the total time dropped from 1 hour 30 minutes to 45 minutes on average.

The following is a sample with the same settings and the same number of threads, with batch_mode enabled on renumber topology and unbind topology. Most of the benefit shows up in Renumber topology, from batching bind_fp_ports.

**Before**

```
Wednesday 04 June 2025 03:12:40 +0000 (0:00:00.055) 1:43:10.602 ********
===============================================================================
vm_set : Renumber topology lt2-o128 to VMs. base vm = VM73166 -------- 4998.28s
vm_set : Unbind topology lt2-o128 to VMs. base vm = VM73166 ----------- 557.45s
vm_set : Kill exabgp and ptf_nn_agent processes in PTF container ------ 206.43s
vm_set : Setup vlan port for vlan tunnel ------------------------------- 92.35s
vm_set : Verify that exabgp processes for IPv4 are started ------------- 45.90s
vm_set : Verify that exabgp processes for IPv6 are started ------------- 45.79s
vm_set : Configure exabgp processes for IPv4 on PTF -------------------- 27.62s
vm_set : configure exabgp processes for IPv6 on PTF -------------------- 26.66s
vm_set : Stop ptf container ptf_vms73-2 -------------------------------- 16.49s
vm_set : Run the "apt-get update" as a separate and retryable step ----- 14.26s
vm_set : Create ptf container ptf_vms73-2 ------------------------------ 14.25s
vm_set : Try to login into docker registry ----------------------------- 12.00s
vm_set : Remove ptf container ptf_vms73-2 ------------------------------ 11.54s
vm_set : Set ipv6 route max size of ptf_vms73-2 ------------------------ 11.13s
vm_set : Enable ipv6 for docker container ptf_vms73-2 ------------------ 11.03s
vm_set : Install necessary packages ------------------------------------ 10.58s
vm_set : Announce routes ------------------------------------------------ 9.99s
vm_set : Install necessary packages ------------------------------------- 9.09s
vm_set : Stop PTF portchannel ------------------------------------------- 4.60s
vm_set : Change PTF interface MAC addresses ----------------------------- 4.35s
```

**After**

```
Wednesday 04 June 2025 07:30:07 +0000 (0:00:00.069) 0:52:51.148 ********
===============================================================================
vm_set : Renumber topology lt2-o128 to VMs. base vm = VM73166 -------- 1980.45s
vm_set : Unbind topology lt2-o128 to VMs. base vm = VM73166 ----------- 552.96s
vm_set : Kill exabgp and ptf_nn_agent processes in PTF container ------ 206.56s
vm_set : Setup vlan port for vlan tunnel ------------------------------ 108.52s
vm_set : Verify that exabgp processes for IPv4 are started ------------- 45.76s
vm_set : Verify that exabgp processes for IPv6 are started ------------- 45.45s
vm_set : Configure exabgp processes for IPv4 on PTF -------------------- 27.49s
vm_set : configure exabgp processes for IPv6 on PTF -------------------- 25.96s
vm_set : Stop ptf container ptf_vms73-2 -------------------------------- 16.17s
vm_set : Create ptf container ptf_vms73-2 ------------------------------ 15.02s
vm_set : Remove ptf container ptf_vms73-2 ------------------------------ 12.38s
vm_set : Set ipv6 route max size of ptf_vms73-2 ------------------------ 11.56s
vm_set : Try to login into docker registry ----------------------------- 11.13s
vm_set : Install necessary packages ------------------------------------ 10.50s
vm_set : Install necessary packages ------------------------------------- 8.84s
vm_set : Announce routes ------------------------------------------------ 7.81s
vm_set : Run the "apt-get update" as a separate and retryable step ------ 6.45s
vm_set : Add exabgpv6 supervisor config and start related processes ----- 4.74s
vm_set : Change PTF interface MAC addresses ----------------------------- 4.67s
vm_set : Stop PTF portchannel ------------------------------------------- 4.51s
```

# Other topology

The only affected functionalities are `renumber topology` and `unbind topology`.

| topology | no batch | batch |
|-----------|--------|---------|
| t0 | | |
| t1-64-lag | | |
| dualtor-120 | | |

#### Any platform specific information?

#### Supported testbed topology if it's a new test case?

### Documentation
r12f
pushed a commit
to Azure/sonic-mgmt.msft
that referenced
this pull request
Aug 7, 2025
As PR sonic-net/sonic-mgmt#17647 and sonic-net/sonic-mgmt#18790 are not included in 202412, some parameters in `vm_topology.py` are not supported. So in this PR, we removed such unsupported parameters.
nissampa
pushed a commit
to nissampa/sonic-mgmt_dpu_test
that referenced
this pull request
Aug 7, 2025
adjust logic chore: set batchmode add support for python2 fix python2 Signed-off-by: Austin Pham <austinpham@microsoft.com>
Contributor
Hi @auspham, would you mind creating a manual cherry-pick to 202412 to resolve the conflict?
Contributor
Author
@r12f, could you help sign off? Thank you. Azure/sonic-mgmt.msft#636
Contributor
Thanks! Kicked off CI and will follow up.
r12f
pushed a commit
to Azure/sonic-mgmt.msft
that referenced
this pull request
Aug 12, 2025
Cherry-pick sonic-net/sonic-mgmt#18790 Signed-off-by: Austin Pham <austinpham@microsoft.com>
opcoder0
pushed a commit
to opcoder0/sonic-mgmt
that referenced
this pull request
Dec 8, 2025
adjust logic chore: set batchmode add support for python2 fix python2 Signed-off-by: Austin Pham <austinpham@microsoft.com>
gshemesh2
pushed a commit
to gshemesh2/sonic-mgmt
that referenced
this pull request
Dec 16, 2025
adjust logic chore: set batchmode add support for python2 fix python2 Signed-off-by: Austin Pham <austinpham@microsoft.com> Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
AharonMalkin
pushed a commit
to AharonMalkin/sonic-mgmt
that referenced
this pull request
Dec 16, 2025
adjust logic chore: set batchmode add support for python2 fix python2 Signed-off-by: Austin Pham <austinpham@microsoft.com> Signed-off-by: Aharon Malkin <amalkin@nvidia.com>
gshemesh2
pushed a commit
to gshemesh2/sonic-mgmt
that referenced
this pull request
Dec 21, 2025
adjust logic chore: set batchmode add support for python2 fix python2 Signed-off-by: Austin Pham <austinpham@microsoft.com> Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
venu-nexthop
pushed a commit
to venu-nexthop/sonic-mgmt
that referenced
this pull request
Jan 13, 2026
adjust logic chore: set batchmode add support for python2 fix python2 Signed-off-by: Austin Pham <austinpham@microsoft.com>
gshemesh2
pushed a commit
to gshemesh2/sonic-mgmt
that referenced
this pull request
Jan 26, 2026
adjust logic chore: set batchmode add support for python2 fix python2 Signed-off-by: Austin Pham <austinpham@microsoft.com> Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
ytzur1
pushed a commit
to ytzur1/sonic-mgmt
that referenced
this pull request
Feb 2, 2026
adjust logic chore: set batchmode add support for python2 fix python2 Signed-off-by: Austin Pham <austinpham@microsoft.com> Signed-off-by: Yael Tzur <ytzur@nvidia.com>
Description of PR
Summary: This PR adds an option to use batch_mode for bind_fp_ports, which improves speed by about 50% when tested with 128 VM neighbors.
Fixes # (issue) 32654908
Type of change
Back port request
Approach
What is the motivation for this PR?
When creating OVS flows, we launch a subprocess for each call and wait for its result before continuing with the next call. This is very inefficient, even with the aid of multi-threading.
How did you do it?
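This PR changes the multi-threading behavior in the following way:
1. Batch all the OVS flow creation commands that need to be executed into a single file.
2. Launch one process to call ovs-ofctl on the file using add-flows, put the process into a queue to be waited on later, and free the thread so that the same thread can be used to launch a different batch.
3. At the end, the main thread waits for all the launched batch processes to finish.
This PR also provides an option to opt in to this feature.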
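The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names (write_flow_batch, launch_batch, wait_for_batches), the cmd_builder parameter, and the flow strings in the usage note are hypothetical, and the thread pool is left out. The only real interface assumed is the OVS command `ovs-ofctl add-flows <bridge> <file>`.

```python
import os
import subprocess
import tempfile


def write_flow_batch(flows):
    """Step 1: write one flow spec per line into a temp file; return its path."""
    fd, path = tempfile.mkstemp(prefix="ovs_flows_", suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for flow in flows:
            f.write(flow + "\n")
    return path


def build_add_flows_cmd(bridge, path):
    """Command line that loads every flow in `path` with a single ovs-ofctl call."""
    return ["ovs-ofctl", "add-flows", bridge, path]


def launch_batch(bridge, flows, cmd_builder=build_add_flows_cmd):
    """Step 2: start one process for the whole batch and return without waiting."""
    path = write_flow_batch(flows)
    proc = subprocess.Popen(
        cmd_builder(bridge, path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc, path


def wait_for_batches(pending):
    """Step 3: wait for every launched batch; return (path, stderr) for failures."""
    failures = []
    for proc, path in pending:
        _, err = proc.communicate()
        if proc.returncode != 0:
            failures.append((path, err))
        os.remove(path)  # the batch file is no longer needed
    return failures
```

A caller would collect the result of each `launch_batch("br0", flows)` call (with flow specs such as `in_port=1,actions=output:2`) into a list and pass that list to `wait_for_batches` once at the end. Because the launching code never blocks on an individual `ovs-ofctl` call, the same worker can move on to the next batch immediately.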
How did you verify/test it?
Verified on a physical testbed with 128 VMs. With the same setting of 8 threads, the total time dropped from 1 hour 30 minutes to 45 minutes on average.
The following is a sample with the same settings and the same number of threads, with batch_mode enabled on renumber topology and unbind topology.
Most of the benefit shows up in Renumber topology, from batching bind_fp_ports.
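Before: total elapsed 1:43:10.602 (Renumber topology 4998.28s, Unbind topology 557.45s)
After: total elapsed 0:52:51.148 (Renumber topology 1980.45s, Unbind topology 552.96s)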
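As a quick check of the "about 50%" figure, the reductions can be worked out from the before/after task timings reported in this PR (total 1:43:10.602 vs 0:52:51.148, Renumber topology 4998.28s vs 1980.45s, Unbind topology 557.45s vs 552.96s). The helper names below are only for illustration, and the numbers come from this single sample run.

```python
def parse_hms(text):
    """Convert an 'H:MM:SS.fff' elapsed-time string to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def reduction_pct(before, after):
    """Percentage of time saved going from `before` to `after`."""
    return 100.0 * (before - after) / before


total = reduction_pct(parse_hms("1:43:10.602"), parse_hms("0:52:51.148"))
renumber = reduction_pct(4998.28, 1980.45)
unbind = reduction_pct(557.45, 552.96)

print("total:    {:.1f}%".format(total))     # total:    48.8%
print("renumber: {:.1f}%".format(renumber))  # renumber: 60.4%
print("unbind:   {:.1f}%".format(unbind))    # unbind:   0.8%
```

In this sample, nearly all of the saving comes from the Renumber topology task, while the Unbind topology task is essentially unchanged.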
Other topology
The only affected functionalities are renumber topology and unbind topology.
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation