
[dualtor-io] fix dualtor sniffer start slow issue #18758

Merged
bingwang-ms merged 1 commit into sonic-net:master from lolyu:fix_dualtor_io_sniffer_slow
Jun 3, 2025


@lolyu
Collaborator

@lolyu lolyu commented Jun 3, 2025

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505

Approach

What is the motivation for this PR?

Fix the dualtor io failure:

>       pytest_assert(len(failures) == 0, '\n' + '\n'.join(failures))
E       Failed:
E       Traffic on server 192.168.0.2 was disrupted prior to test start, missing 1 packets from the start of the packet flow
E       Traffic on server 192.168.0.4 was disrupted prior to test start, missing 1 packets from the start of the packet flow
E       Traffic on server 192.168.0.6 was disrupted prior to test start, missing 1 packets from the start of the packet flow

The issue was introduced by PR #18299.
The dualtor sniffer now listens on all dataplane interfaces on the ptf (those with the prefix eth). A topology like dualtor-120 has 128 dataplane interfaces, so the sniffer can take 10+ seconds to set up the 128 sockets for them, which is why the first packets of the flow are missed.
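
To make the scaling concrete, the sketch below counts the interfaces that a per-interface sniffer would need a socket for. It is illustrative only: the helper name `dataplane_interfaces` is hypothetical, the `eth` prefix is taken from the description above, and `socket.if_nameindex()` is assumed to be available (it is on Linux).

```python
import socket


def dataplane_interfaces(names, prefix="eth"):
    """Return the interface names that start with the given prefix."""
    return [name for name in names if name.startswith(prefix)]


# Enumerate the local host's interfaces as (index, name) pairs.
local_names = [name for _, name in socket.if_nameindex()]
targets = dataplane_interfaces(local_names)
print(f"{len(targets)} interface(s) would each need their own socket")
```

On a dualtor-120 ptf this count would be 128, according to the description above.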

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>

How did you do it?

Scapy now defaults to listening on conf.iface (on the ptf, this is mgmt) and uses socket.bind to bind to it. This PR replaces socket.bind with a dummy no-op function, so the packet socket that is created is no longer bound to conf.iface and captures packets from all interfaces.
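
The approach above can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: the helper name `bind_disabled` and the use of `unittest.mock.patch.object` are assumptions made for the example, and a plain UDP socket stands in for the packet socket that scapy creates.

```python
import contextlib
import socket
from unittest import mock


@contextlib.contextmanager
def bind_disabled():
    """Temporarily turn socket.socket.bind into a no-op (hypothetical helper).

    Yields a list that records every address whose bind was skipped.
    """
    skipped = []

    def _noop_bind(self, address):
        # Remember the request, but never call the real bind().
        skipped.append(address)

    # patch.object restores the original attribute lookup on exit.
    with mock.patch.object(socket.socket, "bind", _noop_bind):
        yield skipped


with bind_disabled() as skipped:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))  # swallowed by the no-op
    sock.close()

print(skipped)
```

In a sniffer, the same context manager would wrap the scapy capture call so that the packet socket is created but never bound to `conf.iface`. The patch is process-wide while it is active, so any other socket that calls bind in that window is affected too.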

How did you verify/test it?

On dualtor-120:

dualtor_io/test_normal_op.py::test_normal_op_downstream_upper_tor[active-standby] PASSED [100%]

========== 1 passed, 1 deselected, 2 warnings in 443.91s (0:07:23) ==========

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@bingwang-ms
Collaborator

Please double confirm it works when iface is empty. I remember I saw some issue earlier when using empty iface. The change looks good to me if it works

@lolyu
Collaborator Author

lolyu commented Jun 3, 2025

> Please double confirm it works when iface is empty. I remember I saw some issue earlier when using empty iface. The change looks good to me if it works

Yes, it works when iface is empty.

root@de73a6e7fb07:~# python3
Python 3.7.3 (default, Mar 23 2024, 16:12:05)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from scapy.all import *
>>> packets = sniff()

^C>>> packets
<Sniffed: TCP:122 UDP:1 ICMP:65 Other:5>

Contributor

@yyynini yyynini left a comment

LGTM

@bingwang-ms bingwang-ms merged commit ac56ff4 into sonic-net:master Jun 3, 2025
20 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Jun 3, 2025
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
@mssonicbld
Collaborator

Cherry-pick PR to 202411: #18776

mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Jun 4, 2025
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
@mssonicbld
Collaborator

Cherry-pick PR to 202505: #18783

bingwang-ms pushed a commit that referenced this pull request Jun 4, 2025
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Co-authored-by: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
mssonicbld pushed a commit that referenced this pull request Jun 4, 2025
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
sdszhang pushed a commit to sdszhang/sonic-mgmt that referenced this pull request Jun 14, 2025
Code sync sonic-net/sonic-mgmt:202411 => 202412

```
*   1f86dab (HEAD -> code-sync-202412, origin/code-sync-202412) r12f 250610:2314 - Merge remote-tracking branch 'base/202411' into code-sync-202412
|\
| * 2ba104e (base/202411) xwjiang-ms 250610:1604 - [202411] Use ceos 4.32.5M as default ceos image version (sonic-net#18878)
| * 5fa5cda Longxiang Lyu 250610:0818 - [202411][dualtor-aa] Add `dualtor_aa` support to `test_nvgre_hash` (sonic-net#18883)
| * afecbbf zitingguo-ms 250609:1334 - [Cherry-pick][ACL] Collect all upstream ports and Include service port into upstream neighbors in ACL tests (sonic-net#18847)
| * 2e4247b pragnya-arista 250609:0632 - [202411][sonic-mgmt]Fix decap/test_subnet_decap.py::test_vlan_subnet_decap (sonic-net#18778)
| * 0bfc7a8 Longxiang Lyu 250606:0943 - [dualtor-io] Fix duplication merge condition (sonic-net#18828)
| * 52d3771 Zhaohui Sun 250605:2046 - Restore configuration after vxlan module (sonic-net#18714)
| * 3d0922f Yaqiang Zhu 250605:2259 - [202411][pktgen] Skip test_pktgen in m0/mx/m1 (sonic-net#18822)
| * 3de20a6 Zhaohui Sun 250516:1325 - Add secondary subnet config for t0 topologies (sonic-net#18399)
| * bb3e0f9 Zhaohui Sun 250605:1416 - Xfail test_dir_bcast.py due to known issue on Broadcom platform (sonic-net#18787)
| * 158c562 Justin Wong 250604:2058 - Add snmp lldp state check after config_reload (sonic-net#18805)
| * e39c891 eyakubch 250605:0415 - bug: added fast reboot into reboot_type check (sonic-net#18551)
| * 8c7dd3b Cong Hou 250604:1433 - Remove the skip/xfail for the dualtor_io link failure test (sonic-net#18712)
| * 5dbc53d mssonicbld 250604:0952 - [dualtor-io] fix dualtor sniffer start slow issue (sonic-net#18758) (sonic-net#18776)
| * 105cdf6 StormLiangMS 250603:0844 - [CRM AVAILABLE] To enhance the crm tests for TD3 and Cisco devices (sonic-net#18733)
| * 4251b38 andywongarista 250601:1828 - Add restore_image fixture to test_multi_hop_upgrade_path (sonic-net#18230) (sonic-net#18532)
| * 85a55d8 Longxiang Lyu 250528:2313 - [dualtor] Fix `test_orchagent_slb` (sonic-net#18666)
| * 95a8764 Vivek Verma 250227:0656 - Fix fixture invocation order in qos_sai_base.py to prevent teardown failure. (sonic-net#17180)
| * 9a72265 Justin Wong 250514:1844 - Add PTF parameter for ceos neighbor lacp multiplier (sonic-net#18215)
| * cd1375d Longxiang Lyu 250529:1029 - [dualtor-io] Validate and recover active-active setup (sonic-net#18675)
| * 763c1b3 Longxiang Lyu 250528:2314 - [dualtor] Fix loganalyzer not exist issue (sonic-net#18674)
```
opcoder0 pushed a commit to opcoder0/sonic-mgmt that referenced this pull request Dec 8, 2025
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
AharonMalkin pushed a commit to AharonMalkin/sonic-mgmt that referenced this pull request Dec 16, 2025
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Aharon Malkin <amalkin@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 21, 2025
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Jan 13, 2026
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Jan 26, 2026
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
ytzur1 pushed a commit to ytzur1/sonic-mgmt that referenced this pull request Feb 2, 2026
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Yael Tzur <ytzur@nvidia.com>