Fix false traffic disruption from PF_PACKET sniffer drops #22708

Merged
StormLiangMS merged 2 commits into sonic-net:master from rajkumar1-arista:fix-false-traffic-disruption
Mar 25, 2026

@rajkumar1-arista
Contributor

Description of PR

Summary:
Fixes #21043

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

dualtor_io.test_normal_op#test_mux_port_switch_active_server_to_active_server and dualtor_io.test_normal_op#test_mux_port_switch_active_server_to_standby_server are failing with 1 traffic disruption.

How did you do it?

For the dualtor_sniffer script, #18758 introduced a single PF_PACKET socket shared by all interfaces, and that socket's default SO_RCVBUF of only 128KB is not enough. Scapy processes each packet in Python, so brief pauses in that processing let the small buffer overflow, and the kernel silently drops packets. The test then misinterprets the missing sniffer captures as forwarding disruptions.
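
One way to confirm that packets are being lost at the sniffer socket rather than in the data plane is to read the kernel's per-socket drop counter. The sketch below is not part of this PR. It assumes Linux and the PACKET_STATISTICS socket option described in packet(7), with the option numbers hard-coded from the kernel headers; the helper names are hypothetical.

```python
import socket
import struct

# Numeric values from the Linux headers; hard-coded here as an assumption,
# since not every Python build exposes them as named constants.
SOL_PACKET = 263         # <linux/socket.h>
PACKET_STATISTICS = 6    # <linux/if_packet.h>


def parse_tpacket_stats(raw):
    """Decode struct tpacket_stats: two native unsigned ints (tp_packets, tp_drops)."""
    packets, drops = struct.unpack("II", raw[:8])
    return packets, drops


def read_packet_drops(sock):
    """Return (packets, drops) for an AF_PACKET socket.

    Note: per packet(7), reading these statistics resets the kernel counters.
    """
    raw = sock.getsockopt(SOL_PACKET, PACKET_STATISTICS, 8)
    return parse_tpacket_stats(raw)
```

Called on the sniffer's underlying socket after a test run (opening an AF_PACKET socket needs root or CAP_NET_RAW), a non-zero drop count would point at receive-buffer overflow rather than a real forwarding disruption.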

Increased SO_RCVBUF to 32MB to absorb processing pauses.
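
A minimal sketch of the buffer request, for illustration only. It uses a UDP socket so it runs without root; the PR applies the same option to the sniffer's PF_PACKET socket. On Linux the requested size is capped by net.core.rmem_max for unprivileged callers, and the kernel reports back roughly double the value it stored, so reading the option back is the only way to see what was actually granted. The function name is hypothetical.

```python
import socket

REQUESTED_RCVBUF = 32 * 1024 * 1024  # 32MB, the value used in this PR


def request_rcvbuf(sock, size=REQUESTED_RCVBUF):
    """Ask for a larger receive buffer and return (before, after) as reported by the kernel."""
    before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    # The kernel may clamp the request, so read back the effective size.
    after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return before, after
```

If the value read back is far below the request, the testbed host's net.core.rmem_max is probably the limiting factor.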

How did you verify/test it?

Ran the tests on dualtor-aa-56

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

For dualtor_sniffer script, all interfaces are sharing one PF_PACKET socket with a default SO_RCVBUF of only 128KB. During scapy's
per-packet Python processing, brief pauses cause this small buffer to overflow, silently dropping packets. The test then
misinterprets missing sniffer captures as forwarding disruptions. Increase SO_RCVBUF to 32MB to absorb processing pauses.

Signed-off-by: rajkumar1 <rajkumar1@arista.com>
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@rajkumar1-arista
Contributor Author

/azpw run

@mssonicbld
Collaborator

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@rajkumar1-arista
Contributor Author

@lolyu, @StormLiangMS could you please take a look?

@dhanasekar-arista
Contributor

The check fails on the bgp/test_bgp_update_replication.py test due to:

------------------------------ live log teardown -------------------------------
06/03/2026 07:53:03 memory_utilization._handle_memory_thresh L0313 ERROR | [ALARM]: free:used memory usage increased by 685.0%, exceeds increase threshold 20%% (previous: 2897.0 MB, current: 3582.0 MB)

06/03/2026 07:53:03 memory_utilization._handle_memory_thresh L0313 ERROR | [ALARM]: docker:swss, Current memory usage 9.0% exceeds high threshold 8% (previous: 2.6%, current: 9.0%)
06/03/2026 07:53:03 memory_utilization._handle_memory_thresh L0313 ERROR | [ALARM]: docker:swss memory usage increased by 6.4%, exceeds increase threshold 3% (previous: 2.6%, current: 9.0%)
06/03/2026 07:53:03 init.pytest_runtest_teardown L0122 ERROR | Memory errors detected: [ALARM]: free:used memory usage increased by 685.0%, exceeds increase threshold 20%%

These memory threshold failures are not due to the fix in this review.

@bingwang-ms
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Collaborator

@bingwang-ms bingwang-ms left a comment

LGTM

try:
    self.ins.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF,
                        32 * 1024 * 1024)
except OSError:
    pass

Check notice

Code scanning / CodeQL

Empty except Note test

'except' clause does nothing but pass and there is no explanatory comment.
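
One way the CodeQL note could be addressed is to keep the call best-effort but record why the failure is tolerated. This is a sketch under that assumption, not what the merged commit does as far as this page shows; the function name and the log message are hypothetical.

```python
import logging
import socket

logger = logging.getLogger(__name__)


def set_rcvbuf_best_effort(sock, size=32 * 1024 * 1024):
    """Try to enlarge the receive buffer; return True on success, False otherwise."""
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    except OSError as err:
        # Best effort: the sniffer still works with the default buffer,
        # it is only more likely to drop packets under load.
        logger.warning("Could not set SO_RCVBUF to %d bytes: %s", size, err)
        return False
    return True
```

Returning a boolean lets the caller decide whether a capture taken with the smaller buffer should be trusted.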

@StormLiangMS merged commit e3480b7 into sonic-net:master on Mar 25, 2026
31 of 32 checks passed
@StormLiangMS added the "Request for 202511 branch" and "Approved for 202511 branch" labels on Mar 25, 2026
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Mar 25, 2026
…22708)

Signed-off-by: rajkumar1 <rajkumar1@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
@mssonicbld
Collaborator

Cherry-pick PR to 202511: #23308

vmittal-msft pushed a commit that referenced this pull request Mar 25, 2026
…23308)

Signed-off-by: rajkumar1 <rajkumar1@arista.com>
Signed-off-by: mssonicbld <sonicbld@microsoft.com>
Co-authored-by: rajkumar1-arista <rajkumar1@arista.com>