Restart tunnel_packet_handler if it's not running in test_tunnel_memory_leak.py#5334

Closed
ZhaohuiS wants to merge 1 commit into sonic-net:master from ZhaohuiS:tunnel_memory_leak_restart_service

@ZhaohuiS
Contributor

@ZhaohuiS ZhaohuiS commented Mar 16, 2022

Description of PR

Summary:
Restart tunnel_packet_handler if it's not running in test_tunnel_memory_leak.py

Signed-off-by: Zhaohui Sun [email protected]

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 201911
  • 202012

Approach

What is the motivation for this PR?

The test case sometimes fails because tunnel_packet_handler is not running.

How did you do it?

Restart the tunnel_packet_handler service in swss once if it is not running when the test starts.
If it is still not running after that, the test case fails. In most cases, the first restart brings tunnel_packet_handler back.
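
The check-then-restart-once logic described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the `run` callable is a hypothetical stand-in for whatever helper executes a shell command on the DUT and returns its stdout, and the two `supervisorctl` command strings are assumptions about how the service is managed inside the swss container.

```python
from typing import Callable

# Assumed commands; the exact invocation used by the test may differ.
STATUS_CMD = "docker exec swss supervisorctl status tunnel_packet_handler"
RESTART_CMD = "docker exec swss supervisorctl restart tunnel_packet_handler"


def is_running(run: Callable[[str], str]) -> bool:
    """Return True if the status output reports the RUNNING state."""
    return "RUNNING" in run(STATUS_CMD)


def ensure_handler_running(run: Callable[[str], str]) -> None:
    """Restart the service at most once; fail if it is still down."""
    if is_running(run):
        return
    # First (and only) rescue attempt.
    run(RESTART_CMD)
    if not is_running(run):
        raise AssertionError(
            "tunnel_packet_handler is not running after one restart"
        )
```

Passing the command runner in as a parameter keeps the sketch independent of any particular test framework and makes the single-restart behavior easy to exercise with a fake runner.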

How did you verify/test it?

Ran tests/dualtor/test_tunnel_memory_leak.py

Any platform specific information?

dualtor

Supported testbed topology if it's a new test case?

Documentation

@ZhaohuiS ZhaohuiS requested a review from a team as a code owner March 16, 2022 06:22
@wangxin
Collaborator

wangxin commented Mar 18, 2022

Why is tunnel_packet_handler not running in the first place? Could this change hide a real issue?

@ZhaohuiS
Contributor Author

Why is tunnel_packet_handler not running in the first place? Could this change hide a real issue?

Most of the failed tracebacks look like the following log. @theasianpianist, are you aware of this? How can it be resolved?
If this service can't restart by itself, it is probably better not to restart it before this test case.

Mar 19 13:11:07.222915 str2-8102-03 INFO swss#tunnel_packet_handler.py: All portchannel intfs are established
Mar 19 13:11:07.223493 str2-8102-03 NOTICE swss#tunnel_packet_handler.py: Starting tunnel packet handler for host 10.1.0.32 and host 10.1.0.33
Mar 19 13:11:07.227139 str2-8102-03 INFO swss#tunnel_packet_handler.py: Listening on interfaces ['PortChannel102', 'PortChannel101', 'PortChannel104', 'PortChannel103']
Mar 19 13:11:07.238233 str2-8102-03 INFO swss#tunnel_packet_handler.py: PortChannel102 came back up, sniffer restart required
Mar 19 13:11:07.240465 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler Traceback (most recent call last):
Mar 19 13:11:07.240465 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler   File "/usr/local/lib/python3.7/dist-packages/scapy/sendrecv.py", line 1017, in stop
Mar 19 13:11:07.240465 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler     self.stop_cb()
Mar 19 13:11:07.240529 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler AttributeError: 'AsyncSniffer' object has no attribute 'stop_cb'
Mar 19 13:11:07.240529 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler 
Mar 19 13:11:07.240529 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler During handling of the above exception, another exception occurred:
Mar 19 13:11:07.240529 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler 
Mar 19 13:11:07.240560 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler Traceback (most recent call last):
Mar 19 13:11:07.240560 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler   File "/usr/bin/tunnel_packet_handler.py", line 349, in <module>
Mar 19 13:11:07.240560 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler     main()
Mar 19 13:11:07.240587 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler   File "/usr/bin/tunnel_packet_handler.py", line 345, in main
Mar 19 13:11:07.240587 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler     handler.run()
Mar 19 13:11:07.240632 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler   File "/usr/bin/tunnel_packet_handler.py", line 339, in run
Mar 19 13:11:07.240632 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler     self.listen_for_tunnel_pkts()
Mar 19 13:11:07.240652 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler   File "/usr/bin/tunnel_packet_handler.py", line 322, in listen_for_tunnel_pkts
Mar 19 13:11:07.240667 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler     sniffer.stop()
Mar 19 13:11:07.240680 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler   File "/usr/local/lib/python3.7/dist-packages/scapy/sendrecv.py", line 1020, in stop
Mar 19 13:11:07.240680 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler     "Unsupported (offline or unsupported socket)"
Mar 19 13:11:07.240718 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler scapy.error.Scapy_Exception: Unsupported (offline or unsupported socket)
Mar 19 13:11:07.310724 str2-8102-03 INFO swss#supervisord 2022-03-19 13:11:07,310 INFO exited: tunnel_packet_handler (exit status 1; not expected)
Mar 19 13:11:08.312678 str2-8102-03 INFO swss#supervisord 2022-03-19 13:11:08,311 INFO gave up: tunnel_packet_handler entered FATAL state, too many start retries too quickly

@theasianpianist
Contributor

I agree, we should not need to start this service manually; it should come up automatically. I will investigate. Let's hold off on this PR for now.

@ZhaohuiS
Contributor Author

Closing this PR; the tunnel_packet_handler service should restart by itself.

@ZhaohuiS ZhaohuiS closed this Mar 24, 2022
@ZhaohuiS
Contributor Author

The issue was fixed in sonic-net/sonic-buildimage#10346