Update sonic-otairedis submodule and OTN linecard provisioning script #12
Merged
sonic-otn merged 1 commit into sonic-otn:otn_pre_202411 on Dec 27, 2024
Weitang-Zheng pushed a commit that referenced this pull request on May 12, 2025
#### Why I did it

To fix errors that occur when writing to the queue:

```
Jun 5 23:04:41.798613 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun 5 23:04:41.798985 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun 5 23:04:41.799535 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun 5 23:04:41.806010 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun 5 23:04:41.814075 r-leopard-56 ERR healthd: system_service[Errno 104] Connection reset by peer
Jun 5 23:04:41.824135 r-leopard-56 ERR healthd: Traceback (most recent call last):#012 File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 484, in system_service#012 msg = self.myQ.get(timeout=QUEUE_TIMEOUT)#012 File "<string>", line 2, in get#012 File "/usr/lib/python3.9/multiprocessing/managers.py", line 809, in _callmethod#012 kind, result = conn.recv()#012 File "/usr/lib/python3.9/multiprocessing/connection.py", line 255, in recv#012 buf = self._recv_bytes()#012 File "/usr/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes#012 buf = self._recv(4)#012 File "/usr/lib/python3.9/multiprocessing/connection.py", line 384, in _recv#012 chunk = read(handle, remaining)#012ConnectionResetError: [Errno 104] Connection reset by peer
Jun 5 23:04:41.826489 r-leopard-56 INFO healthd[8494]: ERROR:dbus.connection:Exception in handler for D-Bus signal:
Jun 5 23:04:41.826591 r-leopard-56 INFO healthd[8494]: Traceback (most recent call last):
Jun 5 23:04:41.826640 r-leopard-56 INFO healthd[8494]: File "/usr/lib/python3/dist-packages/dbus/connection.py", line 232, in maybe_handle_message
Jun 5 23:04:41.826686 r-leopard-56 INFO healthd[8494]: self._handler(*args, **kwargs)
Jun 5 23:04:41.826738 r-leopard-56 INFO healthd[8494]: File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 82, in on_job_removed
Jun 5 23:04:41.826785 r-leopard-56 INFO healthd[8494]: self.task_notify(msg)
Jun 5 23:04:41.826831 r-leopard-56 INFO healthd[8494]: File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 110, in task_notify
Jun 5 23:04:41.826877 r-leopard-56 INFO healthd[8494]: self.task_queue.put(msg)
Jun 5 23:04:41.826923 r-leopard-56 INFO healthd[8494]: File "<string>", line 2, in put
Jun 5 23:04:41.826973 r-leopard-56 INFO healthd[8494]: File "/usr/lib/python3.9/multiprocessing/managers.py", line 808, in _callmethod
Jun 5 23:04:41.827018 r-leopard-56 INFO healthd[8494]: conn.send((self._id, methodname, args, kwds))
Jun 5 23:04:41.827065 r-leopard-56 INFO healthd[8494]: File "/usr/lib/python3.9/multiprocessing/connection.py", line 211, in send
Jun 5 23:04:41.827115 r-leopard-56 INFO healthd[8494]: self._send_bytes(_ForkingPickler.dumps(obj))
Jun 5 23:04:41.827158 r-leopard-56 INFO healthd[8494]: File "/usr/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
Jun 5 23:04:41.827199 r-leopard-56 INFO healthd[8494]: self._send(header + buf)
Jun 5 23:04:41.827254 r-leopard-56 INFO healthd[8494]: File "/usr/lib/python3.9/multiprocessing/connection.py", line 373, in _send
Jun 5 23:04:41.827322 r-leopard-56 INFO healthd[8494]: n = write(self._handle, buf)
Jun 5 23:04:41.827368 r-leopard-56 INFO healthd[8494]: BrokenPipeError: [Errno 32] Broken pipe
Jun 5 23:04:42.800216 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
```

When the multiprocessing.Manager is shut down, the queue raises the errors above. This happens during shutdown (fast-reboot, warm-reboot).

With the fix, the system-health service does not hang:

```
root@sonic:/home/admin# sudo systemctl start system-health ; sleep 10; echo "$(date): Stopping..."; sudo systemctl stop system-health; echo "$(date): Stopped"
Thu Oct 17 01:07:56 PM IDT 2024: Stopping...
Thu Oct 17 01:07:58 PM IDT 2024: Stopped
root@sonic:/home/admin# sudo systemctl start system-health ; sleep 10; echo "$(date): Stopping..."; sudo systemctl stop system-health; echo "$(date): Stopped"
Thu Oct 17 01:08:13 PM IDT 2024: Stopping...
Thu Oct 17 01:08:14 PM IDT 2024: Stopped
root@sonic:/home/admin# sudo systemctl start system-health ; sleep 10; echo "$(date): Stopping..."; sudo systemctl stop system-health; echo "$(date): Stopped"
Thu Oct 17 01:09:05 PM IDT 2024: Stopping...
Thu Oct 17 01:09:06 PM IDT 2024: Stopped
```

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it

Removed the call to shutdown. As per the documentation, the cleanup happens automatically when GC runs: https://docs.python.org/3/library/multiprocessing.html

#### How to verify it
<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->

Run warm-reboot and fast-reboot multiple times and verify that there are no errors in the log.

#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [x] 202205
- [x] 202311
- [x] 202405

#### Tested branch (Please provide the tested image version)
<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->

- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->

#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this pull request for inclusion in the changelog:
-->

<!--
Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
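The commit above fixes the problem by no longer calling shutdown on the manager. As a complementary illustration only (this is not the change made in the commit), the sketch below shows how a caller could tolerate a Manager-backed queue whose manager process has already exited. The helper names `safe_put` and `safe_get` and the constant `MANAGER_GONE_ERRORS` are hypothetical; the exception types are the ones visible in the log, plus `EOFError` as an assumption.

```python
import queue

# Exceptions a Manager-backed queue proxy can raise once the manager
# process is gone. The first two appear in the log above; EOFError is
# an assumption added for completeness.
MANAGER_GONE_ERRORS = (BrokenPipeError, ConnectionResetError, EOFError)


def safe_put(task_queue, msg):
    """Try to enqueue msg; return False if the manager is already gone."""
    try:
        task_queue.put(msg)
        return True
    except MANAGER_GONE_ERRORS:
        return False


def safe_get(task_queue, timeout):
    """Return the next message, or None on timeout or a dead manager."""
    try:
        return task_queue.get(timeout=timeout)
    except queue.Empty:
        return None
    except MANAGER_GONE_ERRORS:
        return None
```

A caller in a shutdown path could treat a `False` or `None` result as a signal to stop its loop rather than letting the traceback reach syslog.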
Weitang-Zheng pushed a commit that referenced this pull request on May 12, 2025
…et#21095)

Adding the below fix from FRR FRRouting/frr#17297

This is to fix the following crash which is a statistical issue:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fccd6faf7c0 (LWP 36))]
(gdb) bt
#0 0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fccd7302fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fccd72ed472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007fccd75bb3a9 in _zlog_assert_failed (xref=xref@entry=0x7fccd7652380 <_xref.16>, extra=extra@entry=0x0) at ../lib/zlog.c:678
#4 0x00007fccd759b2fe in route_node_delete (node=<optimized out>) at ../lib/table.c:352
#5 0x00007fccd759b445 in route_unlock_node (node=0x0) at ../lib/table.h:258
#6 route_next (node=<optimized out>) at ../lib/table.c:436
#7 route_next (node=node@entry=0x56029d89e560) at ../lib/table.c:410
#8 0x000056029b6b6b7a in if_lookup_by_name_per_ns (ns=ns@entry=0x56029d873d90, ifname=ifname@entry=0x7fccc0029340 "PortChannel1020") at ../zebra/interface.c:312
#9 0x000056029b6b8b36 in zebra_if_dplane_ifp_handling (ctx=0x7fccc0029310) at ../zebra/interface.c:1867
#10 zebra_if_dplane_result (ctx=0x7fccc0029310) at ../zebra/interface.c:2221
#11 0x000056029b7137a9 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4810
#12 0x00007fccd75a0e0d in thread_call (thread=thread@entry=0x7ffe8e553cc0) at ../lib/thread.c:1990
#13 0x00007fccd7559368 in frr_run (master=0x56029d65a040) at ../lib/libfrr.c:1198
#14 0x000056029b6ac317 in main (argc=9, argv=0x7ffe8e5540d8) at ../zebra/main.c:478
jjin62 pushed a commit that referenced this pull request on Oct 6, 2025
…et#21405)

<!--
Please make sure you've read and understood our contributing guidelines:
https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

failure_prs.log skip_prs.log Make sure all your commits include a signature generated with `git commit -s` **

If this is a bug fix, make sure your description includes "fixes #xxxx", or "closes #xxxx" or "resolves #xxxx"

Please provide the following information:
-->

#### Why I did it

Adding the below fix from FRR FRRouting/frr#17297

This is to fix the following crash which is a statistical issue:

```
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fccd6faf7c0 (LWP 36))]
(gdb) bt
#0 0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fccd7302fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fccd72ed472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007fccd75bb3a9 in _zlog_assert_failed (xref=xref@entry=0x7fccd7652380 <_xref.16>, extra=extra@entry=0x0) at ../lib/zlog.c:678
#4 0x00007fccd759b2fe in route_node_delete (node=<optimized out>) at ../lib/table.c:352
#5 0x00007fccd759b445 in route_unlock_node (node=0x0) at ../lib/table.h:258
#6 route_next (node=<optimized out>) at ../lib/table.c:436
#7 route_next (node=node@entry=0x56029d89e560) at ../lib/table.c:410
#8 0x000056029b6b6b7a in if_lookup_by_name_per_ns (ns=ns@entry=0x56029d873d90, ifname=ifname@entry=0x7fccc0029340 "PortChannel1020") at ../zebra/interface.c:312
#9 0x000056029b6b8b36 in zebra_if_dplane_ifp_handling (ctx=0x7fccc0029310) at ../zebra/interface.c:1867
#10 zebra_if_dplane_result (ctx=0x7fccc0029310) at ../zebra/interface.c:2221
#11 0x000056029b7137a9 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4810
#12 0x00007fccd75a0e0d in thread_call (thread=thread@entry=0x7ffe8e553cc0) at ../lib/thread.c:1990
#13 0x00007fccd7559368 in frr_run (master=0x56029d65a040) at ../lib/libfrr.c:1198
#14 0x000056029b6ac317 in main (argc=9, argv=0x7ffe8e5540d8) at ../zebra/main.c:478
```

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it

Added patch.

#### How to verify it

Running BGP tests.

<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->

#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
- [ ] 202211
- [ ] 202305

#### Tested branch (Please provide the tested image version)
<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->

- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->

#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this pull request for inclusion in the changelog:
-->

<!--
Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
jjin62 pushed a commit that referenced this pull request on Oct 6, 2025
<!--
Please make sure you've read and understood our contributing guidelines:
https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

failure_prs.log skip_prs.log Make sure all your commits include a signature generated with `git commit -s` **

If this is a bug fix, make sure your description includes "fixes #xxxx", or "closes #xxxx" or "resolves #xxxx"

Please provide the following information:
-->

#### Why I did it

During smartswitch initialization, an error is observed at switch bootup when `ztp disable` runs decode-eeprom:

```
sonic ERR decode-syseeprom: Failed to obtain EEPROM object due to ValueError("invalid literal for int() with base 10: ''"), Traceback: Traceback (most recent call last): #012 File "/usr/local/bin/decode-syseeprom", line 35, in instantiate_eeprom_object#012 eeprom = sonic_platform.platform.Platform().get_chassis().get_eeprom() #012 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ #012 File "/usr/local/lib/python3.11/dist-packages/sonic_platform/platform.py", line 35, in __init__ #012 self._chassis = SmartSwitchChassis()#012 ^^^^^^^^^^^^^^^^^^^^ #012 File "/usr/local/lib/python3.11/dist-packages/sonic_platform/chassis.py", line 1207, in __init__ #012 self.initialize_modules()#012 File "/usr/local/lib/python3.11/dist-packages/sonic_platform/chassis.py", line 1244, in initialize_modules #012 self.initialize_single_module(index=index) #012 File "/usr/local/lib/python3.11/dist-packages/sonic_platform/chassis.py", line 1235, in initialize_single_module #012 from .module import DpuModule#012 File "/usr/local/lib/python3.11/dist-packages/sonic_platform/module.py", line 24, in <module> #012 from .dpuctlplat import DpuCtlPlat, BootProgEnum #012 File "/usr/local/lib/python3.11/dist-packages/sonic_platform/dpuctlplat.py", line 29, in <module> #012 from .inotify_helper import InotifyHelper #012 File "/usr/local/lib/python3.11/dist-packages/sonic_platform/inotify_helper.py", line 21, in <module> #012 import inotify.adapters#012 File "/usr/local/lib/python3.11/dist-packages/inotify/adapters.py", line 37, in <module> #012 _IS_DEBUG = bool(int(os.environ.get('DEBUG', '0'))) #012 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ #012ValueError: invalid literal for int() with base 10: ''
```

This happens during ZTP because ztp sets DEBUG="" here: https://github.com/sonic-net/sonic-ztp/blob/202411/src/etc/default/ztp#L6

#### How I did it

Fixed the import in inotify.

#### How to verify it

Verified by running decode-eeprom during init.

<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->

#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
- [ ] 202211
- [ ] 202305

#### Tested branch (Please provide the tested image version)
<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->

- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->

#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this pull request for inclusion in the changelog:
-->

<!--
Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
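The traceback above comes down to `int('')` raising `ValueError` when `DEBUG` is set but empty, because `os.environ.get('DEBUG', '0')` only falls back to `'0'` when the variable is unset. The commit does not show how the import was fixed, so the following is only a sketch of one way to read such a flag defensively; the helper name `env_flag` is hypothetical and is not taken from the inotify package or from the patch.

```python
import os


def env_flag(name, default=False):
    """Read an integer-style boolean flag from the environment.

    An unset, empty, or non-numeric value yields `default` instead of
    raising ValueError (hypothetical helper, for illustration only).
    """
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return bool(int(raw))
    except ValueError:
        return default
```

With `DEBUG=""` exported, as in the ZTP defaults file linked above, `env_flag("DEBUG")` returns the default rather than aborting the import.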
jjin62 pushed a commit that referenced this pull request on Oct 6, 2025
…tener (sonic-net#23419)

#### Why I did it

The following KeyError was found in syslog, not only for lldp but also for snmp and bgp.

2025 Jul 19 18:13:00.240397 vlab-01 ERR lldp#supervisor-proc-exit-listener: Exception: 'len', trace: Traceback (most recent call last):
File "/usr/bin/supervisor-proc-exit-listener", line 249, in <module>
main(sys.argv[1:])
File "/usr/bin/supervisor-proc-exit-listener", line 182, in main
payload = sys.stdin.read(int(headers['len']))
KeyError: 'len'

The surrounding syslog context is:

2025 Jul 19 18:12:59.505711 vlab-01 INFO lldp#supervisord 2025-07-19 18:12:59,504 INFO waiting for supervisor-proc-exit-listener, rsyslogd, lldpd, lldp-syncd, lldpmgrd to die
2025 Jul 19 18:12:59.761223 vlab-01 INFO containerd[684]: time="2025-07-19T18:12:59.759992163Z" level=info msg="shim disconnected" id=cd6e41a2cc82aae25d2d65801984943311b3f025c98ca865ea79be95194abc95
2025 Jul 19 18:12:59.762463 vlab-01 INFO containerd[684]: time="2025-07-19T18:12:59.760103279Z" level=warning msg="cleaning up after shim disconnected" id=cd6e41a2cc82aae25d2d65801984943311b3f025c98ca865ea79be95194abc95 namespace=moby
2025 Jul 19 18:12:59.765745 vlab-01 INFO containerd[684]: time="2025-07-19T18:12:59.760116062Z" level=info msg="cleaning up dead shim"
2025 Jul 19 18:12:59.767134 vlab-01 INFO dockerd[752]: time="2025-07-19T18:12:59.760554606Z" level=info msg="ignoring event" container=cd6e41a2cc82aae25d2d65801984943311b3f025c98ca865ea79be95194abc95 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
2025 Jul 19 18:12:59.784436 vlab-01 INFO containerd[684]: time="2025-07-19T18:12:59.783563921Z" level=warning msg="cleanup warnings time=\"2025-07-19T18:12:59Z\" level=info msg=\"starting signal loop\" namespace=moby pid=42053 runtime=io.containerd.runc.v2\n"
2025 Jul 19 18:12:59.826676 vlab-01 INFO systemd[1]: var-lib-docker-overlay2-472b96da162023c3bc1e0d4132486ad7c122b23acf07f93d0e5b0a9538d7cebe-merged.mount: Deactivated successfully.
2025 Jul 19 18:12:59.840815 vlab-01 INFO container: docker cmd: wait for teamd
2025 Jul 19 18:12:59.843934 vlab-01 INFO container: docker cmd: stop for teamd
2025 Jul 19 18:12:59.861044 vlab-01 DEBUG container: container_stop: END
2025 Jul 19 18:12:59.906677 vlab-01 NOTICE admin: Stopped teamd service...
2025 Jul 19 18:12:59.938168 vlab-01 INFO systemd[1]: teamd.service: Deactivated successfully.
2025 Jul 19 18:12:59.938548 vlab-01 INFO systemd[1]: Stopped teamd.service - TEAMD container.
2025 Jul 19 18:12:59.939901 vlab-01 NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-host:event-stopped-ctr":{"ctr_name":"TEAMD","timestamp":"2025-07-19T18:12:59.939561Z"}}
2025 Jul 19 18:13:00.196745 vlab-01 INFO dockerd[752]: time="2025-07-19T18:13:00.196382391Z" level=info msg="Container failed to exit within 10s of signal 15 - using the force" container=2188a8952aa8d602224b78e325295831aceb8f14d1b6ec8869cc153b7eafef6a
2025 Jul 19 18:13:00.240397 vlab-01 ERR lldp#supervisor-proc-exit-listener: Exception: 'len', trace: Traceback (most recent call last):#012 File "/usr/bin/supervisor-proc-exit-listener", line 249, in <module>#012 main(sys.argv[1:])#012 File "/usr/bin/supervisor-proc-exit-listener", line 182, in main#012 payload = sys.stdin.read(int(headers['len']))#012 ~~~~~~~^^^^^^^#012KeyError: 'len'

During shutdown, supervisor sends termination events to the event listener, but the shutdown process interrupts the event stream. The container is forcibly killed ("Container failed to exit within 10s of signal 15 - using the force"), which can interrupt the supervisor event protocol mid-stream:

- Supervisor starts sending an event header.
- Before it can finish sending the full header (including the len: field), the process gets interrupted.
- The listener receives a partial/malformed header without the len field.

##### Work item tracking
- Microsoft ADO 33409727:

#### How I did it

Check whether 'len' exists before using it; if there is no 'len', the listener cannot process the remaining steps.
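The check described above can be sketched as follows. This is a minimal illustration, not the code from `supervisor-proc-exit-listener`: the function names `parse_header` and `read_event` are hypothetical, and the header format (space-separated `key:value` tokens ending in a `len:` field that gives the payload size) is assumed from the supervisor event listener protocol.

```python
import io


def parse_header(line):
    """Split 'key:value key:value ...' into a dict, skipping malformed tokens."""
    headers = {}
    for token in line.split():
        key, sep, value = token.partition(":")
        if sep:
            headers[key] = value
    return headers


def read_event(stream):
    """Return (headers, payload), or None if the header has no usable 'len'."""
    headers = parse_header(stream.readline())
    length = headers.get("len")
    if length is None or not length.isdigit():
        # Truncated header, e.g. the sender was killed mid-write.
        return None
    return headers, stream.read(int(length))


# Usage: a header cut off before the len: field is skipped, not fatal.
truncated = io.StringIO("ver:3.0 server:supervisor serial:1 pool")
event = read_event(truncated)
```

Returning `None` lets the caller decide whether to exit quietly or keep waiting, instead of raising `KeyError` during container shutdown.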