[platform/mellanox] integrate the sdk4.3.2104/sai 1.14.3 #12

Closed
stephenxs wants to merge 3 commits into master from master-integrate-sai-1.4

Conversation

@stephenxs
Owner

  1. platform/mellanox/sdk-src/sx-acl-helper/Makefile is to be uploaded by the SDK team as part of the SDK
  2. a typo in SAI-Implementation/debian/changelog needs to be corrected manually

- What I did

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

Stephen Sun added 3 commits September 18, 2019 11:33
1. platform/mellanox/sdk-src/sx-acl-helper/Makefile to be uploaded by sdk team as a part of sdk
2. a typo in SAI-Implementation/debian/changelog which requires to be corrected manually
@stephenxs stephenxs closed this Sep 18, 2019
@stephenxs stephenxs deleted the master-integrate-sai-1.4 branch September 21, 2019 05:04
stephenxs pushed a commit that referenced this pull request Jul 24, 2020
* src/sonic-telemetry fa8d498...3bd7ca3 (4):
  > Update gnmi deps (#40)
  > [testdata] Update SFP keys to align with new standard (#39)
  > Fixed the parameters for subscribe APIs (#38)
  > Azure ro mode (#34)

* src/sonic-mgmt-common 444aa9a...cc01ce4 (4):
  > Make gnmi dep version the same as in telemetry repo (#17)
  > Cleanup translib and cvl go test cases (#13)
  > Package update and enhancements/fixes in YGOT, and Request Binder (#12)
  > Translib phase I changes (#11)

Note: sonic-telemetry submodule update is dependent upon sonic-mgmt-common submodule update, thus updating both in this patch
stephenxs pushed a commit that referenced this pull request Nov 16, 2021
Updated the hw-mgmt pointer to include some bugfixes related to power supply voltages.
stephenxs pushed a commit that referenced this pull request Feb 2, 2024
…tically (sonic-net#17847)

#### Why I did it
src/sonic-dash-api
```
* 8f481de - (HEAD -> master, origin/master, origin/HEAD) [misc]: Add utils CLI (#12) (24 hours ago) [Ze Gan]
```
#### How I did it
#### How to verify it
#### Description for the changelog
stephenxs pushed a commit that referenced this pull request Dec 19, 2024
#### Why I did it

To fix errors that happen when writing to the queue:

```
Jun  5 23:04:41.798613 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun  5 23:04:41.798985 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun  5 23:04:41.799535 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun  5 23:04:41.806010 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun  5 23:04:41.814075 r-leopard-56 ERR healthd: system_service[Errno 104] Connection reset by peer
Jun  5 23:04:41.824135 r-leopard-56 ERR healthd: Traceback (most recent call last):#012  File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 484, in system_service#012    msg = self.myQ.get(timeout=QUEUE_TIMEOUT)#012  File "<string>", line 2, in get#012  File "/usr/lib/python3.9/multiprocessing/managers.py", line 809, in _callmethod#012    kind, result = conn.recv()#012  File "/usr/lib/python3.9/multiprocessing/connection.py", line 255, in recv#012    buf = self._recv_bytes()#012  File "/usr/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes#012    buf = self._recv(4)#012  File "/usr/lib/python3.9/multiprocessing/connection.py", line 384, in _recv#012    chunk = read(handle, remaining)#012ConnectionResetError: [Errno 104] Connection reset by peer
Jun  5 23:04:41.826489 r-leopard-56 INFO healthd[8494]: ERROR:dbus.connection:Exception in handler for D-Bus signal:
Jun  5 23:04:41.826591 r-leopard-56 INFO healthd[8494]: Traceback (most recent call last):
Jun  5 23:04:41.826640 r-leopard-56 INFO healthd[8494]:   File "/usr/lib/python3/dist-packages/dbus/connection.py", line 232, in maybe_handle_message
Jun  5 23:04:41.826686 r-leopard-56 INFO healthd[8494]:     self._handler(*args, **kwargs)
Jun  5 23:04:41.826738 r-leopard-56 INFO healthd[8494]:   File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 82, in on_job_removed
Jun  5 23:04:41.826785 r-leopard-56 INFO healthd[8494]:     self.task_notify(msg)
Jun  5 23:04:41.826831 r-leopard-56 INFO healthd[8494]:   File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 110, in task_notify
Jun  5 23:04:41.826877 r-leopard-56 INFO healthd[8494]:     self.task_queue.put(msg)
Jun  5 23:04:41.826923 r-leopard-56 INFO healthd[8494]:   File "<string>", line 2, in put
Jun  5 23:04:41.826973 r-leopard-56 INFO healthd[8494]:   File "/usr/lib/python3.9/multiprocessing/managers.py", line 808, in _callmethod
Jun  5 23:04:41.827018 r-leopard-56 INFO healthd[8494]:     conn.send((self._id, methodname, args, kwds))
Jun  5 23:04:41.827065 r-leopard-56 INFO healthd[8494]:   File "/usr/lib/python3.9/multiprocessing/connection.py", line 211, in send
Jun  5 23:04:41.827115 r-leopard-56 INFO healthd[8494]:     self._send_bytes(_ForkingPickler.dumps(obj))
Jun  5 23:04:41.827158 r-leopard-56 INFO healthd[8494]:   File "/usr/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
Jun  5 23:04:41.827199 r-leopard-56 INFO healthd[8494]:     self._send(header + buf)
Jun  5 23:04:41.827254 r-leopard-56 INFO healthd[8494]:   File "/usr/lib/python3.9/multiprocessing/connection.py", line 373, in _send
Jun  5 23:04:41.827322 r-leopard-56 INFO healthd[8494]:     n = write(self._handle, buf)
Jun  5 23:04:41.827368 r-leopard-56 INFO healthd[8494]: BrokenPipeError: [Errno 32] Broken pipe
Jun  5 23:04:42.800216 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
```

When the multiprocessing.Manager is shut down, any further use of the queue raises the errors above. This happens during system shutdown, for example on fast-reboot or warm-reboot.


With the fix, system-health service does not hang:

```
root@sonic:/home/admin# sudo systemctl start system-health ; sleep 10; echo "$(date): Stopping..."; sudo systemctl stop system-health; echo "$(date): Stopped"
Thu Oct 17 01:07:56 PM IDT 2024: Stopping...
Thu Oct 17 01:07:58 PM IDT 2024: Stopped
root@sonic:/home/admin# sudo systemctl start system-health ; sleep 10; echo "$(date): Stopping..."; sudo systemctl stop system-health; echo "$(date): Stopped"
Thu Oct 17 01:08:13 PM IDT 2024: Stopping...
Thu Oct 17 01:08:14 PM IDT 2024: Stopped
root@sonic:/home/admin# sudo systemctl start system-health ; sleep 10; echo "$(date): Stopping..."; sudo systemctl stop system-health; echo "$(date): Stopped"
Thu Oct 17 01:09:05 PM IDT 2024: Stopping...
Thu Oct 17 01:09:06 PM IDT 2024: Stopped
```

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it

Removed the explicit call to shutdown. Per the documentation, the manager is cleaned up automatically when it is garbage collected or its parent process exits: https://docs.python.org/3/library/multiprocessing.html

#### How to verify it

Run warm-reboot, fast-reboot multiple times and verify no errors in the log.

#### Which release branch to backport (provide reason below if selected)

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [x] 202205
- [x] 202311
- [x] 202405

stephenxs pushed a commit that referenced this pull request Dec 23, 2024
…et#21095)

Adds the following fix from FRR: FRRouting/frr#17297

This fixes the following crash, which reproduces intermittently:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fccd6faf7c0 (LWP 36))]
(gdb) bt
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fccd7302fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fccd72ed472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fccd75bb3a9 in _zlog_assert_failed (xref=xref@entry=0x7fccd7652380 <_xref.16>, extra=extra@entry=0x0) at ../lib/zlog.c:678
#4  0x00007fccd759b2fe in route_node_delete (node=<optimized out>) at ../lib/table.c:352
#5  0x00007fccd759b445 in route_unlock_node (node=0x0) at ../lib/table.h:258
#6  route_next (node=<optimized out>) at ../lib/table.c:436
#7  route_next (node=node@entry=0x56029d89e560) at ../lib/table.c:410
#8  0x000056029b6b6b7a in if_lookup_by_name_per_ns (ns=ns@entry=0x56029d873d90, ifname=ifname@entry=0x7fccc0029340 "PortChannel1020")
    at ../zebra/interface.c:312
#9  0x000056029b6b8b36 in zebra_if_dplane_ifp_handling (ctx=0x7fccc0029310) at ../zebra/interface.c:1867
#10 zebra_if_dplane_result (ctx=0x7fccc0029310) at ../zebra/interface.c:2221
#11 0x000056029b7137a9 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4810
#12 0x00007fccd75a0e0d in thread_call (thread=thread@entry=0x7ffe8e553cc0) at ../lib/thread.c:1990
#13 0x00007fccd7559368 in frr_run (master=0x56029d65a040) at ../lib/libfrr.c:1198
#14 0x000056029b6ac317 in main (argc=9, argv=0x7ffe8e5540d8) at ../zebra/main.c:478
stephenxs pushed a commit that referenced this pull request Feb 19, 2025
…et#21405)

#### Why I did it

Adds the following fix from FRR: FRRouting/frr#17297

This fixes the following crash, which reproduces intermittently:

```
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fccd6faf7c0 (LWP 36))]
(gdb) bt
#0 0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fccd7302fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fccd72ed472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007fccd75bb3a9 in _zlog_assert_failed (xref=xref@entry=0x7fccd7652380 <_xref.16>, extra=extra@entry=0x0) at ../lib/zlog.c:678
#4 0x00007fccd759b2fe in route_node_delete (node=<optimized out>) at ../lib/table.c:352
#5 0x00007fccd759b445 in route_unlock_node (node=0x0) at ../lib/table.h:258
#6 route_next (node=<optimized out>) at ../lib/table.c:436
#7 route_next (node=node@entry=0x56029d89e560) at ../lib/table.c:410
#8 0x000056029b6b6b7a in if_lookup_by_name_per_ns (ns=ns@entry=0x56029d873d90, ifname=ifname@entry=0x7fccc0029340 "PortChannel1020")
 at ../zebra/interface.c:312
#9 0x000056029b6b8b36 in zebra_if_dplane_ifp_handling (ctx=0x7fccc0029310) at ../zebra/interface.c:1867
#10 zebra_if_dplane_result (ctx=0x7fccc0029310) at ../zebra/interface.c:2221
#11 0x000056029b7137a9 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4810
#12 0x00007fccd75a0e0d in thread_call (thread=thread@entry=0x7ffe8e553cc0) at ../lib/thread.c:1990
#13 0x00007fccd7559368 in frr_run (master=0x56029d65a040) at ../lib/libfrr.c:1198
#14 0x000056029b6ac317 in main (argc=9, argv=0x7ffe8e5540d8) at ../zebra/main.c:478
```

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it
Added patch.

#### How to verify it
Running BGP tests.

#### Which release branch to backport (provide reason below if selected)

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
- [ ] 202211
- [ ] 202305

stephenxs pushed a commit that referenced this pull request Feb 19, 2025
#### Why I did it

During SmartSwitch initialization, an error is observed at switch bootup when `ztp disable` runs decode-syseeprom.
```
sonic ERR decode-syseeprom: Failed to obtain EEPROM object due to ValueError("invalid literal for int() with base 10: ''"),
Traceback: Traceback (most recent call last):
#012 File "/usr/local/bin/decode-syseeprom", line 35, in instantiate_eeprom_object#012 eeprom = sonic_platform.platform.Platform().get_chassis().get_eeprom()
#012 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#012 File "/usr/local/lib/python3.11/dist-packages/sonic_platform/platform.py", line 35, in __init__
#012 self._chassis = SmartSwitchChassis()#012 ^^^^^^^^^^^^^^^^^^^^
#012 File "/usr/local/lib/python3.11/dist-packages/sonic_platform/chassis.py", line 1207, in __init__
#012 self.initialize_modules()#012 File "/usr/local/lib/python3.11/dist-packages/sonic_platform/chassis.py", line 1244, in initialize_modules
#012 self.initialize_single_module(index=index)
#012 File "/usr/local/lib/python3.11/dist-packages/sonic_platform/chassis.py", line 1235, in initialize_single_module
#012 from .module import DpuModule#012 File "/usr/local/lib/python3.11/dist-packages/sonic_platform/module.py", line 24, in <module>
#012 from .dpuctlplat import DpuCtlPlat, BootProgEnum
#012 File "/usr/local/lib/python3.11/dist-packages/sonic_platform/dpuctlplat.py", line 29, in <module>
#012 from .inotify_helper import InotifyHelper
#012 File "/usr/local/lib/python3.11/dist-packages/sonic_platform/inotify_helper.py", line 21, in <module>
#012 import inotify.adapters#012 File "/usr/local/lib/python3.11/dist-packages/inotify/adapters.py", line 37, in <module>
#012 _IS_DEBUG = bool(int(os.environ.get('DEBUG', '0')))
#012 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#012ValueError: invalid literal for int() with base 10: ''
```
This happens during ZTP because ztp sets DEBUG="" here: https://github.com/sonic-net/sonic-ztp/blob/202411/src/etc/default/ztp#L6
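
The failing expression in the traceback, `bool(int(os.environ.get('DEBUG', '0')))`, falls back to `'0'` only when DEBUG is unset. An empty string is a set value, so it reaches `int('')` and raises. A minimal sketch of a tolerant parse follows; the helper name `env_flag` and its `environ` parameter are assumptions for illustration, not the actual patch applied here.

```python
import os


def env_flag(name, default=False, environ=None):
    """Interpret an environment variable as a boolean flag.

    Unset, empty, or non-numeric values fall back to `default`
    instead of raising ValueError. Hypothetical helper.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(name, "").strip()
    if not raw:
        # Covers both "unset" and DEBUG="" as set by the ztp defaults file.
        return default
    try:
        return bool(int(raw))
    except ValueError:
        return default


if __name__ == "__main__":
    print(env_flag("DEBUG", environ={"DEBUG": ""}))
    print(env_flag("DEBUG", environ={"DEBUG": "1"}))
```

Passing the mapping explicitly keeps the helper easy to exercise without modifying the real process environment.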

#### How I did it

Fixed the import in inotify

#### How to verify it

Verified by running decode-syseeprom during init.

#### Which release branch to backport (provide reason below if selected)

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
- [ ] 202211
- [ ] 202305

stephenxs pushed a commit that referenced this pull request Aug 8, 2025
…tener (sonic-net#23419)

#### Why I did it
The following KeyError was found in syslog, not only for lldp but also for snmp and bgp.

2025 Jul 19 18:13:00.240397 vlab-01 ERR lldp#supervisor-proc-exit-listener: Exception: 'len', trace: Traceback (most recent call last):
  File "/usr/bin/supervisor-proc-exit-listener", line 249, in <module>
    main(sys.argv[1:])
  File "/usr/bin/supervisor-proc-exit-listener", line 182, in main
    payload = sys.stdin.read(int(headers['len']))
KeyError: 'len'
The surrounding syslog context is:

2025 Jul 19 18:12:59.505711 vlab-01 INFO lldp#supervisord 2025-07-19 18:12:59,504 INFO waiting for supervisor-proc-exit-listener, rsyslogd, lldpd, lldp-syncd, lldpmgrd to die
2025 Jul 19 18:12:59.761223 vlab-01 INFO containerd[684]: time="2025-07-19T18:12:59.759992163Z" level=info msg="shim disconnected" id=cd6e41a2cc82aae25d2d65801984943311b3f025c98ca865ea79be95194abc95
2025 Jul 19 18:12:59.762463 vlab-01 INFO containerd[684]: time="2025-07-19T18:12:59.760103279Z" level=warning msg="cleaning up after shim disconnected" id=cd6e41a2cc82aae25d2d65801984943311b3f025c98ca865ea79be95194abc95 namespace=moby
2025 Jul 19 18:12:59.765745 vlab-01 INFO containerd[684]: time="2025-07-19T18:12:59.760116062Z" level=info msg="cleaning up dead shim"
2025 Jul 19 18:12:59.767134 vlab-01 INFO dockerd[752]: time="2025-07-19T18:12:59.760554606Z" level=info msg="ignoring event" container=cd6e41a2cc82aae25d2d65801984943311b3f025c98ca865ea79be95194abc95 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
2025 Jul 19 18:12:59.784436 vlab-01 INFO containerd[684]: time="2025-07-19T18:12:59.783563921Z" level=warning msg="cleanup warnings time=\"2025-07-19T18:12:59Z\" level=info msg=\"starting signal loop\" namespace=moby pid=42053 runtime=io.containerd.runc.v2\n"
2025 Jul 19 18:12:59.826676 vlab-01 INFO systemd[1]: var-lib-docker-overlay2-472b96da162023c3bc1e0d4132486ad7c122b23acf07f93d0e5b0a9538d7cebe-merged.mount: Deactivated successfully.
2025 Jul 19 18:12:59.840815 vlab-01 INFO container: docker cmd: wait for teamd
2025 Jul 19 18:12:59.843934 vlab-01 INFO container: docker cmd: stop for teamd
2025 Jul 19 18:12:59.861044 vlab-01 DEBUG container: container_stop: END
2025 Jul 19 18:12:59.906677 vlab-01 NOTICE admin: Stopped teamd service...
2025 Jul 19 18:12:59.938168 vlab-01 INFO systemd[1]: teamd.service: Deactivated successfully.
2025 Jul 19 18:12:59.938548 vlab-01 INFO systemd[1]: Stopped teamd.service - TEAMD container.
2025 Jul 19 18:12:59.939901 vlab-01 NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-host:event-stopped-ctr":{"ctr_name":"TEAMD","timestamp":"2025-07-19T18:12:59.939561Z"}}
2025 Jul 19 18:13:00.196745 vlab-01 INFO dockerd[752]: time="2025-07-19T18:13:00.196382391Z" level=info msg="Container failed to exit within 10s of signal 15 - using the force" container=2188a8952aa8d602224b78e325295831aceb8f14d1b6ec8869cc153b7eafef6a
2025 Jul 19 18:13:00.240397 vlab-01 ERR lldp#supervisor-proc-exit-listener: Exception: 'len', trace: Traceback (most recent call last):#012  File "/usr/bin/supervisor-proc-exit-listener", line 249, in <module>#012    main(sys.argv[1:])#012  File "/usr/bin/supervisor-proc-exit-listener", line 182, in main#012    payload = sys.stdin.read(int(headers['len']))#012                                 ~~~~~~~^^^^^^^#012KeyError: 'len'
During shutdown, supervisor sends termination events to the event listener, but the shutdown process interrupts the event stream.
The container is forcibly killed ("Container failed to exit within 10s of signal 15 - using the force"), which can interrupt the supervisor event protocol mid-stream:

1. Supervisor starts sending an event header.
2. Before it can finish sending the full header (including the `len:` field), the process is interrupted.
3. The listener receives a partial, malformed header without the `len` field.

##### Work item tracking
- Microsoft ADO **(number only)**: 33409727

#### How I did it
Check whether 'len' exists in the headers before using it; if it is missing, the event cannot be processed any further.
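
The check described above can be sketched as follows. This is an illustrative outline, not the code from supervisor-proc-exit-listener: `parse_header` and `read_payload` are hypothetical names, and the header is assumed to be a single line of space-separated `key:value` tokens, as in the supervisor event listener protocol.

```python
import io


def parse_header(line):
    """Parse a header line of space-separated key:value tokens into a dict."""
    headers = {}
    for token in line.split():
        key, sep, value = token.partition(":")
        if sep:  # skip truncated tokens that carry no "key:value" pair
            headers[key] = value
    return headers


def read_payload(stream):
    """Read one event from stream; return None when the header is incomplete."""
    headers = parse_header(stream.readline())
    if "len" not in headers:
        # Header was cut off mid-stream (e.g. the container was killed).
        return None
    try:
        length = int(headers["len"])
    except ValueError:
        return None
    return stream.read(length)


if __name__ == "__main__":
    complete = io.StringIO("ver:3.0 eventname:PROCESS_STATE_EXITED len:5\nhello")
    truncated = io.StringIO("ver:3.0 eventname:PROCESS_ST")
    print(read_payload(complete))
    print(read_payload(truncated))
```

Returning `None` lets the caller decide whether to skip the event or exit quietly instead of logging a KeyError traceback during shutdown.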
stephenxs pushed a commit that referenced this pull request Oct 20, 2025
…atically (sonic-net#23557)

#### Why I did it
src/sonic-utilities
```
* e59bbfc - (HEAD -> master, origin/master, origin/HEAD) Fixing state_db not having delete_field attribute causing a crash when DPUs in bad state (sonic-net#4064) (9 hours ago) [rameshraghupathy]
* 9386963 - Improve set/get cmdline to support uefi (sonic-net#4062) (3 days ago) [Hua Liu]
* 3a7d0b4 - [dhcp_relay] Update show cli sample for dhcp_relay (sonic-net#4070) (4 days ago) [Balakrishna-goshika]
* 89c9aef - FEC histogram with ability to clear stat (sonic-net#4075) (7 days ago) [Prince George]
* b247e93 - Skip speed validation for chassis. (sonic-net#4076) (10 days ago) [Xincun Li]
* 70926dd - [FRR]Adding additional FRR dumps (sonic-net#4073) (11 days ago) [Sudharsan Dhamal Gopalarathnam]
* d7c16c3 - Fix incorrect output format for pre-fec ber in sfpshow pm (sonic-net#4066) (2 weeks ago) [Changrong Wu]
* 80a20e7 - [doc][dhcp_server] Update cli doc for dhcp_server sonic-net#4069 (3 weeks ago) [Balakrishna-goshika]
* c1843fa - Issue sonic-net#22759: Prevent CLI from adding invalid routed interfaces (sonic-net#3901) (3 weeks ago) [Anders Linn]
* 7c5378e - Issue 23798: Wrap getpass.getpass in a signal handler to avoid SIGTTOU (sonic-net#4061) (3 weeks ago) [Anders Linn]
* 28dfb29 - Fix issue that dynamic/static threshold 0 can not be configured using mmuconfig (sonic-net#4049) (4 weeks ago) [Stephen Sun]
* e276765 - Support multi-asic in gcu.py (sonic-net#4057) (5 weeks ago) [ganglv]
* d2c697f - Add sonic-error-report tool for structured error reporting (sonic-net#4037) (5 weeks ago) [Dawei Huang]
* ed5afd8 - Add python wheels for GCU (sonic-net#4042) (5 weeks ago) [ganglv]
* 98e4916 - Add Arista-7060X6-64PE-B-O128S2, Arista-7060X6-16PE-384C-B-O128S2 to GCU (sonic-net#4055) (5 weeks ago) [rick-arista]
* 9e9a65b - Issue sonic-net#22420: Modify 'config route add' command not to include empty elements (#12) (sonic-net#3862) (5 weeks ago) [Anders Linn]
* 0edb592 - Mux cable show config command Added prober_type and fixed one format (sonic-net#4013) (5 weeks ago) [harjotsinghpawra]
* 2657ee3 - Fixed cli command for ECN config on voq switch to set the WRED_PROFILE for all Voqs (sonic-net#4029) (6 weeks ago) [saksarav-nokia]
* d1c9d1a - [show][config] Add CLI support for configurable drop monitor feature (sonic-net#3756) (6 weeks ago) [HP]
* 7baa75b - [spm] Rename entry tag variable to docker_image_reference (sonic-net#4019) (6 weeks ago) [DavidZagury]
* 63364a3 - Add BlockingMode for Reboot script (sonic-net#3958) (6 weeks ago) [Litao Yu]
* ee8113f - Support for platforms based on Clounix net device (sonic-net#3970) (6 weeks ago) [LongWuuu]
* f53a5c1 - [config show]BGP Suppress fib pending config and display for multi-asic (sonic-net#3948) (7 weeks ago) [vganesan-nokia]
* b3de0af - Add Arista 7800 platforms to GCU validator (sonic-net#4038) (7 weeks ago) [Xincun Li]
* 13a0cb2 - Add check_pfc_storm_active() to fast-reboot script (sonic-net#3969) (7 weeks ago) [Dawei Huang]
* f45d896 - [smartswitch] Update get_gnmi_port() based on smartswitch config updates (sonic-net#4041) (7 weeks ago) [Vasundhara Volam]
* ea33ef3 - [nvidia-bluefield] Add CLI for packet-drop and config-record (sonic-net#4002) (8 weeks ago) [Vivek]
* ffc891d - [dhcp_server] Add CLI sample for dhcp_server (sonic-net#4033) (8 weeks ago) [Yaqiang Zhu]
* 1e9d04c - Update doc to including dhcp_server ipv4 counter related CLI (sonic-net#4028) (9 weeks ago) [Yaqiang Zhu]
* 19594b9 - Fix show int transceiver EEPROM crash for for Backplane cartridge + enhance EEPROM CLI output (sonic-net#4020) (9 weeks ago) [mihirpat1]
* 0f8ac9b - Added MAX pre-fec_ber for FEC counter (sonic-net#4027) (9 weeks ago) [Prince George]
* 732dc09 - Added json support for show platform temperature (sonic-net#3874) (9 weeks ago) [Vinod Kumar]
* bacff45 - Add Arista 7800 platforms to GCU validator (sonic-net#4030) (9 weeks ago) [Xincun Li]
* c63e9ea - [trim]: Add Packet Trimming Drop Counters CLI (sonic-net#3993) (9 weeks ago) [Nazarii Hnydyn]
* 50df9ea - Adapt 'show muxcable tunnel-route' for prefix route based mux neighbors (sonic-net#4007) (9 weeks ago) [manamand2020]
* 868189c - Pr json support queue and priority-group watermark and persistent-watermark (sonic-net#3875) (9 weeks ago) [Vinod Kumar]
* 5347757 - Revert "[SPM] Rename the variable tag to docker-image-reference (sonic-net#3998)" (sonic-net#4024) (9 weeks ago) [Jianquan Ye]
* 1418f21 - Added json support intfutil (sonic-net#3906) (10 weeks ago) [Vinod Kumar]
* ec01962 - sfputil and sfpshow eeprom and DOM CLI enhancement to display data for all CMIS transceivers (sonic-net#4010) (10 weeks ago) [mihirpat1]
* c0838d7 - CLI for Configuring PFC Historical Statistics (sonic-net#3779) (2 months ago) [Peter Bailey]
* d623c25 - [Mellanox][Smartswitch]Added dpu status output to dump (sonic-net#3959) (2 months ago) [Gagan Punathil Ellath]
* a3101ea - Fix for sonic-net#23205 [Smartswitch] Issues caused due to introduction of the chassisd/sonic-utiltiies changes for consecutive admin state changes (sonic-net#3984) (2 months ago) [rameshraghupathy]
* d3bc688 - CLI addition for PFC counters --history (sonic-net#3778) (2 months ago) [Peter Bailey]
* 3282ab3 - DOM for flat memory transceiver modules (sonic-net#3950) (2 months ago) [Ariz Zubair]
* 6f1a794 - Add queuestat changes for aggregate VOQ counters (sonic-net#3617) (2 months ago) [Vivek Verma]
* d86b2b6 - g[sfputil debug] Fix issue: do not check output status when CMIS version is lower than 5.0 (sonic-net#3938) (2 months ago) [Junchao-Mellanox]
* 252a643 - [SPM] Rename the variable tag to docker-image-reference (sonic-net#3998) (2 months ago) [DavidZagury]
```
#### How I did it
#### How to verify it
#### Description for the changelog
stephenxs pushed a commit that referenced this pull request Nov 18, 2025
…atically (sonic-net#24272)

#### Why I did it
src/sonic-utilities
```
* 8d2bc08 - (HEAD -> master, origin/master, origin/HEAD) Add pfc_stat_history support (sonic-net#4102) (6 hours ago) [Xincun Li]
* 7a046d6 - [trim]: Fix GCU trimming eligibility modification (sonic-net#4087) (28 hours ago) [Nazarii Hnydyn]
* f4e5de3 - [GCU] Handle duplicate array entries and auto-create empty tables during patch application (sonic-net#4095) (2 days ago) [Xincun Li]
* a131061 - [fast/warm-reboot] Fix timers query (sonic-net#4022) (2 days ago) [Stepan Blyshchak]
* 3bf5c27 - [Mellanox] Update generate_dump to include SDK sysfs files (sonic-net#4071) (2 days ago) [Noa Or]
* d4eb8ec - [portstat] Add FEC FLR statistics support to port counters (sonic-net#4054) (3 days ago) [Apoorv Sachan]
* 55b665b - Secureboot: Image signing verification enhancements (sonic-net#3989) (7 days ago) [Brad House - NextHop]
* e59bbfc - Fixing state_db not having delete_field attribute causing a crash when DPUs in bad state (sonic-net#4064) (12 days ago) [rameshraghupathy]
* 9386963 - Improve set/get cmdline to support uefi (sonic-net#4062) (2 weeks ago) [Hua Liu]
* 3a7d0b4 - [dhcp_relay] Update show cli sample for dhcp_relay (sonic-net#4070) (2 weeks ago) [Balakrishna-goshika]
* 89c9aef - FEC histogram with ability to clear stat (sonic-net#4075) (3 weeks ago) [Prince George]
* b247e93 - Skip speed validation for chassis. (sonic-net#4076) (3 weeks ago) [Xincun Li]
* 70926dd - [FRR]Adding additional FRR dumps (sonic-net#4073) (3 weeks ago) [Sudharsan Dhamal Gopalarathnam]
* d7c16c3 - Fix incorrect output format for pre-fec ber in sfpshow pm (sonic-net#4066) (4 weeks ago) [Changrong Wu]
* 80a20e7 - [doc][dhcp_server] Update cli doc for dhcp_server sonic-net#4069 (4 weeks ago) [Balakrishna-goshika]
* c1843fa - Issue sonic-net#22759: Prevent CLI from adding invalid routed interfaces (sonic-net#3901) (4 weeks ago) [Anders Linn]
* 7c5378e - Issue 23798: Wrap getpass.getpass in a signal handler to avoid SIGTTOU (sonic-net#4061) (5 weeks ago) [Anders Linn]
* 28dfb29 - Fix issue that dynamic/static threshold 0 can not be configured using mmuconfig (sonic-net#4049) (5 weeks ago) [Stephen Sun]
* e276765 - Support multi-asic in gcu.py (sonic-net#4057) (6 weeks ago) [ganglv]
* d2c697f - Add sonic-error-report tool for structured error reporting (sonic-net#4037) (6 weeks ago) [Dawei Huang]
* ed5afd8 - Add python wheels for GCU (sonic-net#4042) (6 weeks ago) [ganglv]
* 98e4916 - Add Arista-7060X6-64PE-B-O128S2, Arista-7060X6-16PE-384C-B-O128S2 to GCU (sonic-net#4055) (6 weeks ago) [rick-arista]
* 9e9a65b - Issue sonic-net#22420: Modify 'config route add' command not to include empty elements (#12) (sonic-net#3862) (7 weeks ago) [Anders Linn]
* 0edb592 - Mux cable show config command Added prober_type and fixed one format (sonic-net#4013) (7 weeks ago) [harjotsinghpawra]
* 2657ee3 - Fixed cli command for ECN config on voq switch to set the WRED_PROFILE for all Voqs (sonic-net#4029) (7 weeks ago) [saksarav-nokia]
* d1c9d1a - [show][config] Add CLI support for configurable drop monitor feature (sonic-net#3756) (7 weeks ago) [HP]
* 7baa75b - [spm] Rename entry tag variable to docker_image_reference (sonic-net#4019) (8 weeks ago) [DavidZagury]
* 63364a3 - Add BlockingMode for Reboot script (sonic-net#3958) (8 weeks ago) [Litao Yu]
* ee8113f - Support for platforms based on Clounix net device (sonic-net#3970) (8 weeks ago) [LongWuuu]
* f53a5c1 - [config show]BGP Suppress fib pending config and display for multi-asic (sonic-net#3948) (8 weeks ago) [vganesan-nokia]
* b3de0af - Add Arista 7800 platforms to GCU validator (sonic-net#4038) (9 weeks ago) [Xincun Li]
* 13a0cb2 - Add check_pfc_storm_active() to fast-reboot script (sonic-net#3969) (9 weeks ago) [Dawei Huang]
* f45d896 - [smartswitch] Update get_gnmi_port() based on smartswitch config updates (sonic-net#4041) (9 weeks ago) [Vasundhara Volam]
* ea33ef3 - [nvidia-bluefield] Add CLI for packet-drop and config-record (sonic-net#4002) (9 weeks ago) [Vivek]
* ffc891d - [dhcp_server] Add CLI sample for dhcp_server (sonic-net#4033) (10 weeks ago) [Yaqiang Zhu]
* 1e9d04c - Update doc to include dhcp_server ipv4 counter related CLI (sonic-net#4028) (2 months ago) [Yaqiang Zhu]
* 19594b9 - Fix show int transceiver EEPROM crash for Backplane cartridge + enhance EEPROM CLI output (sonic-net#4020) (2 months ago) [mihirpat1]
* 0f8ac9b - Added MAX pre-fec_ber for FEC counter (sonic-net#4027) (2 months ago) [Prince George]
* 732dc09 - Added json support for show platform temperature (sonic-net#3874) (2 months ago) [Vinod Kumar]
* bacff45 - Add Arista 7800 platforms to GCU validator (sonic-net#4030) (2 months ago) [Xincun Li]
* c63e9ea - [trim]: Add Packet Trimming Drop Counters CLI (sonic-net#3993) (3 months ago) [Nazarii Hnydyn]
* 50df9ea - Adapt 'show muxcable tunnel-route' for prefix route based mux neighbors (sonic-net#4007) (3 months ago) [manamand2020]
* 868189c - Pr json support queue and priority-group watermark and persistent-watermark (sonic-net#3875) (3 months ago) [Vinod Kumar]
* 5347757 - Revert "[SPM] Rename the variable tag to docker-image-reference (sonic-net#3998)" (sonic-net#4024) (3 months ago) [Jianquan Ye]
* 1418f21 - Added json support intfutil (sonic-net#3906) (3 months ago) [Vinod Kumar]
* ec01962 - sfputil and sfpshow eeprom and DOM CLI enhancement to display data for all CMIS transceivers (sonic-net#4010) (3 months ago) [mihirpat1]
* c0838d7 - CLI for Configuring PFC Historical Statistics (sonic-net#3779) (3 months ago) [Peter Bailey]
* d623c25 - [Mellanox][Smartswitch]Added dpu status output to dump (sonic-net#3959) (3 months ago) [Gagan Punathil Ellath]
* a3101ea - Fix for sonic-net#23205 [Smartswitch] Issues caused due to introduction of the chassisd/sonic-utilities changes for consecutive admin state changes (sonic-net#3984) (3 months ago) [rameshraghupathy]
* d3bc688 - CLI addition for PFC counters --history (sonic-net#3778) (3 months ago) [Peter Bailey]
* 3282ab3 - DOM for flat memory transceiver modules (sonic-net#3950) (3 months ago) [Ariz Zubair]
* 6f1a794 - Add queuestat changes for aggregate VOQ counters (sonic-net#3617) (3 months ago) [Vivek Verma]
* d86b2b6 - [sfputil debug] Fix issue: do not check output status when CMIS version is lower than 5.0 (sonic-net#3938) (3 months ago) [Junchao-Mellanox]
* 252a643 - [SPM] Rename the variable tag to docker-image-reference (sonic-net#3998) (3 months ago) [DavidZagury]
```
#### How I did it
#### How to verify it
#### Description for the changelog
stephenxs pushed a commit that referenced this pull request Mar 9, 2026
…net#25643)

* [build] Add build timing report and dependency analysis tools

Add three scripts for build performance instrumentation:

- scripts/build-timing-report.sh: Parse per-package timing from build
  logs (HEADER/FOOTER timestamps), generate sorted duration table,
  phase breakdown, parallelism timeline, and CSV export.

- scripts/build-dep-graph.py: Parse rules/*.mk dependency graph,
  compute critical path, fan-out/fan-in bottleneck analysis, and
  generate DOT/JSON output for visualization.

- scripts/build-resource-monitor.sh: Sample CPU, memory, disk I/O,
  and Docker container count during builds for resource utilization
  analysis.
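
As a rough illustration of the rules/*.mk parsing step, the sketch below extracts `_DEPENDS` and `_RDEPENDS` assignments from Makefile text. It is not the code in scripts/build-dep-graph.py: the names `DEP_LINE`, `VAR_REF` and `parse_deps` are hypothetical, backslash line continuations are ignored, and every assignment operator (`=`, `:=`, `?=`, `+=`) is treated as appending, which is a simplification.

```python
import re

# Hypothetical pattern: "$(PKG)_DEPENDS += $(A) $(B)" with =, :=, ?= or +=.
DEP_LINE = re.compile(r"^\$\((\w+)\)_(DEPENDS|RDEPENDS)\s*(?:\+|\?|:)?=\s*(.*)$")
# Every "$(NAME)" reference on the right-hand side counts as one dependency.
VAR_REF = re.compile(r"\$\((\w+)\)")


def parse_deps(text):
    """Return (deps, rdeps) dicts mapping package -> list of referenced packages."""
    deps, rdeps = {}, {}
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace before matching.
        line = raw.split("#", 1)[0].strip()
        match = DEP_LINE.match(line)
        if not match:
            continue
        pkg, kind, rhs = match.groups()
        target = deps if kind == "DEPENDS" else rdeps
        # Simplification: all operators accumulate, including plain "=".
        target.setdefault(pkg, []).extend(VAR_REF.findall(rhs))
    return deps, rdeps
```

Calling `parse_deps(open(path).read())` on each rules file and merging the results would give the raw graph that the later analysis works on.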

Add "make build-report" target to slave.mk that runs the timing
report and dependency analysis after a build completes.

Example output from a VS build on 24-core/30GB machine:
- 210 packages built in 53m wall time (173m CPU)
- Max concurrency: 5 (with SONIC_CONFIG_BUILD_JOBS=4)
- Critical path: 14 packages deep (libnl -> libswsscommon -> utilities)
- Top bottleneck: LIBSWSSCOMMON with 48 downstream dependents

Signed-off-by: Rustiqly <[email protected]>

* Address Copilot review: fix 17 bugs in build analysis scripts

- Use free -m with division instead of free -g to avoid rounding (#1)
- Add = and ?= to Makefile dependency regex patterns (#2, #7)
- CPU calculation now uses /proc/stat delta (two reads) (#3, #14)
- Fix misleading 'critical path estimate' comment (#4)
- Fix parallelism timeline comment (60s not 10s) (#5)
- Include after-relationship packages in fan stats (#6)
- Guard disk I/O division by zero when INTERVAL<=1 (#8)
- Remove unused elapsed_line variable (#9)
- Remove redundant LIBSWSSCOMMON_DBG check (#10)
- Remove active_make_jobs from CSV header comment (#11)
- Wire up _RDEPENDS parsing to build reverse deps (#12)
- Remove unnecessary 'if v' filter on rdeps JSON (#13)
- Remove unused REPORT_FORMAT parameter (#15)
- Add cycle detection to critical path algorithm (#16)
- Add execute permission check for companion scripts (#17)
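
The /proc/stat delta approach mentioned in the list can be sketched as follows. This is an illustration under assumptions, not the code in scripts/build-resource-monitor.sh (which is a shell script): the function names are hypothetical, iowait is counted as idle time, and only the first eight counters are summed because guest time is already included in user time.

```python
def parse_cpu_line(line):
    """Split an aggregate 'cpu ...' line from /proc/stat into (idle, total) ticks."""
    fields = [int(value) for value in line.split()[1:]]
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice]
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
    total = sum(fields[:8])
    return idle, total


def cpu_percent(before, after):
    """CPU utilisation (0-100) between two samples of the aggregate cpu line."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0  # identical or out-of-order samples: avoid dividing by zero
    return 100.0 * (elapsed - (idle1 - idle0)) / elapsed


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (Linux only)."""
    with open(path) as stat:
        return stat.readline()
```

A monitor would call `read_cpu_line()`, sleep for the sampling interval, call it again, and pass both lines to `cpu_percent`. A single read only gives counters accumulated since boot, which is why two reads are needed.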

Signed-off-by: Rustiqly <[email protected]>

---------

Signed-off-by: Rustiqly <[email protected]>
Co-authored-by: Rustiqly <[email protected]>