Skip to content

Fix zstd dockerfs build for ONIE and for non-amd64#1

Merged
lipxu merged 2 commits intolipxu:20250221_publicMaster_pzstdfrom
saiarcot895:fix-zstd-dockerfs
Feb 22, 2025
Merged

Fix zstd dockerfs build for ONIE and for non-amd64#1
lipxu merged 2 commits intolipxu:20250221_publicMaster_pzstdfrom
saiarcot895:fix-zstd-dockerfs

Conversation

@saiarcot895
Copy link

Why I did it

Work item tracking
  • Microsoft ADO (number only):

How I did it

For non-amd64, remove the explicit reference to the x86_64-linux-gnu folder. Instead, let copy_exec determine what libraries to copy into initramfs.

For ONIE, propagate the BUILD_REDUCE_IMAGE_SIZE environment variable into install.sh. By setting it there, the right value for FILESYSTEM_DOCKERFS will get set in onie-image.conf.

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@lipxu lipxu merged commit d015b8a into lipxu:20250221_publicMaster_pzstd Feb 22, 2025
1 check passed
lipxu pushed a commit that referenced this pull request Jul 10, 2025
…et#21405)

<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 failure_prs.log skip_prs.log Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "fixes #xxxx", or
 "closes #xxxx" or "resolves #xxxx"

 Please provide the following information:
-->

#### Why I did it

Adding the below fix from FRR FRRouting/frr#17297

This is to fix the following crash which is a statistical issue

```
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fccd6faf7c0 (LWP 36))]
(gdb) bt
#0 0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fccd7302fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007fccd72ed472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007fccd75bb3a9 in _zlog_assert_failed (xref=xref@entry=0x7fccd7652380 <_xref.16>, extra=extra@entry=0x0) at ../lib/zlog.c:678
sonic-net#4 0x00007fccd759b2fe in route_node_delete (node=<optimized out>) at ../lib/table.c:352
sonic-net#5 0x00007fccd759b445 in route_unlock_node (node=0x0) at ../lib/table.h:258
sonic-net#6 route_next (node=<optimized out>) at ../lib/table.c:436
sonic-net#7 route_next (node=node@entry=0x56029d89e560) at ../lib/table.c:410
sonic-net#8 0x000056029b6b6b7a in if_lookup_by_name_per_ns (ns=ns@entry=0x56029d873d90, ifname=ifname@entry=0x7fccc0029340 "PortChannel1020")
 at ../zebra/interface.c:312
sonic-net#9 0x000056029b6b8b36 in zebra_if_dplane_ifp_handling (ctx=0x7fccc0029310) at ../zebra/interface.c:1867
sonic-net#10 zebra_if_dplane_result (ctx=0x7fccc0029310) at ../zebra/interface.c:2221
sonic-net#11 0x000056029b7137a9 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4810
sonic-net#12 0x00007fccd75a0e0d in thread_call (thread=thread@entry=0x7ffe8e553cc0) at ../lib/thread.c:1990
sonic-net#13 0x00007fccd7559368 in frr_run (master=0x56029d65a040) at ../lib/libfrr.c:1198
sonic-net#14 0x000056029b6ac317 in main (argc=9, argv=0x7ffe8e5540d8) at ../zebra/main.c:478
```

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it
Added patch.

#### How to verify it
Running BGP tests.

<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
- [ ] 202211
- [ ] 202305

#### Tested branch (Please provide the tested image version)

<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->

- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->

#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->

<!--
 Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
lipxu pushed a commit that referenced this pull request Jul 10, 2025
#### Why I did it

Dropping control character (message sent when XSUB connects to XPUB as part of ZMQ Proxy setup to notify that subscription has been made) in do capture has been flaky since control character is not guaranteed to be the first message sent if there are events (like event-down-ctr) being published to XSUB.

Scenarios

1) Control character is sent and is first message when starting capture service

`eventd#eventd#eventd: :- heartbeat_ctrl: Set heartbeat_ctrl pause=1`
`eventd#eventd#eventd: :- do_capture: Received subscription message when XSUB connects to XPUB`

2) Events like event-down ctr is sent before control character

`eventd#eventd#eventd: :- run: Dropping Message: 22 serialization::archive 18 17 sonic-events-host`
`eventd#eventd#eventd: :- run: Dropping Message: 22 serialization::archive 18 0 0 4 0 0 0 1 d 103 {"sonic-events-host:event-stopped-ctr":{"ctr_name":"EVENTD","timestamp":"2024-08-27T00:02:51.407518Z"}} 1 r 36 3357542f-bae1-458f-a804-660e620d21f5 1 s 1 9 1 t 19 1724716971407591080`
`heartbeat_ctrl: Set heartbeat_ctrl pause=1`
`do_capture: Received subscription message when XSUB connects to XPUB`

3) Control character is not sent at all

`eventd#eventd#eventd: :- heartbeat_ctrl: Set heartbeat_ctrl pause=1`

4) Control character is delayed and not caught when starting capture service, but is then caught after causing deserialize error.

`do_capture: Receiving event from source: 22 serialization::archive 18 17 sonic-events-host, will read second part of event`
`deserialize: deserialize Failed: input stream errorstr[0:64]:(#1) data type: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&`
`zmq_read_part: Failed to deserialize part rc=-2`
`zmq_read_part: last:errno=11`
`zmq_message_read: Failure to read part1 rc=-2`
`zmq_message_read: last:errno=11`

We can cover these scenarios by just dropping the control character inside zmq_message_read as part of events_common in swsscommon (different PR). In this PR we will remove such handling logic and make sure that empty events that will be sent by control character are ignored.

##### Work item tracking
- Microsoft ADO **(number only)**:28728116

#### How I did it

Remove logic for handling control character

#### How to verify it

UT and sonic-mgmt test cases.
lipxu pushed a commit that referenced this pull request Jul 10, 2025
To fix a statistical issue. The original fix was done in FRRouting/frr#17297. However to accommodate 8.5.4 the patch in the PR was added.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fccd6faf7c0 (LWP 36))]
(gdb) bt
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fccd7302fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fccd72ed472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fccd75bb3a9 in _zlog_assert_failed (xref=xref@entry=0x7fccd7652380 <_xref.16>, extra=extra@entry=0x0) at ../lib/zlog.c:678
sonic-net#4  0x00007fccd759b2fe in route_node_delete (node=<optimized out>) at ../lib/table.c:352
sonic-net#5  0x00007fccd759b445 in route_unlock_node (node=0x0) at ../lib/table.h:258
sonic-net#6  route_next (node=<optimized out>) at ../lib/table.c:436
sonic-net#7  route_next (node=node@entry=0x56029d89e560) at ../lib/table.c:410
sonic-net#8  0x000056029b6b6b7a in if_lookup_by_name_per_ns (ns=ns@entry=0x56029d873d90, ifname=ifname@entry=0x7fccc0029340 "PortChannel1020")
    at ../zebra/interface.c:312
sonic-net#9  0x000056029b6b8b36 in zebra_if_dplane_ifp_handling (ctx=0x7fccc0029310) at ../zebra/interface.c:1867
sonic-net#10 zebra_if_dplane_result (ctx=0x7fccc0029310) at ../zebra/interface.c:2221
sonic-net#11 0x000056029b7137a9 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4810
sonic-net#12 0x00007fccd75a0e0d in thread_call (thread=thread@entry=0x7ffe8e553cc0) at ../lib/thread.c:1990
sonic-net#13 0x00007fccd7559368 in frr_run (master=0x56029d65a040) at ../lib/libfrr.c:1198
sonic-net#14 0x000056029b6ac317 in main (argc=9, argv=0x7ffe8e5540d8) at ../zebra/main.c:478
lipxu pushed a commit that referenced this pull request Dec 22, 2025
#### Why I did it
If one python wheel is already installed inside slave container, it will not install again. Below is a sample log:
```
sed: -e expression #1, char 11: extra characters after command
WARNING: The directory '/var/user/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Processing ./target/python-wheels/bookworm/sonic_yang_models-1.0-py3-none-any.whl
sonic-yang-models is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

[notice] A new release of pip is available: 24.2 -> 25.3
[notice] To update, run: python3 -m pip install --upgrade pip
Build end time: Wed Dec 3 22:53:07 UTC 2025
Elapsed time: 0h 0m 1s
```
 However, we expect to reinstall the python wheel for target `$(PYTHON_WHEELS_PATH)/%-install`

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it
Update slave.mk to enasure force install the python wheel.

#### How to verify it
After this change, local build will successfully force install the python wheel. See new logs:
```
sed: -e expression #1, char 11: extra characters after command
WARNING: The directory '/var/qiluo/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Processing ./target/python-wheels/bookworm/sonic_yang_models-1.0-py3-none-any.whl
Installing collected packages: sonic-yang-models
  Attempting uninstall: sonic-yang-models
    Found existing installation: sonic-yang-models 1.0
    Uninstalling sonic-yang-models-1.0:
      Successfully uninstalled sonic-yang-models-1.0
Successfully installed sonic-yang-models-1.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

[notice] A new release of pip is available: 24.2 -> 25.3
[notice] To update, run: python3 -m pip install --upgrade pip
Build end time: Wed Dec 3 23:59:31 UTC 2025
```
lipxu pushed a commit that referenced this pull request Dec 22, 2025
…logs

The `imklog` plugin of rsyslog collects the kernel logs from `/dev/kmsg` and
enqueues it to the syslog. With `CONFIG_PRINTK_TIME` the kernel messages are by
default prefixed with the elapsed time since boot. The `imklog` plugin parsing
these messages have a few options such as to keep the timestamps as such or to
interpret and adjust the syslog's reported time accordingly.

The rsylog release `8.2312.0` has fixes in interpreting these timestamps,
leading to the change in behavior observed in sonic-net#24386.

  https://salsa.debian.org/debian/rsyslog/-/blob/debian/8.2504.0-1/ChangeLog?ref_type=tags#L619

To restore the earlier behavior or retaining the kernel reported elapsed time,
disable `KlogParseKernelTimestamp` as this leads to removal of timestamp from
kernel messages and enable `KlogKeepKernelTimestamp` explicitly. The later is
required as the default is now to discard the kernel timestamp.

With this change, the logs retain the kernel timestamp:

    root@sonic:~# cat /var/log/syslog | grep "sonic.*kernel:" | head -n 3
    2025 Nov  4 05:15:14.918946 sonic NOTICE kernel: [    0.000000] Linux version 6.12.41+deb13-sonic-amd64 ([email protected]) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC Debian 6.12.41-1 (2025-08-12)
    2025 Nov  4 05:15:14.919533 sonic INFO kernel: [    0.000000] Command line: BOOT_IMAGE=/image-trixie.0-dirty-20251102.122837/boot/vmlinuz-6.12.41+deb13-sonic-amd64 root=UUID=ac0b6826-f8a3-461f-a8ff-701df60d90b6 rw console=tty0 console=ttyS0,115200n8 quiet processor.max_cstate=1 intel_idle.max_cstate=0 net.ifnames=0 biosdevname=0 loop=image-trixie.0-dirty-20251102.122837/fs.squashfs loopfstype=squashfs apparmor=1 security=apparmor varlog_size=4096 usbcore.autosuspend=-1 intel_iommu=off modprobe.blacklist=gpio_ich,i2c-ismt,i2c_ismt,i2c-i801,i2c_i801 crashkernel=0M-2G:256M,2G-4G:320M,4G-8G:384M,8G-:448M acpi_no_watchdog
    2025 Nov  4 05:15:14.919536 sonic INFO kernel: [    0.000000] BIOS-provided physical RAM map:
    root@sonic:~# cat /var/log/syslog | grep "sonic.*kernel:" | tail -n 3
    2025 Nov  4 05:17:26.831607 sonic WARNING kernel: [  143.527486] PDDF_LED       set_status_led: Set [FANTRAY_LED;1] color[green]
    2025 Nov  4 05:17:26.912442 sonic WARNING kernel: [  143.607086] PDDF_LED       set_status_led: Set [FANTRAY_LED;2] color[green]
    2025 Nov  4 05:20:32.499634 sonic WARNING kernel: [  329.195319] PDDF_LED       set_status_led: Set [SYS_LED;0] color[amber]
    root@sonic:~#

Signed-off-by: Ramasamy Chandramouli <[email protected]>
Co-authored-by: Ramasamy Chandramouli <[email protected]>
lipxu pushed a commit that referenced this pull request Mar 6, 2026
…net#25643)

* [build] Add build timing report and dependency analysis tools

Add three scripts for build performance instrumentation:

- scripts/build-timing-report.sh: Parse per-package timing from build
  logs (HEADER/FOOTER timestamps), generate sorted duration table,
  phase breakdown, parallelism timeline, and CSV export.

- scripts/build-dep-graph.py: Parse rules/*.mk dependency graph,
  compute critical path, fan-out/fan-in bottleneck analysis, and
  generate DOT/JSON output for visualization.

- scripts/build-resource-monitor.sh: Sample CPU, memory, disk I/O,
  and Docker container count during builds for resource utilization
  analysis.

Add "make build-report" target to slave.mk that runs the timing
report and dependency analysis after a build completes.

Example output from a VS build on 24-core/30GB machine:
- 210 packages built in 53m wall time (173m CPU)
- Max concurrency: 5 (with SONIC_CONFIG_BUILD_JOBS=4)
- Critical path: 14 packages deep (libnl -> libswsscommon -> utilities)
- Top bottleneck: LIBSWSSCOMMON with 48 downstream dependents

Signed-off-by: Rustiqly <[email protected]>

* Address Copilot review: fix 17 bugs in build analysis scripts

- Use free -m with division instead of free -g to avoid rounding (#1)
- Add = and ?= to Makefile dependency regex patterns (#2, sonic-net#7)
- CPU calculation now uses /proc/stat delta (two reads) (#3, sonic-net#14)
- Fix misleading 'critical path estimate' comment (sonic-net#4)
- Fix parallelism timeline comment (60s not 10s) (sonic-net#5)
- Include after-relationship packages in fan stats (sonic-net#6)
- Guard disk I/O division by zero when INTERVAL<=1 (sonic-net#8)
- Remove unused elapsed_line variable (sonic-net#9)
- Remove redundant LIBSWSSCOMMON_DBG check (sonic-net#10)
- Remove active_make_jobs from CSV header comment (sonic-net#11)
- Wire up _RDEPENDS parsing to build reverse deps (sonic-net#12)
- Remove unnecessary 'if v' filter on rdeps JSON (sonic-net#13)
- Remove unused REPORT_FORMAT parameter (sonic-net#15)
- Add cycle detection to critical path algorithm (sonic-net#16)
- Add execute permission check for companion scripts (sonic-net#17)

Signed-off-by: Rustiqly <[email protected]>

---------

Signed-off-by: Rustiqly <[email protected]>
Co-authored-by: Rustiqly <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants