Skip to content

[Nvidia] Remove unsupported targets and get a compiled mlnx build#15

Closed
vivekrnv wants to merge 47 commits intosaiarcot895:bookwormfrom
vivekrnv:bookworm
Closed

[Nvidia] Remove unsupported targets and get a compiled mlnx build#15
vivekrnv wants to merge 47 commits intosaiarcot895:bookwormfrom
vivekrnv:bookworm

Conversation

@vivekrnv
Copy link
Copy Markdown

Why I did it

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106

Description for the changelog

A picture of a cute animal (not mandatory but encouraged)

Changes from Bullseye slave container:
* Python 2 is no longer available at all
* Python 3.11 (instead of Python 3.9)
* GCC 12 (instead of GCC 10)
* Python ipaddr package is no longer available
* OpenJDK 17 (instead of OpenJDK 11)
* Remove doxygen armhf manual compilation (no longer needed)
* Disable FIPS, as the FIPS binaries are currently not yet available
* Install Python setuptools through Debian instead of pip
* Install Python wheel through Debian instead of pip
* Install Python nose through Debian instead of pip
* Install Python j2cli through Debian instead of pip
* Install Python pexpect through Debian instead of pip
* Install Python parameterized through Debian instead of pip
* Install Python pyyaml through Debian instead of pip
* Install Python pyfakefs through Debian instead of pip
* Install Python m2crypto through Debian instead of pip
* Python pympler 1.0 (instead of 0.8)
* Install Python build (as a replacement to setup.py)

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
In Bookworm's version of setuptools, direct calls to setup.py are
deprecated and no longer guaranteed to work. One of the recommended
commands is to use the `build` python package to build packages, and
call it with `python -m build`. This, by default, builds the packages in
a virtualenv to ensure that only the specified dependencies in setup.py
are needed to build the package. This also extends to running tests,
where directly calling `setup.py test` may not work, and the recommended
alternatives are to either call `pytest` directly, or call `tox` or
`nox.` More details are available at [1].

For SONiC's use case, for building python packages, we cannot build all
Python packages in a virtualenv since there are dependencies that we
would have built earlier, and these packages are not pushed to pypi or
any package registry. (There may be a cleaner approach to this, though,
but I'm not aware of it.) For this reason, the `-n` flag is added to not
build the package in a virtualenv.

For testing, `pytest` is now called instead of `setup.py test`.

[1] https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Update tabulate to 0.9.0 and ijson to 3.2.3

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Newer versions of pip/setuptools don't support test_requires, and the
current standard is to specify any extra dependencies (such as those
required for testing) under extra_requires.

Therefore, specify the testing dependencies under extra_requires. These
can be installed via pip using `pip install '.[testing]'`.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
This tells the build infra that they need to be built as part of
Bullseye and not Bookworm.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
This ordering dependency causes FRR to get built for Bookworm, which we
don't need currently. Skip this by having it apply only to Bookworm.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
The help text printed for sonic-yang-mgmt has slight differences
depending on the package versions. Loosen this check to only check the
options themselves, rather than the surrounding text.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
With the new test command, test_cfggen_from_yang.py is now being run,
whereas previously, it was never run. This results in new failures
appearing from changes that have occurred some time back.

Therefore, for now, disable tests for sonic-config-engine when building
for Bookworm.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
There are odd failures in TestAclLoader and TestMuxcable. Skip running
tests for now.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
…Debian

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
This fixes 4 issues:
* Update tabulate to 0.9.0 and deepdiff to 6.2.2
* Specify test dependencies under extra_requires
* Add check_output parameter to the setup function due to the patch
* Fix error about having a mutable default for field headers in
  dataclass

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
This fixes 1 issue:
* Specify test dependencies under extra_requires

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Notable changes:
* Use j2cli from Debian repos instead of pip
* Use setuptools from Debian repos instead of pip
* Use wheel from Debian repos instead of pip
* Update grpcio and grpcio-tools python packages to match version in
  Bookworm
* Use m2crypto from Debian repos instead of pip

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Starting with Bookworm, Debian moved the non-free Linux firmware blobs
into a new non-free-firmware component, since they are frequently needed
by users and since they need to be updated frequently. Since the only
thing we currently install from the non-free component (that I can think
of) is the Linux firmware, have Bookworm use non-free-firmware instead
of non-free.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
saiarcot895 and others added 16 commits September 14, 2023 12:01
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Rasdaemon is not installed on armhf or arm64

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
@vivekrnv vivekrnv closed this Sep 20, 2023
saiarcot895 pushed a commit that referenced this pull request Nov 20, 2023
SAI 9.x requires a SYNCD_SHM_SIZE specified otherwise it will default to 64mb which is insufficient for syncd.

E.G. of a few failures seen when insufficient shmem was set

ha_init:  The file: warmboot_data_0 is of size=762[MB] and is beyond the directory: /dev/shm available storage of size=64[MB]#15
syncd.sh[26074]: Cannot get SYNCD_SHM_SIZE for chip: [869] in /usr/share/sonic/device/x86_64-broadcom_common/syncd_shm.ini. Skip set SYNCD_SHM_SIZE.

Syncd hangs here:

syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_shr_ha_section_resize:536 start=0x7f6e641b4000, end=0x7f6e645b4000, len=302276608, free=0x7f6e641b4000
Broadcom recommended using 1gb for DNX devices.

Since currently we don't use SAI9.x on master and 202305 this change won't fix anything until we upgrade the SAI on those branches.
saiarcot895 pushed a commit that referenced this pull request Jul 30, 2024
…-net#19637)

Broadcom requires that programmability_ucode_relative_path is set in SAI11.
This soc property replaces the legacy custom_feature_ucode_path

Without this we get the following error:

syncd#supervisord: syncd 0:dbx_file_get_db_location: DB Resource not defined#015
syncd#supervisord: syncd #15#015
syncd#supervisord: syncd 0:dnx_init_pemla_get_ucode_filepath:  Error 'Invalid parameter' indicated ; #15#015
saiarcot895 pushed a commit that referenced this pull request Aug 12, 2024
…-net#19637)

Broadcom requires that programmability_ucode_relative_path is set in SAI11.
This soc property replaces the legacy custom_feature_ucode_path

Without this we get the following error:

syncd#supervisord: syncd 0:dbx_file_get_db_location: DB Resource not defined#015
syncd#supervisord: syncd #15#015
syncd#supervisord: syncd 0:dnx_init_pemla_get_ucode_filepath:  Error 'Invalid parameter' indicated ; #15#015
saiarcot895 pushed a commit that referenced this pull request Nov 5, 2024
…7250E platform (sonic-net#20367)

Update sonic-platform submodule for Nokia-IXR7250E:
Fixes Nokia-ION/ndk#57

cdfbbe2 [H4-32D]Update platform modules after OC tests (Update README.md #17)
f28eff0 [H4-64D]Fix SFP+ port, eeprom, reboot-cause, thermal algorithm, add PSU input voltage check (Fix rules in Makefiles #15)
178e15a Minor watchdog change for better retention of last kick stamp
c479392 Remove rogue platform_reboot file
331abe0 Enhance watchdog script to detect fsde device hung signature
4c6b7c1 Fixed update temperature issue
5002fb7 Remove average and maximum
c620130 No PSU Master status led in IMM. No need to set it

Signed-off-by: mlok <marty.lok@nokia.com>
saiarcot895 pushed a commit that referenced this pull request Nov 27, 2024
…ly (sonic-net#20847)

#### Why I did it
src/sonic-bmp
```
* bfbd47b - (HEAD -> master, origin/master, origin/HEAD) Merge pull request #15 from FengPan-Frank/makefile (13 hours ago) [Feng-msft]
* ad31f5b - Create makefile for build image flow (17 hours ago) [Feng Pan]
```
#### How I did it
#### How to verify it
#### Description for the changelog
saiarcot895 pushed a commit that referenced this pull request Mar 5, 2026
…net#25643)

* [build] Add build timing report and dependency analysis tools

Add three scripts for build performance instrumentation:

- scripts/build-timing-report.sh: Parse per-package timing from build
  logs (HEADER/FOOTER timestamps), generate sorted duration table,
  phase breakdown, parallelism timeline, and CSV export.

- scripts/build-dep-graph.py: Parse rules/*.mk dependency graph,
  compute critical path, fan-out/fan-in bottleneck analysis, and
  generate DOT/JSON output for visualization.

- scripts/build-resource-monitor.sh: Sample CPU, memory, disk I/O,
  and Docker container count during builds for resource utilization
  analysis.

Add "make build-report" target to slave.mk that runs the timing
report and dependency analysis after a build completes.

Example output from a VS build on 24-core/30GB machine:
- 210 packages built in 53m wall time (173m CPU)
- Max concurrency: 5 (with SONIC_CONFIG_BUILD_JOBS=4)
- Critical path: 14 packages deep (libnl -> libswsscommon -> utilities)
- Top bottleneck: LIBSWSSCOMMON with 48 downstream dependents

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>

* Address Copilot review: fix 17 bugs in build analysis scripts

- Use free -m with division instead of free -g to avoid rounding (#1)
- Add = and ?= to Makefile dependency regex patterns (#2, #7)
- CPU calculation now uses /proc/stat delta (two reads) (#3, #14)
- Fix misleading 'critical path estimate' comment (#4)
- Fix parallelism timeline comment (60s not 10s) (#5)
- Include after-relationship packages in fan stats (#6)
- Guard disk I/O division by zero when INTERVAL<=1 (#8)
- Remove unused elapsed_line variable (#9)
- Remove redundant LIBSWSSCOMMON_DBG check (#10)
- Remove active_make_jobs from CSV header comment (#11)
- Wire up _RDEPENDS parsing to build reverse deps (#12)
- Remove unnecessary 'if v' filter on rdeps JSON (#13)
- Remove unused REPORT_FORMAT parameter (#15)
- Add cycle detection to critical path algorithm (#16)
- Add execute permission check for companion scripts (#17)

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>

---------

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>
Co-authored-by: Rustiqly <rustiqly@users.noreply.github.com>
saiarcot895 pushed a commit that referenced this pull request Mar 26, 2026
…dating udevd rules (sonic-net#26343)

- Why I did it
On SONiC SmartSwitch platforms with DPUs, systemd-udevd crashes with SIGABRT on every reboot when DPU firmware initialization is slow. During the initramfs boot phase, a standalone systemd-udevd daemon is started to handle device discovery. If DPU firmware takes longer than the 60-second udevadm settle timeout (BlueField-3 DPUs can take 120 seconds each in the failure case when they are stuck), the initramfs cannot stop this udevd before switch_root. The stale process survives into the real system but is never chrooted into the overlayfs root, leaving it with a broken filesystem view. When dpu-udev-manager.sh writes udev rules, the stale udevd detects the change and crashes on an assertion in systemd's chase() path resolution (assert(path_is_absolute(p)) at chase.c:648), because dir_fd_is_root() returns false for a process whose root still points to the initramfs rootfs rather than the overlayfs.

This triggers a systemd issue : systemd/systemd#29559 which maintainers doesn't consider as a bug from systemd side. Raising this fix for our usecase.

Core was generated by `/usr/lib/systemd/systemd-udevd --daemon --resolve-names=never'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f29fe7a1cc2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f29fe78a4ac in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f29fea50c11 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#4  0x00007f29feb94a8b in chase () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#5  0x00007f29feb956e2 in chase_and_opendir () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#6  0x00007f29feb9a609 in conf_files_list_strv () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#7  0x00007f29fea913e8 in config_get_stats_by_path () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#8  0x0000559f295519cf in ?? ()
#9  0x0000559f29553a77 in ?? ()
#10 0x00007f29fec36055 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#11 0x00007f29fec3668d in sd_event_dispatch () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#12 0x00007f29fec394a8 in sd_event_run () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#13 0x00007f29fec396c7 in sd_event_loop () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#14 0x0000559f29545820 in ?? ()
#15 0x00007f29fe78bca8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x00007f29fe78bd65 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000559f29545c51 in ?? ()

- How I did it
Added a kill_stale_udevd() function to dpu-udev-manager.sh that runs before writing the udev rules. It identifies the systemd-managed udevd PID via systemctl show, then kills any other systemd-udevd --daemon process that doesn't match -- these are leftover initramfs instances. If no stale process exists (e.g. DPUs are healthy and the initramfs udevd exited cleanly), the function is a no-op.

- How to verify it
Deploy the image on a SmartSwitch with DPUs in a state where firmware initialization times out (>60s per DPU) by stopping image installation before firmware install step
Reboot the switch
Verify no new systemd-udevd coredumps in /var/core/
Verify the stale process was killed: journalctl -b 0 | grep dpu-udev-manager should show killing stale initramfs udevd PID (systemd udevd is PID )
Verify systemd-udevd.service is healthy: systemctl status systemd-udevd should show active (running)
Verify DPU udev rules were written: cat /etc/udev/rules.d/92-midplane-intf.rules should contain the DPU interface naming rules

Signed-off-by: Hemanth Kumar Tirupati <tirupatihemanthkumar@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants