temp PR multi-asic-ready-signal --> multi-asic #71

Closed
noaOrMlnx wants to merge 233 commits into multi-asic from multi-asic-ready-signal-noaor

@noaOrMlnx

Why I did it

Work item tracking
  • Microsoft ADO (number only):

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

gpunathilell and others added 30 commits December 15, 2025 15:55
- Why I did it
This dhclient call was added for compatibility between the 202505 image, which does not set the CID to the MAC, and the 202506 image, which explicitly sets the CID to the MAC. (The DHCP server only provides a lease to matching CIDs, but this workaround uses dhclient instead of systemd-networkd and sends no CID in the request.)
The fix ensures that the dhclient request does not take too long. Because there is direct connectivity between the DPU and the switch, the request should finish within 5 seconds, unblocking all other systemd services and daemons.

- How I did it
Add timeout of 5 seconds
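
A minimal sketch of bounding a blocking call to a fixed time budget. The commit does not show how the 5-second timeout is applied to dhclient, so the helper name `run_with_timeout` and the use of Python's `subprocess` are illustrative assumptions, not the actual change.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s=5.0):
    """Run cmd; return its exit code, or None if it exceeded timeout_s.

    Hypothetical helper: the real workaround may use a shell-level timeout
    or a dhclient option instead.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s, check=False).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return None


# Stand-in for a slow DHCP request: a child process that sleeps for 2 seconds.
SLOW_COMMAND = [sys.executable, "-c", "import time; time.sleep(2)"]
```

Calling `run_with_timeout(SLOW_COMMAND, timeout_s=0.2)` gives up after the budget expires rather than blocking the services that start afterwards.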

- How to verify it
Verified after upgrade from 202505 to 202506 image to confirm that there are no issues:

Signed-off-by: gpunathilell <gpunathilell@nvidia.com>
…t#24634)

- Why I did it
Fix platform name in eeprom.py to support getting system eeprom on 5640 simx platform

- How I did it
Use correct platform name "x86_64-nvidia_sn5640_simx-r0"

- How to verify it
Manual test
…onic-net#24125)

- Why I did it
Following this HLD for liquid cooling sonic-net/SONiC#2032, I created this pr to add Mellanox platform specific APIs.

This pr has dependency on sonic-net/sonic-platform-common#603

- How I did it
Add support for liquid cool part of the platform API in case thermal

How to verify it
Add unit tests to cover the newly added code once liquid cooling is enabled

Signed-off-by: Yuanzhe Liu <yualiu@nvidia.com>
…4626)

- Why I did it
An inconsistency was found between sai_5600.xml in the SN5600 Simx platform (x86_64-nvidia_sn5600_simx-r0) and the non-Simx variant (x86_64-nvidia_sn5600-r0).

- How I did it
To avoid maintaining duplicate files, I replaced the duplicate file with a symlink pointing to the non-SimX variant. This ensures both platforms use the same correct source of truth.
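
The replacement described above can be sketched as follows. The directory and file names are placeholders created in a temporary directory, and `replace_with_symlink` is a hypothetical helper; the commit itself only states that the duplicate was replaced with a symlink.

```python
import os
import tempfile


def replace_with_symlink(duplicate_path, canonical_path):
    """Replace duplicate_path with a relative symlink to canonical_path."""
    target = os.path.relpath(canonical_path, start=os.path.dirname(duplicate_path))
    os.remove(duplicate_path)
    os.symlink(target, duplicate_path)
    return target


def demo():
    """Build two placeholder platform directories and link one file to the other."""
    with tempfile.TemporaryDirectory() as root:
        real_dir = os.path.join(root, "platform-r0")
        simx_dir = os.path.join(root, "platform_simx-r0")
        os.makedirs(real_dir)
        os.makedirs(simx_dir)
        canonical = os.path.join(real_dir, "sai.xml")
        duplicate = os.path.join(simx_dir, "sai.xml")
        with open(canonical, "w") as handle:
            handle.write("canonical")
        with open(duplicate, "w") as handle:
            handle.write("stale copy")
        target = replace_with_symlink(duplicate, canonical)
        with open(duplicate) as handle:
            content = handle.read()
        return target, os.path.islink(duplicate), content
```

A relative target keeps the link valid wherever the tree is checked out, which is why the sketch computes it with `os.path.relpath` rather than storing an absolute path.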
…acement (sonic-net#24688)

- Why I did it
The CMIS module cached the module serial in memory, so sfp.get_temperature_info could not detect SFP replacement. This PR fixes the issue.

- How I did it
Implement a private sfp._get_serial which reads module serial from EEPROM instead of memory.
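
A minimal sketch of why an uncached serial read makes replacement detectable. `FakeEeprom`, `module_replaced`, and the class layout are assumptions for illustration; only the idea of a private `_get_serial` that reads from EEPROM comes from the commit message.

```python
class FakeEeprom:
    """Stand-in for a module EEPROM; swap() simulates replacing the module."""

    def __init__(self, serial):
        self._serial = serial

    def read_serial(self):
        return self._serial

    def swap(self, serial):
        self._serial = serial


class Sfp:
    """Hypothetical SFP wrapper, not the real Mellanox platform class."""

    def __init__(self, eeprom):
        self._eeprom = eeprom
        self._last_serial = None

    def _get_serial(self):
        # Always read from EEPROM; never return a value cached in memory.
        return self._eeprom.read_serial()

    def module_replaced(self):
        """Return True if the serial changed since the previous call."""
        current = self._get_serial()
        replaced = self._last_serial is not None and current != self._last_serial
        self._last_serial = current
        return replaced
```

If `_get_serial` returned a cached value instead, `module_replaced` would keep comparing the old serial with itself and never report the swap.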

- How to verify it
Manual test passed
Unit test passed

Signed-off-by: Junchao-Mellanox <junchao@nvidia.com>
How I did it
Update YANG and add related unit tests.

How to verify it
All unit tests pass.

Signed-off-by: Jing Kan <jika@microsoft.com>
Why I did it
When enough neighbors exist for a switch (>211), LLDP stops reporting the last few neighbors after swss restarts, even though the tcp packets from those neighbors are still arriving over the ports. The netlink buffer fills up when the restart occurs, so lldp does not receive the netlink delete messages. As a result, the service does not include the new neighbors over those ports, and this persists until lldp is restarted.

How I did it
By increasing both the minimum and maximum netlink receive buffer in LLDP, I was able to fix this issue.
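
For readers unfamiliar with receive-buffer sizing, here is a hedged sketch of the general mechanism using `SO_RCVBUF`. It uses a UDP socket so that it runs anywhere; the actual change lives in the LLDP daemon's netlink handling, whose code is not shown in this commit message, and the requested size below is an arbitrary example.

```python
import socket


def grow_receive_buffer(sock, requested_bytes):
    """Request a larger receive buffer and return the size the kernel reports.

    The kernel may cap or adjust the value (on Linux it is limited by
    net.core.rmem_max), so the result can differ from the request.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def demo(requested_bytes=1 << 20):
    """Return the buffer size before and after the request."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        after = grow_receive_buffer(sock, requested_bytes)
        return before, after
```

A larger buffer gives a burst of messages, such as the deletes generated by an swss restart, somewhere to queue until the reader catches up.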

How to verify it
Run:

show lldp table
systemctl restart swss

First check without the change to confirm that the lldp buffer cannot handle all of the neighbors, then install a version of SONiC with the change and confirm that the larger buffer fixes the issue.

Signed-off-by: Connor Roos <croos@nvidia.com>
)

gateway is an optional field. In the smartswitch use case, the DHCP server runs on the switch and offers addresses to the DPU midplane. We do not want to install a default route over the midplane interface, so the gateway option is removed from the DHCP server config. See sonic-net#24495 (comment).

Signed-off-by: Yue Gao <yuega2@cisco.com>
…PE-B-O128 and update comments (sonic-net#24774)

Why I did it
This PR addresses a comment in sonic-net#24578 by correctly setting sai_stats_support_mask for Arista-7060X6-64PE-B-O128.

Note that bit 7 is not needed for Arista-7060X6-16PE-384C-B-O128S2 as srv6 is not supported for this platform. I just updated the comment for clarity.

Signed-off-by: kewei <kewei@arista.com>
Co-authored-by: Vineet  Mittal <46945843+vmittal-msft@users.noreply.github.com>
Using a symlink to point to the buffer and pool values for LT2 role QoS.
This will make sure any fixes for LT2 will be carried over.
The same bcm config file with applied QoSO settings is already being used.

Signed-off-by: Dakota Crozier <dakotac@arista.com>
…omatically (sonic-net#24792)

#### Why I did it
src/sonic-swss-common
```
* 03c6777 - (HEAD -> master, origin/master, origin/HEAD) Opportunistically use a matching kernel version for building the modules (sonic-net#1138) (2 days ago) [Saikrishna Arcot]
* 41acbf8 - Ensure RedisPipeline dtor not to throw (sonic-net#1115) (5 days ago) [Qi Luo]
* e8a9592 - Reapply "Fix risky unwrap(), expect(), and casting (sonic-net#1113)" (sonic-net#1118) (sonic-net#1124) (5 days ago) [Qi Luo]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…tically (sonic-net#24791)

#### Why I did it
src/sonic-sairedis
```
* 1bf99980 - (HEAD -> master, origin/master, origin/HEAD) changes for vpp release 202510 (sonic-net#1695) (3 days ago) [aronovic]
* 0ecbb5b9 - Revert "[meta] do not fail bulk operations if MODE_IGNORE_ERROR (sonic-net#1613)" (sonic-net#1676) (3 days ago) [Nikola Dancejic]
* 3ab2f07d - Graceful shutdown vpp to avoid core dump (sonic-net#1714) (4 days ago) [yue-fred-gao]
* a5e0e632 - Revert "Temp workaround for sonic-buildimage issue 23387 (sonic-net#1629)" (sonic-net#1700) (4 days ago) [Ze Gan]
* b735b473 - Use sonic build pool for building sairedis (sonic-net#1719) (5 days ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Co-authored-by: Hemanth Kumar Tirupati <tirupatihemanthkumar@gmail.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Update the following to the version present in Trixie:
* bash to 5.2.37-2
* kdump-tools to 1.10.7
* openssh to 10.0p1-7
* snmpd to 5.9.4-2

Co-authored-by: Hua Liu <58683130+liuh-80@users.noreply.github.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
The version of systemd on Trixie no longer allows service generators to
write to directories outside of what has been explicitly passed in. This
affects DPU and multi-ASIC use cases. Therefore, rework
systemd-sonic-generator to meet these requirements.
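
systemd invokes a generator with output directories as arguments, and the constraint described above amounts to keeping every generated file inside them. A minimal sketch of such a guard, written in Python for brevity even though systemd-sonic-generator is a C++ program; `resolve_output_path` is a hypothetical helper and not part of the generator:

```python
import os


def resolve_output_path(output_dir, relative_name):
    """Return the absolute path for relative_name inside output_dir.

    Raises ValueError if the resolved path would fall outside output_dir.
    """
    base = os.path.realpath(output_dir)
    candidate = os.path.realpath(os.path.join(base, relative_name))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"{relative_name!r} escapes {output_dir!r}")
    return candidate
```

A drop-in name such as `swss.service.d/multi-asic.conf` resolves normally, while a name containing enough `..` components to leave the output directory is rejected.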

Also, compile systemd-sonic-generator with C++17. The gtest headers no
longer support C++11, so it needs to be bumped up to C++14 at minimum.

Also, move logs for systemd-sonic-generator into /dev/kmsg (#34)

Co-authored-by: Hemanth Kumar Tirupati <tirupatihemanthkumar@gmail.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Co-authored-by: Hemanth Kumar Tirupati <tirupatihemanthkumar@gmail.com>
Co-authored-by: Brad House - NextHop <bhouse@nexthop.ai>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Loosen the help text check in `test_cfghelp.py` in `sonic-yang-mgmt`.
The exact text might change from one Python version to another, and help
text itself is more for use by a human rather than a machine. It's
better to check that the expected elements of the help text (something
about the options that are expected and the descriptions) are there
rather than the exact formatting.
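
A sketch of the looser style of check described above, using a stand-in `argparse` parser. The parser, option names, and `help_contains` helper are hypothetical and are not taken from `test_cfghelp.py`.

```python
import argparse


def build_parser():
    """Stand-in parser; the options here are illustrative only."""
    parser = argparse.ArgumentParser(
        prog="cfghelp-demo", description="Show configuration help."
    )
    parser.add_argument("-t", "--table", help="Table name to describe.")
    parser.add_argument("-f", "--field", help="Field name to describe.")
    return parser


def help_contains(help_text, expected_fragments):
    """Return True if every fragment appears once whitespace is normalised."""
    normalised = " ".join(help_text.split())
    return all(fragment in normalised for fragment in expected_fragments)


EXPECTED = [
    "Show configuration help.",
    "--table",
    "Table name to describe.",
    "--field",
    "Field name to describe.",
]
```

Checking `help_contains(build_parser().format_help(), EXPECTED)` tolerates differences in section headings, wrapping, and option layout between Python versions, which an exact string comparison would not.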

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
There is one test that is failing for unclear reasons.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Co-authored-by: Yan Markman <ymarkman@marvell.com>
Co-authored-by: Hemanth Kumar Tirupati <tirupatihemanthkumar@gmail.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
With the base image upgrade to Trixie, Bookworm-based containers will
need to use Boost 1.83. This is because of an incompatibility between
rsyslog_plugin that uses Boost 1.83 on Trixie and the eventd
container that uses Boost 1.74 on Bookworm; specifically there is an
incompatibility with serialization of objects between the two versions of
Boost.

Because of this, for Bookworm, use Boost 1.83 instead of the default
Boost 1.74.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
With the trixie upgrade, all of the package versions for the base image
will have changed, meaning the version control files will not be useful
at all for the base image.

Take this opportunity to recreate all of the version files (including
the ones for the containers).

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Install and use the pam_systemd module, where systemd creates a user
session manager for each user that logs in. This is now required for
limiting login sessions, but brings in some advantages of cgroups
limiting each user's resources and some resource isolation from the main
sshd service.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Fix FIPS build issue on trixie

* Update sonic-fips.mk

---------

Co-authored-by: sonicbld <sonicbld@microsoft.com>
wumiaont and others added 27 commits January 28, 2026 09:36
* Add cli command to display macsec fips module

* Pull latest.

Signed-off-by: Wu Miao <wumiao@nokia.com>
Signed-off-by: Tejaswini Chadaga <tchadaga@microsoft.com>
Co-authored-by: Vineet  Mittal <46945843+vmittal-msft@users.noreply.github.com>
…onic-net#25168)

- Why I did it
To include latest fixes and new functionality

- How I did it
SDK_VERSION_DPU 25.10-RC5 -> 26.1-RC1
FW_VERSION 47.1080 -> 48.0318
SAI_VERSION SAIBuild0.0.47.0 -> SAIBuild0.0.48.0

- How to verify it
Build an image and run tests from "sonic-mgmt".

Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>
…25031)

- Why I did it
The thermal updater's load_tc_config method was unable to correctly parse tc_config files that use regex-based parameter keys.

- How I did it
Added _find_matching_key method: Implements regex pattern matching to find parameter keys in dictionary.
Updated load_tc_config method:
Changed ASIC parameter lookup to use regex pattern r'asic\\d*'
Changed Module parameter lookup to use regex pattern r'module\\d+'
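
A hedged sketch of the lookup described above. It assumes that the keys in the tc_config dictionary are themselves regex strings such as `asic\d*`, which would explain why the quoted patterns escape the backslash; that file layout, the sample values, and the exact matching rule are assumptions, and `find_matching_key` is not the real `_find_matching_key` implementation.

```python
import re


def find_matching_key(params, pattern):
    """Return the first key in params whose start matches pattern, else None."""
    regex = re.compile(pattern)
    for key in params:
        if regex.match(key):
            return key
    return None


# Hypothetical parameter section: each key is a regex string, not a device name.
DEV_PARAMETERS = {
    r"cpu_pack": {"pwm_min": 20},
    r"asic\d*": {"pwm_min": 30},
    r"module\d+": {"pwm_min": 40},
}

# Patterns as quoted in the commit message; the doubled backslash matches the
# literal backslash contained in the key text.
ASIC_KEY = find_matching_key(DEV_PARAMETERS, r'asic\\d*')
MODULE_KEY = find_matching_key(DEV_PARAMETERS, r'module\\d+')
```

Under these assumptions `DEV_PARAMETERS[ASIC_KEY]` yields the ASIC settings without the caller hard-coding the exact key text.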

- How to verify it
Test on a switch and check that the config is parsed correctly.

Signed-off-by: Jianyue Wu <jianyuew@nvidia.com>
This list records previously found leaking sensors. It must not be cleared; otherwise we lose track of them.

- Why I did it
Resolve an issue where the leaking-sensor recovery message could not be sent.

- How I did it
Remove the leaking sensor list from the reset function so that previously found leaking sensors are always tracked. The lifetime of this attribute should match that of the hardware checker object.
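
To illustrate why the list has to outlive `reset`, here is a minimal sketch. The class and method names are hypothetical and do not mirror the real hardware checker.

```python
class LeakChecker:
    """Toy checker that reports leak and recovery transitions."""

    def __init__(self):
        # Lives as long as the checker object; never cleared by reset().
        self.leaking_sensors = set()
        # Per-cycle state that reset() is allowed to clear.
        self.messages = []

    def reset(self):
        # Deliberately leaves leaking_sensors untouched.
        self.messages = []

    def check(self, readings):
        """readings maps sensor name to True when that sensor reports a leak."""
        for name, leaking in sorted(readings.items()):
            if leaking and name not in self.leaking_sensors:
                self.leaking_sensors.add(name)
                self.messages.append(f"{name} is leaking")
            elif not leaking and name in self.leaking_sensors:
                self.leaking_sensors.discard(name)
                self.messages.append(f"{name} recovered")
        return list(self.messages)
```

If `reset` also cleared `leaking_sensors`, the second branch could never fire after a reset, which corresponds to the missing recovery message described in this commit.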

Signed-off-by: Yuanzhe Liu <yualiu@nvidia.com>
sonic-net#25079)

This reverts commit ee76ce5.

- Why I did it
The original PR was created as a workaround for a driver issue. It is no longer required, so this commit reverts it. The revert changes the behaviour during system reboot for DPUs: the DPUs start the startup process, and then we proceed with the switch reboot.

- How I did it
Revert commit

- How to verify it
Execute reboot command

Signed-off-by: gpunathilell <gpunathilell@nvidia.com>
Signed-off-by: Boyang Yu <byu@arista.com>
…apable (sonic-net#25153)

Why I did it
To enable macsec on UpperSpineRouters when the device is capable at init time

How I did it
Modify init_cfg.json to enable macsec on UpperSpineRouter when platform is macsec capable

How to verify it
Macsec feature is enabled on supported device if it is a SpineRouter or UpperSpineRouter
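
The rule stated above can be expressed as a small predicate. This is only a sketch: the real logic lives in the init_cfg.json generation, and the function name and the `"enabled"`/`"disabled"` strings are assumptions.

```python
# Device types named in the verification note above.
MACSEC_DEVICE_TYPES = {"SpineRouter", "UpperSpineRouter"}


def macsec_initial_state(device_type, macsec_capable):
    """Return the assumed initial feature state for a device."""
    if macsec_capable and device_type in MACSEC_DEVICE_TYPES:
        return "enabled"
    return "disabled"
```

For example, `macsec_initial_state("UpperSpineRouter", True)` returns `"enabled"`, while the same device type on a platform that is not macsec capable stays `"disabled"`.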

Signed-off-by: Tejaswini Chadaga <tchadaga@microsoft.com>
Co-authored-by: Vineet  Mittal <46945843+vmittal-msft@users.noreply.github.com>
Why I did it
Updated Agera2 gearbox SW version to 3.14.0-2 to include certain fixes.
…D automatically (sonic-net#25247)

#### Why I did it
src/sonic-platform-daemons
```
* cb899f1 - (HEAD -> master, origin/master, origin/HEAD) Stop the `config_manager` child process on exception in `chassisd` (sonic-net#727) (9 hours ago) [arista-nwolfe]
* f0d0f27 - Create handle_cmis_inserted_state function for CMIS_STATE_INSERTED (sonic-net#738) (2 days ago) [Bobby McGonigle]
* 282525b - xcvrd: Support regex matching against the medium lane speed key (sonic-net#731) (2 days ago) [Brian Gallagher]
* 8b81505 - [stormond] Refactor daemon to reuse CONFIG_DB connection (sonic-net#730) (5 days ago) [gechiang]
* dabe1f3 - Update the host lane mask for decommissioning based on the module capability (sonic-net#733) (6 days ago) [arpit-nexthop]
```
#### How I did it
#### How to verify it
#### Description for the changelog
… automatically (sonic-net#25245)

#### Why I did it
src/sonic-platform-common
```
* 367d000 - (HEAD -> master, origin/master, origin/HEAD) Fix log prefix hijack in c_cmis.py (sonic-net#609) (4 days ago) [nate-nexthop]
* a5149fb - [BMC] make bmc-fw-update.py common (sonic-net#617) (7 days ago) [Ben Levi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Include the Mellanox firmware manager Python wheel in the KVM image
build to enable firmware management capabilities in KVM deployments.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
- sonic-sairedis
- sonic-utilities
- sonic-utilities
@noaOrMlnx noaOrMlnx force-pushed the multi-asic-ready-signal-noaor branch from 629a781 to 2c95c20 Compare February 9, 2026 07:05
noaOrMlnx pushed a commit that referenced this pull request Feb 18, 2026
…ly (sonic-net#25536)

#### Why I did it
src/sonic-ztp
```
* 170acb0 - (HEAD -> master, origin/master, origin/HEAD) Skip ZTP service during warm boot (#71) (5 hours ago) [Ying Xie]
```
#### How I did it
#### How to verify it
#### Description for the changelog
@noaOrMlnx noaOrMlnx closed this Mar 4, 2026