Merge master to VY/master #18
Merged
VadymYashchenko merged 51 commits into VadymYashchenko:master on Apr 29, 2022
#### Why I did it
To support the address sanitizer (ASAN) for Mellanox syncd.

#### How I did it
- Mapped /var/log/asan into the syncd container (the same as for swss).
- Gave container stop() a timeout (60s) for syncd (the same as for swss), so that libasan has enough time to generate a report.
- Added ASAN's log path to the Mellanox syncd supervisord.conf.
- Added "asan: yes" to sonic_version.yml.

#### How to verify it
- Added artificial memory leaks.
- Compiled with ENABLE_ASAN=y.
- Installed the image on the DUT.
- Rebooted the DUT.
- Verified that /var/log/asan/syncd-asan.log contains the leaks.

Signed-off-by: Yakiv Huryk <[email protected]>
Updating sonic-utilities submodule with the following commits:
- f09bd31 Fix UT failed cause by change pycommon to use swsscommon
- c092300 Increased pcied unit test coverage to > 80%
- 7d7c85e Modular chassis: Psud set master led on first run
- 7195dcc Remove py2 from pipeline
- c2e7393 [ycabled] increase UT coverage of ycabled daemon

#### Why I did it
After pycommon was changed to use swsscommon, unit tests failed in sonic-platform-daemon, so a submodule update containing the UT fix is needed.

#### Description for the changelog
- Fix UT failed cause by change pycommon to use swsscommon
- Increased pcied unit test coverage to > 80%
- Modular chassis: Psud set master led on first run
- Remove py2 from pipeline
- [ycabled] increase UT coverage of ycabled daemon
v0.7.5 has a bug fix for the support of gearbox port and macsec counters. It also includes an owl firmware update with owl.lz4.fw.1.94.0.bin.

#### How I did it
- Update the credo sai url for v0.7.5.
- Update gearbox_config.json to use firmware owl.lz4.fw.1.94.0.bin instead of owl.lz4.fw.1.92.1.bin.

#### How to verify it
Tested gearbox port and macsec counters successfully on A7280.
…BackEndToRRouter (#10474)
Signed-off-by: Taras Keryk <[email protected]>
Following the patch from https://packages.debian.org/bullseye/wpasupplicant, upgrade sonic-wpa-supplicant to support bullseye and upgrade docker-macsec.mk as a bullseye component.
[master][sonic-linkmgrd] submodule updates
- 41f5fb9 Jing Zhang Mon Apr 11 08:33:39 2022 -0700 Upgrade linkmgrd to `BULLSEYE` (sonic-net/sonic-linkmgrd#60)
- 2fc890e Jing Zhang Mon Apr 4 10:25:22 2022 -0700 Lower unsolicited MUX state change notification log level to WARNING (sonic-net/sonic-linkmgrd#57)
- 13f4879 Jing Zhang Sun Apr 3 21:56:33 2022 -0700 Keep incrementing sequence number when link prober is suspended and shutdown (sonic-net/sonic-linkmgrd#55)
- 62482e1 Jing Zhang Sun Apr 3 20:54:40 2022 -0700 Reset link prober state when default route is back (sonic-net/sonic-linkmgrd#56)
- 34a68d1 Jing Zhang Thu Mar 31 18:33:46 2022 -0700 disable switchover measuring based on link prober (sonic-net/sonic-linkmgrd#49)
- 898a655 Jing Zhang Thu Mar 31 15:42:15 2022 -0700 Update link prober metrics posting logics (sonic-net/sonic-linkmgrd#50)

sign-off: Jing Zhang [email protected]
Add Yang model to constrain the configuration of MACsec
- Why I did it
  Implement the newly added reboot causes from PR sonic-net/sonic-platform-common#277.
- How I did it
  Map the reboot cause sysfs to the newly added reboot causes.
- How to verify it
  - Manual test: check whether the reboot cause is correct after rebooting the switch in various ways.
  - Run the community reboot test to see whether the reboot cause checker passes.

Signed-off-by: Kebo Liu <[email protected]>
…D3 based platforms (#10587)
#### Why I did it
To sign the SONiC kernel image and allow a secure-boot-based system to verify the SONiC image before loading it into the system.

#### How I did it
Pass the following parameters in rules/config.user, for example:

    SONIC_ENABLE_SECUREBOOT_SIGNATURE := y
    SIGNING_KEY := /path/to/key/private.key
    SIGNING_CERT := /path/to/public/public.cert

#### How to verify it
A secure-boot-enabled system that has the right public key for the image enrolled in the platform UEFI database will be able to verify the image before loading it. Alternatively, one can verify offline with the sbsign tool as below:

    export SBSIGN_KEY=/abc/bcd/xyz/
    sbverify --cert $SBSIGN_KEY/public_cert.cert fsroot-platform-XYZ/boot/vmlinuz-5.10.0-8-2-amd64

O/P: Signature verification OK
* [build]: Patch debootstrap to not unmount the host's /proc filesystem

  Currently, when the final image is being built (sonic-vs.img.gz, sonic-broadcom.bin, or similar), each invocation of sudo in the build_debian.sh script takes 0.8 seconds to run and execute the actual command. This is because the /proc filesystem in the slave container has been unmounted. This happens while debootstrap is running: it incorrectly unmounts the host's (in our case, the slave container's) /proc filesystem, because in the new image being built, /proc is a symlink to the host's (the slave container's) /proc. With /proc gone, each invocation of sudo adds 0.8 seconds of overhead.

  As a side effect, docker exec into the slave container during this time will fail, because /proc/self/fd doesn't exist anymore, and docker exec assumes that it exists.

  Debootstrap has fixed this in 1.0.124 and newer, so backport the patch that fixes this into the version that Bullseye has.

  Signed-off-by: Saikrishna Arcot <[email protected]>

* [build_debian.sh]: Use eatmydata to speed up deb package installations

  During package installations, dpkg calls fsync multiple times (for each package) to ensure that the files are written to disk, so that if there is a system crash during package installation, the system is in at least a somewhat recoverable state. For our use case, though, we're installing packages in a chroot in fsroot-* from a slave container and then packaging it into an image. If there were a system crash (or even if docker crashed), the fsroot-* directory would first be removed, and the process would get restarted. This means that the fsync calls aren't really needed for our use case.

  The eatmydata package includes a library that will block/suppress the use of fsync (and similar) system calls from applications and will instead just return success, so that the application is not blocked on disk writes, which can instead happen in the background as necessary. If dpkg is run with this library, then the fsync calls that it makes will have no effect.

  Therefore, install the eatmydata package at the beginning of build_debian.sh and have dpkg be run under eatmydata for almost all package installations/removals. At the end of the installation, remove it, so that the final image uses dpkg as normal.

  In my testing, this saves about 2-3 minutes from the image build time.

  Signed-off-by: Saikrishna Arcot <[email protected]>

* Change ln syntax to use chroot

  Signed-off-by: Saikrishna Arcot <[email protected]>
sign-off: Jing Zhang [email protected]

#### Why I did it
As part of the process of moving containers from buster to bullseye.

#### How I did it
1. Change the base image from buster to bullseye.
2. Remove an unused addition to the orchagent run options.

#### How to verify it
Tested building locally.
…ber (#8927)

#### Why I did it
Fix several bugs:
1. If one vlan member belongs to multiple vlans, and any of the vlans is of "Tagged" type, we respect the tagged type.
2. If one vlan member belongs to multiple vlans, and none of the vlans is of "Tagged" type, we override it to be a tagged member.
3. Make sure `vlantype_name` is assigned correctly in each iteration.

#### How to verify it
1. Test the command line to parse a minigraph and make sure the output does not change.
```
./sonic-cfggen -m minigraph.mlnx20.xml
```
The minigraph is for HwSKU Mellanox-SN2700-D40C8S8.
2. Test on a DUT with HwSKU Mellanox-SN2700-D40C8S8.
```
sudo config load_minigraph
show vlan brief
```
Checked the "Port Tagging" column in the output.
Signed-off-by: bingwang <[email protected]>
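The tagging rules listed in the fix above can be sketched in a few lines of Python. This is one reading of those rules, not the actual minigraph parser code: the function name `resolve_tagging_modes` and the `(vlan_name, vlan_type, members)` input shape are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def resolve_tagging_modes(vlans):
    """Return {(vlan_name, member): "tagged" | "untagged"}.

    `vlans` is a list of (vlan_name, vlan_type, members) tuples, where
    vlan_type is "Tagged" or "" (hypothetical input shape).
    """
    # First pass: collect the declared type of every VLAN a member belongs to.
    types_by_member = defaultdict(list)
    for _name, vlan_type, members in vlans:
        for member in members:
            types_by_member[member].append(vlan_type)

    # Second pass: the VLAN type is read per VLAN, so it cannot leak
    # from one iteration into the next (bug 3 in the list above).
    modes = {}
    for name, vlan_type, members in vlans:
        for member in members:
            declared = types_by_member[member]
            if vlan_type == "Tagged":
                mode = "tagged"      # rule 1: respect the declared type
            elif len(declared) > 1 and "Tagged" not in declared:
                mode = "tagged"      # rule 2: several VLANs, none Tagged
            else:
                mode = "untagged"
            modes[(name, member)] = mode
    return modes
```

A member that sits in a single untagged VLAN stays untagged, which is why the output for an unchanged minigraph should not change.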
The interface renaming logic fails if one interface is missing. Because of the `set -e`, the whole initramfs hook aborts early on error. This change fixes the current behavior so that missing interfaces are properly skipped and existing interfaces are still renamed.
Swss Commit update:
- 1fd1dbf Add support for route flow counter (#2094)
- d8fadc6 [QoS] Resolve an issue in the sequence where a referenced object removed and then the referencing object deleting and then re-adding (#2210)
- eaf7264 [macsecorch]: MACsec with pfc (#2095)
- a32b611 [azp]: Reduce diff coverage to 50% threshhold (#2227)
- 6301db7 [Code owner] Set owners for auto reviews (#2229)
- d1fb3dd [BFD]Retry create BFD with different source UDP port on failure (#2225)
- 53620f3 [orchagent] add & remove port counters dynamically each time port was added or removed (#2019)
- cf216be Change ERR to Notice for tunnel term create fail (#2219)
…0618)

#### Why I did it
The SWSS dependency for macsecmgrd was missing, so the MACsec feature could not be enabled.

#### How I did it
Add the SWSS dependency in docker-macsec.mk.

#### How to verify it
Check the Azp of sonic-mgmt.
…10288)

Signed-off-by: Yong Zhao <[email protected]>

#### Why I did it
This PR aims to fix the issue where Monit can't reset its counter when monitoring the memory usage of the telemetry container. Specifically, the Monit configuration for monitoring the memory usage of the telemetry container is as follows:

    check program container_memory_telemetry with path "/usr/bin/memory_checker telemetry 419430400"
        if status == 3 for 10 times within 20 cycles then exec "/usr/bin/restart_service telemetry"

If the memory usage of the telemetry container is larger than 400MB for 10 times within 20 cycles (minutes), then it will be restarted. Recently we observed that, after the telemetry container was restarted, its memory usage continuously increased from 400MB to 11GB within 1 hour, but it was not restarted again during this 1 hour sliding window.

The reason is that Monit can reset its counter and start counting again if and only if the status of the monitored service changes from Status failed to Status ok. During this 1 hour sliding window, the status of the monitored service did not change from Status failed to Status ok.

Currently, for each service monitored by Monit, there is an entry showing the monitoring status, monitoring mode, etc. For example, the following output from the command sudo monit status shows the status of the service that monitors the memory usage of telemetry:

    Program 'container_memory_telemetry'
      status                       Status ok
      monitoring status            Monitored
      monitoring mode              active
      on reboot                    start
      last exit value              0
      last output                  -
      data collected               Sat, 19 Mar 2022 19:56:26

Every 1 minute, Monit runs the script to check the memory usage of telemetry and updates the counter if the memory usage is larger than 400MB. If Monit checks the counter and finds that the memory usage of telemetry was larger than 400MB for 10 times within 20 minutes, then the telemetry container is restarted. The following is an example status of the monitored service:

    Program 'container_memory_telemetry'
      status                       Status failed
      monitoring status            Monitored
      monitoring mode              active
      on reboot                    start
      last exit value              0
      last output                  -
      data collected               Tue, 01 Feb 2022 22:52:55

After the telemetry container was restarted, we found that the memory usage of telemetry increased rapidly from around 100MB to more than 400MB within 1 minute, so the status of the monitored service did not have a chance to change from Status failed to Status ok.

#### How I did it
To provide a workaround for this issue, Monit recently introduced another syntax format, repeat every <n> cycles, related to exec. This new syntax makes Monit repeat executing the background script if the error persists for a given number of cycles.

#### How to verify it
I verified this change on lab device str-s6000-acs-12. Another pytest PR (sonic-net/sonic-mgmt#5492) is submitted in the sonic-mgmt repo for review.
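The difference between the two configurations can be illustrated with a toy model. This is a simplified simulation of the behavior described above, not Monit's implementation: the function `monit_exec_cycles` and its parameters are hypothetical, and the model assumes the action fires once when the condition first matches and then only again on a repeat interval while the failure persists.

```python
from collections import deque


def monit_exec_cycles(statuses, threshold=10, window=20, repeat_every=None):
    """Return the cycle indices at which the exec action would run.

    statuses: exit status of the check program in each cycle (3 = over limit).
    repeat_every: models "repeat every <n> cycles"; None means no repeat.
    """
    recent = deque(maxlen=window)   # sliding window of the last `window` cycles
    failed = False                  # models "Status failed" vs "Status ok"
    cycles_since_exec = 0
    fired = []
    for cycle, status in enumerate(statuses):
        recent.append(status == 3)
        matched = sum(recent) >= threshold
        if matched and not failed:
            # Transition ok -> failed: the action runs once.
            failed = True
            cycles_since_exec = 0
            fired.append(cycle)
        elif matched:
            # Still failed: without repeat, nothing happens here.
            cycles_since_exec += 1
            if repeat_every and cycles_since_exec % repeat_every == 0:
                fired.append(cycle)
        else:
            failed = False          # back to ok, the next match fires again
    return fired


# Memory stays over the limit for 40 consecutive cycles.
always_over = [3] * 40
print(monit_exec_cycles(always_over))                  # single restart only
print(monit_exec_cycles(always_over, repeat_every=5))  # restart keeps repeating
```

In this model the run without repeat restarts the container only once, which mirrors the stuck state reported above, while the repeat variant keeps issuing restarts for as long as the status stays failed.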
Why I did it [Ci]: Support to sign image for cisco-8000 uefi secure boot
Update PikeZ platform definition Improve powercycle behavior on chassis
#10555)
* [CG-Fix-CVE-2021-44906] Patching on thrift.0.14.1 for package minimist
  Signed-off-by: richardyu-ms <[email protected]>
* add more information in patch
  Signed-off-by: richardyu-ms <[email protected]>
* Update 0003-Remove-minimist-packages.patch
* change the thrift 0.14.1 to package download
  Signed-off-by: richardyu-ms <[email protected]>
* use the series file for patching
* fix a code defect
- Why I did it
  InvalidPsuVolWA.run might raise an exception if the user powers off the PSU while it is running. This exception is not caught and is raised to psud, which causes psud to fail to update PSU data in the DB.
- How I did it
  1. Change the log level used when the WA does not work. This can happen when the user powers off the PSU, so changing the log level from error to warning is better.
  2. Change the wait time from 5 to 1 to avoid introducing too much delay in psud. 1 second is usually enough per my test.
  3. Give a default return value for the functions get_voltage_low_threshold and get_voltage_high_threshold to prevent the exception from reaching psud.
- How to verify it
  Manual test. Run the sonic-mgmt regression.
- Why I did it
  To support the docker-sonic-vs image with ASAN.
- How I did it
  1. Made the supervisord.conf a template.
  2. Added the 'log_path' environment variable for ASAN-enabled daemons.
  3. Added supervisord.conf.j2 generation and the ASAN lib to docker-sonic-vs/Dockerfile.j2.
- How to verify it
  1. Made a build with ENABLE_ASAN=y.
  2. Ran the tests and checked the ASAN reports.

Signed-off-by: Yakiv Huryk <[email protected]>
…#10313)
* Optimize dx010 sonic platform init script to speed up init process
* Merge issue #10152: [warm-upgrade][202012] Slow Celestica platform init in rc.local causes lacp-teardown fix into master branch

Signed-off-by: Eric Zhu <[email protected]>
#### Why I did it
Provide a fix for the review comment: https://github.com/Azure/sonic-buildimage/pull/10475/files#r847753187

#### How I did it
The try/except is not required in this scenario, so remove it and initialize the db config according to whether the platform is single-asic or multi-asic.

#### How to verify it
Verified on a multi-asic device.
#### Why I did it
Allow the long name format for portchannel vlan sub-interfaces as long as the name respects the Linux interface name length limit (<16).

#### How I did it
Modify the leaf name check.

#### How to verify it
Test case passes.
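As a rough illustration of the length rule, the check can be written as below. This is a sketch in Python rather than the actual YANG leaf constraint; `is_valid_subintf_name` is a hypothetical helper, and the `<parent>.<vlan id>` name shape is an assumption about how sub-interface names are formed.

```python
IFNAMSIZ = 16  # Linux interface name buffer size, including the trailing NUL


def is_valid_subintf_name(name):
    """Accept "<parent>.<vlan id>" names shorter than IFNAMSIZ characters."""
    parent, sep, vlan_id = name.partition(".")
    if not sep or not parent or not vlan_id.isdigit():
        return False
    return len(name) < IFNAMSIZ


print(is_valid_subintf_name("PortChannel1.10"))      # long format, 15 chars
print(is_valid_subintf_name("PortChannel0001.100"))  # 19 chars, too long
```

The first name fits within the limit and is accepted, while the second one is rejected because of its length alone.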
…0565) * Add AZURE_TUNNEL map Signed-off-by: bingwang <[email protected]>
Co-authored-by: Zhi Yuan (Carl) Zhao <[email protected]>
submodule update, includes: ec32690 CVE-2020-25614: Update xmlquery, jsonquery and xpath packages. (#58) 5156527 Showtech sonic mgmt framework: Add Management Framework functionality for "show tech-support" (#49)
)
* Remove SSH host keys after installing the custom version of sshd
  Signed-off-by: Saikrishna Arcot <[email protected]>
* Use an override for sshd instead of overwriting the service file
  Don't overwrite upstream's .service file, and instead use an override file for making sure the host key(s) are generated.
  Signed-off-by: Saikrishna Arcot <[email protected]>
…10627) On vs platform, egress_lossless_pool's mode is static. So the corresponding profile should be of static_th as well. Signed-off-by: Stephen Sun <[email protected]>
… CONFIG_DB (#9681)

The asic PCI ID (PCI address) is collected by chassisd (inside pmon, sonic-net/sonic-platform-daemons#175) and saved in CHASSIS_STATE_DB (in redis_chassis). CHASSIS_STATE_DB is accessible by the swss containers.

In docker-init.sh (the script is called after the swss container is created and before anything that could run in swss, like orchagent), we wait until the asic PCI ID of the corresponding asic is populated by chassisd. We then update asic_id in the CONFIG_DB of the asic's database.

A system that supports dynamic asic PCI ID identification must have an (empty) file use_pci_id_chassis in its platform dir. When orchagent runs, it has the correct asic PCI ID in its CONFIG_DB.

Together with these PRs:
- sonic-net/sonic-platform-daemons#175
- sonic-net/sonic-platform-common#185

Signed-off-by: Maxime Lorrillere <[email protected]>
Co-authored-by: Maxime Lorrillere <[email protected]>
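The wait-then-update step described above follows a common polling pattern, sketched here in Python. The real logic lives in the docker-init.sh shell script; everything in this block (the function `wait_for_asic_pci_id`, the injected `fetch_pci_id` callable, the timeout, and the sample PCI address) is a hypothetical stand-in, and the database access is replaced by plain Python objects.

```python
import time


def wait_for_asic_pci_id(fetch_pci_id, timeout_s=60.0, poll_interval_s=1.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_pci_id() until it returns a non-empty value.

    fetch_pci_id stands in for a read from CHASSIS_STATE_DB. The timeout is
    an assumption of this sketch; the description above only says "wait".
    """
    deadline = clock() + timeout_s
    while True:
        pci_id = fetch_pci_id()
        if pci_id:
            return pci_id
        if clock() >= deadline:
            raise TimeoutError("asic PCI ID was not populated in time")
        sleep(poll_interval_s)


# Usage with fake data: chassisd "populates" the value on the third poll.
answers = iter([None, None, "0000:03:00.0"])
config_db = {}  # stand-in for the asic's CONFIG_DB
config_db["asic_id"] = wait_for_asic_pci_id(
    lambda: next(answers), sleep=lambda _s: None)
print(config_db)
```

Injecting the fetch, sleep, and clock functions keeps the sketch runnable without a redis instance and makes the ordering explicit: nothing is written to the configuration until chassisd has published the address.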
#### Why I did it
A recirc port is used only to forward traffic from one asic to another asic, so it is not required to configure LLDP on it.

#### How I did it
- Add an interface prefix helper for the recirc port.
- Similar to skipping LLDP configuration on the inband port, add a check in lldpmgrd to skip the recirc port by checking the interface prefix.
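A prefix check of this kind could look like the sketch below. The prefix strings and the helper name `should_configure_lldp` are assumptions made for illustration; the actual values come from the interface prefix helpers mentioned above.

```python
# Assumed prefixes, for illustration only.
INBAND_PREFIX = "Ethernet-IB"
RECIRC_PREFIX = "Ethernet-Rec"


def should_configure_lldp(port_name):
    """Skip LLDP on inband and recirc ports, identified by name prefix."""
    return not port_name.startswith((INBAND_PREFIX, RECIRC_PREFIX))


for port in ("Ethernet0", "Ethernet-IB0", "Ethernet-Rec0"):
    print(port, should_configure_lldp(port))
```

Matching on the name prefix keeps the check independent of per-port configuration, at the cost of relying on the naming convention being applied consistently.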
#### Why I did it
Migrate the ptftests scripts to python3. To make the migration incremental, first add a python virtual environment and install all required python packages in it. Then migrate the ptftests scripts from python2 to python3 one by one, to avoid impacting scripts that have not been changed.

#### How I did it
- Add a python3 virtual environment for docker-ptf.
- Add submodule ptf-py3 and install the patched ptf 0.9.3 into the virtual environment as well. Two ptf issues were reported here: p4lang/ptf#173, p4lang/ptf#174

Signed-off-by: Zhaohui Sun <[email protected]>
Why I did it Update submodule sonic-restapi e83e0e8 Fix Ctype_char larger than address space issue in 32-bit armhf (#107)
#### Why I did it
sonic-hostservice cannot be started.

#### How I did it
Install python3-dbus and systemd-python, and replace the invalid path.

#### How to verify it
Start the service with the commands below:

    sudo systemctl start sonic-hostservice
    sudo systemctl status sonic-hostservice

Signed-off-by: Gang Lv [email protected]
* Upgrade docker version from 20.10.7 to 20.10.14, and pin containerd.io

  Update the Docker engine version from 20.10.7 to 20.10.14. This brings in some CVE and bug fixes. Additionally, pin the version of containerd.io to a specific version, mainly for consistency/reproducibility.

  Signed-off-by: Saikrishna Arcot <[email protected]>

* Remove the containerd ordering change to docker.service

  This appears to be already present in the current docker.service.

  Signed-off-by: Saikrishna Arcot <[email protected]>

* Remove use of apt-key

  apt-key is considered deprecated, and the current practice is to just add the key into /etc/apt/trusted.gpg.d/.

  Signed-off-by: Saikrishna Arcot <[email protected]>

* Upgrade docker container in Bullseye slave to 20.10.14

  Signed-off-by: Saikrishna Arcot <[email protected]>
Currently, the build dockers are created as user dockers (docker-base-stretch-<user>, etc.) that are specific to each user. But the sonic dockers (docker-database, docker-swss, etc.) are created with a fixed docker name that is common to all users:

    docker-database:latest
    docker-swss:latest

When multiple builds are triggered on the same build server, this causes parallel build issues, because all the build jobs try to create the same docker with the latest tag. This happens only when the sonic dockers are built using the native host dockerd for sonic docker image creation.

This patch creates all sonic dockers as user sonic dockers and then, while saving and loading the user sonic dockers, renames them to the correct sonic docker names with the tag latest:

    docker-database:latest <== SAVE/LOAD ==> docker-database-<user>:tag

The user sonic docker names are derived from the 'DOCKER_USERNAME' and 'DOCKER_USERTAG' make env variables. Using a Jinja template, the FROM docker name is replaced with the correct user sonic docker name for loading and saving the docker image.
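The name mapping between the shared and the per-user docker can be sketched as follows. This is only an illustration of the naming scheme shown above, written in Python instead of the build system's make and Jinja logic; both helper functions and the sample user and tag values are hypothetical.

```python
def user_image_name(image, username, usertag):
    """Per-user name used while building, e.g. docker-database-<user>:tag."""
    return f"{image}-{username}:{usertag}"


def common_image_name(image):
    """Shared name used once the image is saved or loaded."""
    return f"{image}:latest"


# In the real build these values would come from the DOCKER_USERNAME and
# DOCKER_USERTAG make env variables; fixed sample values are used here.
username, usertag = "alice", "tag"
for image in ("docker-database", "docker-swss"):
    print(common_image_name(image), "<==>",
          user_image_name(image, username, usertag))
```

Because every build job works on its own per-user name, two jobs on the same host no longer race to create the same `:latest` image.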
- Why I did it
  When profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see a few python scripts running at the same time. This parallel execution consumes CPU time, and the duration of create_switch is longer than it should be. Following this finding, and to ensure these services will not interfere in the future, LLDP is delayed by 90 seconds until the system finishes the init flow after fastboot.
- How I did it
  Add a timer for the LLDP service. Copy the timer file to the host bin image.
- How to verify it
  Run fast-reboot on an MLNX platform and observe a faster create_switch execution time.

This PR is dependent on PR: #10567
#### Why I did it
Fix the issue of the target target/debs/bullseye/sonic-rest-api_1.0.1_arm64.deb not existing; the correct target is target/debs/bullseye/sonic-rest-api_1.0.1_armhf.deb. Fixes issue #9896.

    [ FAIL LOG START ] [ target/debs/stretch/sonic-rest-api_1.0.1_amd64.deb ]
    [ REASON ] : target/debs/stretch/sonic-rest-api_1.0.1_amd64.deb does not exist NON-EXISTENT PREREQUISITES:
    [ FLAGS FILE ] : []
#### Why I did it
Adding exceptionList to the validation exception.

#### How I did it
Check code.

#### How to verify it
Ran manually.
- Run full config validation from a KVM
- Print the thrown exception

**Before**
```
Error: Data Loading Failed All Keys are not parsed in FEATURE dict_keys(['telemetry'])
```
**After**
```
Error: Data Loading Failed All Keys are not parsed in FEATURE dict_keys(['telemetry']) exceptionList:["'status'"]
```