
[Nvidia] ASIC Firmware update control#5

Closed

fastiuk wants to merge 2 commits into master from firmware-update-control


fastiuk (Owner) commented Jul 3, 2022

This PR depends on fastiuk/sonic-swss-common#3

Why I did it

  • Add the ability to control ASIC FW updates:
    • Enable/disable FW update
    • Choose the FW source: image or user-defined
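A minimal sketch of how the two settings might combine at boot. It assumes the `FIRMWARE|asic` fields used in this PR's verification steps (`auto_update` = `enable`/`disable`, `default` = `image`/`user`) and guesses that a missing value means "enabled" and "image"; `fw_update_action` is a hypothetical name and this is not the PR's implementation.

```shell
#!/bin/bash
# Hypothetical helper (not the PR's code): decide what to do with the ASIC FW
# at boot from the two FIRMWARE|asic settings.
#   $1 = auto_update ("enable"/"disable"; empty is assumed to mean enable)
#   $2 = default     ("image"/"user";     empty is assumed to mean image)
# Prints "skip", "user" or "image".
fw_update_action() {
    local auto_update="${1:-enable}"
    local source="${2:-image}"

    if [ "$auto_update" = "disable" ]; then
        echo "skip"              # leave the currently burned FW untouched
        return 0
    fi

    case "$source" in
        user) echo "user" ;;     # install the user-provided image
        *)    echo "image" ;;    # install the FW shipped with the SONiC image
    esac
}
```

For example, `fw_update_action "$(redis-cli -n 4 HGET "FIRMWARE|asic" "auto_update")" "$(redis-cli -n 4 HGET "FIRMWARE|asic" "default")"`. Checking `auto_update` first means a staged user image is ignored until auto-update is turned back on.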

How to verify it

Build SONiC, or, if SONiC is already built, rebuild only the YANG models wheel and the Mellanox image:

rm -f target/python-wheels/bullseye/sonic_yang_models-1.0-py3-none-any.whl
BLDENV=bullseye make -f Makefile.work target/python-wheels/bullseye/sonic_yang_models-1.0-py3-none-any.whl
rm -f target/sonic-mellanox.bin
make target/sonic-mellanox.bin

Log in to the switch.
Disable FW auto-update:

redis-cli -n 4 HSET "FIRMWARE|asic" "auto_update" "disable"
sudo reboot

See the results:
[Screenshot 2022-07-03 at 02.20.43]
Enable auto-update, change the source to user-defined, and copy the FW image into place:

redis-cli -n 4 HSET "FIRMWARE|asic" "auto_update" "enable"
redis-cli -n 4 HSET "FIRMWARE|asic" "default" "user"
sudo mkdir -p /host/mlnx/asic/
sudo cp /tmp/fw-SPC2-rel-29_2010_2270-EVB.mfa /host/mlnx/asic/fw.mfa
sudo reboot
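Before rebooting with the source set to `user`, a quick check that the image is actually in place can save a reboot cycle. This is a hypothetical helper (`check_user_fw` and the `USER_FW_PATH` override are not part of the PR); it only checks that `/host/mlnx/asic/fw.mfa` exists and is non-empty, not that the FW is valid for the installed SDK.

```shell
#!/bin/bash
# Hypothetical pre-reboot check (not part of the PR): when the FW source is
# "user", make sure the expected image exists and is non-empty.
# The default path is the location given in this PR; override it for testing.
USER_FW_PATH="${USER_FW_PATH:-/host/mlnx/asic/fw.mfa}"

check_user_fw() {
    local source="$1"
    # Nothing to check unless the user-defined source is selected
    [ "$source" = "user" ] || return 0

    # -s: file exists and has a size greater than zero
    if [ ! -s "$USER_FW_PATH" ]; then
        echo "ERROR: FW source is 'user' but $USER_FW_PATH is missing or empty" >&2
        return 1
    fi
    return 0
}
```

Usage: `check_user_fw "$(redis-cli -n 4 HGET "FIRMWARE|asic" "default")" && sudo reboot`.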

[Screenshot 2022-07-03 at 03.42.37]

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

  • Add the ability to control ASIC FW updates

Link to config_db schema for YANG module changes

sonic-firmware.yang

Submodule PRs:

| Repo | PR title | State |
| --- | --- | --- |
| sonic-swss-common | Firmware update control | |


fastiuk changed the title from "Firmware update control" to "[Nvidia] Firmware update control" on Jul 3, 2022
fastiuk (Owner, Author) commented Jul 5, 2022

@stepanblyschak please review

stepanblyschak left a comment

I wonder what the real-world use case for the end user is. We know that the FW version and the SDK version must be aligned. Without a mechanism to upgrade the SDK, and/or a mechanism to guarantee that a user-provided FW works with the installed SDK version, we make it easy to break the system.
If the intent is to let the user get a fix from a new FW without waiting for a new SONiC image, we can tell the user to replace the FW file under /etc and reboot to achieve it.
I am also not sure why we need to control "auto_update". We must upgrade the FW prior to a cold/fast/warm upgrade. If the user configuration disables it, we should clearly let the user know that these flows will be broken.

fastiuk (Owner, Author) commented Oct 25, 2022

> I wonder what the real-world use case for the end user is.

It will allow users to control the ASIC firmware and install their own images.

> We know that the FW version and the SDK version must be aligned. Without a mechanism to upgrade the SDK, and/or a mechanism to guarantee that a user-provided FW works with the installed SDK version, we make it easy to break the system.

  1. I am not sure the versions must be completely aligned. I think some systems can tolerate a small deviation between versions, at least between minor versions.
  2. We don't have a mechanism to guarantee that a new FW will work with the current SDK, so it will be the user's responsibility to use a proper FW.
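Since compatibility is left to the user, one way to screen a candidate image is to compare its version with the one the installed SDK expects. This is a hypothetical sketch, not part of the PR: it assumes the version is encoded in the file name as in `fw-SPC2-rel-29_2010_2270-EVB.mfa` (read as `29.2010.2270`) and adopts, purely as an example policy, the reading of point 1 that only the last field may deviate. Where the expected version comes from is left open, and once the image is renamed to `fw.mfa` the name no longer carries a version.

```shell
#!/bin/bash
# Hypothetical compatibility check (not part of the PR). Assumes the FW file
# name embeds the version as "rel-<a>_<b>_<c>", e.g.
#   fw-SPC2-rel-29_2010_2270-EVB.mfa -> 29.2010.2270
# Prints nothing if the name does not match that pattern.
fw_version_from_file() {
    basename "$1" | sed -n 's/.*rel-\([0-9]*\)_\([0-9]*\)_\([0-9]*\).*/\1.\2.\3/p'
}

# Assumed example policy: the first two fields must equal the version the
# installed SDK expects; only the last field may deviate.
fw_compatible() {
    local expected="$1" actual="$2"
    [ -n "$actual" ] || return 1
    # ${var%.*} strips the last ".<field>", e.g. 29.2010.2270 -> 29.2010
    [ "${expected%.*}" = "${actual%.*}" ]
}
```

A stricter policy would compare the full string; the point is only that the check has to live somewhere once SONiC no longer guarantees the pairing.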

> If the intent is to let the user get a fix from a new FW without waiting for a new SONiC image, we can tell the user to replace the FW file under /etc and reboot to achieve it.

We can, but it is not a good user experience: the user would have to remove the old FW and could not easily get it back. This feature lets the user choose which FW to use.

> I am also not sure why we need to control "auto_update". We must upgrade the FW prior to a cold/fast/warm upgrade. If the user configuration disables it, we should clearly let the user know that these flows will be broken.

That requirement came from another Nvidia OS that allows this.
From the user's perspective, it lets the user put a FW image in place without installing it right away; the FW is installed only once the user enables auto-update.

* Add a YANG model for the new ConfigDB table
* Covered with tests

Signed-off-by: Yevhen Fastiuk <[email protected]>
* Add the ability to enable/disable FW auto-update.
* Add the ability to set the firmware source: image or user.
The user-defined FW image location should be /host/mlnx/asic/fw.mfa

Signed-off-by: Yevhen Fastiuk <[email protected]>
fastiuk force-pushed the firmware-update-control branch from a923919 to 9f767a7 on October 25, 2022 10:24

if [ "${IMAGE_UPGRADE}" != "${YES_PARAM}" ]; then
    UpgradeFW
    DEFAULT_SOURCE="$(sonic-cfggen -d -v FIRMWARE[\'asic\'][\'default\'])"

Also, please make sure that we are still able to reboot when the database service is down or CONFIG_DB is not initialized.
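One way to keep the reboot path working without the database is to treat the config lookup as best-effort and fall back to a default. A minimal sketch, assuming `image` is an acceptable fallback source; `read_fw_setting` and the `CFGGEN` override are hypothetical, and the query mirrors the `sonic-cfggen -d -v` call in the snippet above.

```shell
#!/bin/bash
# Hypothetical sketch (not the PR's code): best-effort read of a FIRMWARE|asic
# field that falls back to a default when the database service is down or
# CONFIG_DB is not initialized, so the reboot path never fails on it.
# CFGGEN can be pointed at a stub executable for testing.
CFGGEN="${CFGGEN:-sonic-cfggen}"

read_fw_setting() {
    local field="$1" fallback="$2" value
    # Limit the query to 5 seconds (an arbitrary choice) so a hung DB cannot
    # block the reboot; any error or timeout leaves value empty.
    value="$(timeout 5 "$CFGGEN" -d -v "FIRMWARE['asic']['$field']" 2>/dev/null)" || value=""
    # Empty output (missing table/field or failed query) -> use the fallback
    echo "${value:-$fallback}"
}
```

Usage: `DEFAULT_SOURCE="$(read_fw_setting default image)"`. Declaring `local value` separately from the assignment keeps the command's exit status visible to `||`.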

stepanblyschak commented:


@fastiuk I suggest contacting @nazariig to understand how this effort relates to the FW utility, and asking him to review, since he is the code owner of the FW upgrade script.

I would also suggest showing a confirmation prompt/warning to the user when they perform a warm/fast reboot while "auto_update" is disabled.
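The suggested warning could look roughly like this. It is a hypothetical sketch, not code from the PR: `confirm_reboot` takes the reboot type and the `auto_update` value, warns only for warm/fast reboots with auto-update disabled, and aborts unless the user answers `y`.

```shell
#!/bin/bash
# Hypothetical sketch of the suggested prompt (not the PR's code).
#   $1 = reboot type (cold/fast/warm), $2 = auto_update value
# Reads the answer from stdin; returns 0 to proceed, 1 to abort.
confirm_reboot() {
    local reboot_type="$1" auto_update="$2" answer=""

    # Only warn when auto-update is disabled...
    [ "$auto_update" = "disable" ] || return 0
    # ...and only for the flows mentioned in the review
    case "$reboot_type" in
        warm|fast) ;;
        *) return 0 ;;
    esac

    echo "WARNING: ASIC FW auto-update is disabled; the $reboot_type reboot flow may break." >&2
    # EOF without input leaves answer empty, which means "abort"
    read -r -p "Continue? [y/N] " answer || true
    [ "$answer" = "y" ] || [ "$answer" = "Y" ]
}
```

Usage: `confirm_reboot warm "$(redis-cli -n 4 HGET "FIRMWARE|asic" "auto_update")" || exit 1`. Defaulting to "no" keeps non-interactive callers from silently proceeding.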

fastiuk changed the title from "[Nvidia] Firmware update control" to "[Nvidia] ASIC Firmware update control" on Nov 8, 2022
fastiuk pushed a commit that referenced this pull request Nov 20, 2022
fastiuk pushed a commit that referenced this pull request Jan 4, 2023
Added below commits:
9b30690 jcaiMR Fri Dec 16 fix handleSwssNotification crash in dhcp6relay (sonic-net#28)
047afb7 jcaiMR Wed Dec 14 14:08:58 2022 +0800 Fix multiple vlan issue (sonic-net#27)
ff6bec3 Vivek Thu Dec 8 09:44:15 2022 -0800 Made the Error log informative (sonic-net#22)
2fbe729 jcaiMR Wed Nov 30 14:41:53 2022 +0800 disable cfg dynamic change (sonic-net#25)
13d0805 Liu Shilong Wed Nov 30 10:54:11 2022 +0800 Use github code scanning instead of LGTM (sonic-net#26)
1e846f6 kellyyeh Wed Nov 23 14:36:02 2022 -0800 Fix packet range check for relay-reply packets (sonic-net#21)
4d19e13 kellyyeh Thu Nov 17 10:04:53 2022 -0800 Add unittest infrastructure (#5)
7f4fdab jcaiMR Fri Nov 11 14:47:51 2022 +0800 fix packet range check issue (sonic-net#20)
257ecdf kellyyeh Thu Nov 3 11:34:11 2022 -0700 Add client packet UDP header length check (sonic-net#19)
fastiuk closed this on Mar 21, 2023
fastiuk deleted the firmware-update-control branch on May 23, 2023 22:19
fastiuk pushed a commit that referenced this pull request Dec 23, 2024
…et#21095)

Adding the below fix from FRR FRRouting/frr#17297

This fixes the following crash, which occurs intermittently:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fccd6faf7c0 (LWP 36))]
(gdb) bt
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fccd7302fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fccd72ed472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fccd75bb3a9 in _zlog_assert_failed (xref=xref@entry=0x7fccd7652380 <_xref.16>, extra=extra@entry=0x0) at ../lib/zlog.c:678
#4  0x00007fccd759b2fe in route_node_delete (node=<optimized out>) at ../lib/table.c:352
#5  0x00007fccd759b445 in route_unlock_node (node=0x0) at ../lib/table.h:258
#6  route_next (node=<optimized out>) at ../lib/table.c:436
#7  route_next (node=node@entry=0x56029d89e560) at ../lib/table.c:410
#8  0x000056029b6b6b7a in if_lookup_by_name_per_ns (ns=ns@entry=0x56029d873d90, ifname=ifname@entry=0x7fccc0029340 "PortChannel1020")
    at ../zebra/interface.c:312
#9  0x000056029b6b8b36 in zebra_if_dplane_ifp_handling (ctx=0x7fccc0029310) at ../zebra/interface.c:1867
#10 zebra_if_dplane_result (ctx=0x7fccc0029310) at ../zebra/interface.c:2221
#11 0x000056029b7137a9 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4810
#12 0x00007fccd75a0e0d in thread_call (thread=thread@entry=0x7ffe8e553cc0) at ../lib/thread.c:1990
#13 0x00007fccd7559368 in frr_run (master=0x56029d65a040) at ../lib/libfrr.c:1198
#14 0x000056029b6ac317 in main (argc=9, argv=0x7ffe8e5540d8) at ../zebra/main.c:478

2 participants