Skip to content

Secureboot: Production secureboot signing enhancements (ephemeral keys and bugfixes)#23405

Merged
lguohan merged 2 commits intosonic-net:masterfrom
bhouse-nexthop:secureboot
Oct 27, 2025
Merged

Secureboot: Production secureboot signing enhancements (ephemeral keys and bugfixes)#23405
lguohan merged 2 commits intosonic-net:masterfrom
bhouse-nexthop:secureboot

Conversation

@bhouse-nexthop
Copy link
Collaborator

@bhouse-nexthop bhouse-nexthop commented Jul 21, 2025

Why I did it

The current sonic secureboot implementation assumes all assets are signed with the DB Key. There are roughly 3600 kernel modules that get signed in the process.

Organizations may have varying security policy requirements where any signing requests may require signoff by one or more parties, this may simply be infeasible for that many requests.

Given that the DB key may be a long-lived key, and only RSA 2048 is often used or available, it is a security best-practice to sign as few assets as possible with such a key.

This PR is fully backwards compatible with any pre-existing usages. It introduces a new rules/config setting of SECURE_UPGRADE_KERNEL_CAFILE which is a path to a file for all trusted keys to embed into the kernel. If that setting is not specified, it defaults to SECURE_UPGRADE_SIGNING_CERT. There are no other infrastructure changes to support ephemeral keys for kernel module signing. It is up to the organization to generate the ephemeral keys and pass them into their own signing scripts as required.

Dependent PR:

Related PRs:

Fixes #23406

Work item tracking

How I did it

Worked through production signing steps and hit issues when requiring signing approval with our HSM. This was the most elegant solution that required the minimal changes.

This has been fully tested.

How to verify it

Since there are no provided production signing script examples this would require a vendor with an existing implementation to validate. Nexthop has validated this internally.

Which release branch to backport (provide reason below if selected)

Tested branch (Please provide the tested image version)

master as of 20250721

Description for the changelog

Secureboot: Production signing enhancement to allow ephemeral kernel module keys

Link to config_db schema for YANG module changes

N/A

A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Brad House bhouse@nexthop.ai

@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@DavidZagury
Copy link
Contributor

DavidZagury commented Jul 31, 2025

@bhouse-nexthop
I think we should separate the bug fixes changes from the new features changes into different pull requests.
This way we could cherry-pick the bug fixes to older branches.

Specifically I want to ask about this change:
SECURE_UPGRADE_PROD_TOOL_ARGS due to quoting issues would not pass more than one argument.

Currently we are able to pass more than one argument, this is just done by adding the apostrophes in the argument assignment.
e.g. something like this works:
make SECURE_UPGRADE_MODE=prod SECURE_UPGRADE_SIGNING_CERT="key_certificate-prod.pem" SECURE_UPGRADE_PROD_SIGNING_TOOL=signing_secure_boot_prod.sh SECURE_UPGRADE_PROD_TOOL_ARGS="' -a aaa -b bbbb '" target/sonic-mellanox.bin

If we have this change, this would not work any more. So while I understand your change, this will break it, and if we will decide to take it - we should cherry-pick it to older branches as well.

@bhouse-nexthop
Copy link
Collaborator Author

@DavidZagury sure, I will work on splitting this up into 2 or more PRs. We also have uncovered some other bugs that we were planning to upstream as well. I think the quoting 'hack' example you gave isn't proper, but I do understand it would technically break the existing integration you have that is aware of the bug and works around it.

@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bhouse-nexthop
Copy link
Collaborator Author

@DavidZagury PRs broken out smaller now

@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bhouse-nexthop
Copy link
Collaborator Author

@Yarden-Z any comments?

@lguohan
Copy link
Collaborator

lguohan commented Oct 18, 2025

This pull request introduces a new Makefile/config variable: SECURE_UPGRADE_KERNEL_CAFILE, which represents the path to a PEM file containing trusted certificates to be embedded into the kernel during a secure upgrade. Here’s a summary and review of the changes:


Summary of Changes

1. Adds SECURE_UPGRADE_KERNEL_CAFILE variable:

  • Declared in rules/config, documented alongside other secure upgrade variables.
  • Defaults to SECURE_UPGRADE_SIGNING_CERT if not specified:
    SECURE_UPGRADE_KERNEL_CAFILE ?= $(SECURE_UPGRADE_SIGNING_CERT)
  • Variable is now passed and exported through multiple Makefile layers:
    • Mounted as a read-only volume in the Docker run command if set.
    • Propagated in build instructions, installer environment exports, and DEP_FLAGS for kernel builds.

2. Documentation Updates:

  • Adds documentation comments to rules/config describing the purpose and default value of the new variable.

3. Propagation in Build System:

  • Ensures SECURE_UPGRADE_KERNEL_CAFILE is available wherever secure upgrade signing variables are used:
    • Makefile.work, rules/linux-kernel.dep, slave.mk, and related recipes.

Review

Positives

  • Backward Compatibility: The default assignment ensures existing builds will not break, as it falls back to SECURE_UPGRADE_SIGNING_CERT.
  • Clear Documentation: Each usage and export is well-documented, and the new variable is added to comments for clarity.
  • Consistent Propagation: The variable is consistently exported and passed through build stages, reducing risk of missing it in downstream usage.
  • Security-Oriented: Explicit handling of trusted kernel certificates increases flexibility for secure upgrade flows.

Suggestions / Questions

  • Validation: Is there validation elsewhere in the build system to ensure the specified file exists and contains valid PEM certificates?
  • Usage: Is there a specific consumer (script, build step) that now expects this variable? If so, does it handle the fallback properly?
  • Naming Consistency: The variable name is clear, but if other similar certificate variables exist, confirm naming conventions are consistent.
  • Error Handling: If an invalid or missing file is specified, are errors surfaced clearly in the build logs?

Minor

  • Typos in documentation:
    • Double “to” in:
      # SECURE_UPGRADE_KERNEL_CAFILE - path to a file containing trusted certificates in PEM format to to embed into the kernel.
      
      Should be “to embed”.

Conclusion

This PR is well-structured and improves the flexibility of the secure upgrade process. The code changes are straightforward, maintain compatibility, and are clearly documented.
Unless there are downstream issues with consumers of the new variable, this PR looks good to merge.

Let me know if you want a deeper review of the downstream usage or have concerns about specific build flows!

@lguohan
Copy link
Collaborator

lguohan commented Oct 18, 2025

@bhouse-nexthop , can you help to resolve the conflict? I am now using AI to provide assist to the PR review and ultimately speed up the review process.

@bhouse-nexthop
Copy link
Collaborator Author

on it, will have this in a few minutes

@lguohan
Copy link
Collaborator

lguohan commented Oct 18, 2025

also let me know if those review comments are good or not.

@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bhouse-nexthop
Copy link
Collaborator Author

@lguohan thanks so much for merging all these! Nice meeting you at OCP as well. I've rebased and fixed the merge conflict.

@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bhouse-nexthop
Copy link
Collaborator Author

The AI review is pretty good. It caught a typo of "to to" that I just fixed. Regarding the "questions", it makes valid points and that was indeed handled by sonic-net/sonic-linux-kernel#499 which you already merged.

Thank you!

@bradh352
Copy link
Collaborator

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bradh352
Copy link
Collaborator

rebased to force a rebuild. The test failures were spurious in a completely unrelated realm.

@TafkaMax
Copy link

rebased to force a rebuild. The test failures were spurious in a completely unrelated realm.

Probably needs a rebuild again?

@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@bradh352
Copy link
Collaborator

rebased to force a rebuild. The test failures were spurious in a completely unrelated realm.

Probably needs a rebuild again?

ugh ....

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@TafkaMax
Copy link

rebased to force a rebuild. The test failures were spurious in a completely unrelated realm.

Probably needs a rebuild again?

ugh ....

Second time it worked :-)

@bradh352
Copy link
Collaborator

@lguohan this is the last of the secureboot patches, looks like the rebuild succeeded this time. Any chance this can be merged?

@lguohan lguohan merged commit 2b2004a into sonic-net:master Oct 27, 2025
21 checks passed
@lguohan
Copy link
Collaborator

lguohan commented Oct 27, 2025

@bradh352 , i would really appreciated that commit itself has good description as you have on the pr message, in this case I do not need modify commit message during the merge. it would be great if you can ask all nexthop folks to have good PR description every thing when you push changes. it will definitely speed up the merge process.

FengPan-Frank pushed a commit to FengPan-Frank/sonic-buildimage that referenced this pull request Dec 4, 2025
…s and bugfixes) (sonic-net#23405)

#### Why I did it

The current sonic secureboot implementation assumes all assets are signed with the DB Key.  There are roughly 3600 kernel modules that get signed in the process.

Organizations may have varying security policy requirements where any signing requests may require signoff by one or more parties, this may simply be infeasible for that many requests.

Given that the DB key may be a long-lived key, and only RSA 2048 is often used or available, it is a security best-practice to sign as few assets as possible with such a key.

This PR is fully backwards compatible with any pre-existing usages.  It introduces a new `rules/config` setting of `SECURE_UPGRADE_KERNEL_CAFILE` which is a path to a file for all trusted keys to embed into the kernel.  If that setting is not specified, it defaults to `SECURE_UPGRADE_SIGNING_CERT`.  There are no other infrastructure changes to support ephemeral keys for kernel module signing.  It is up to the organization to generate the ephemeral keys and pass them into their own signing scripts as required.

Dependent PR:
 * sonic-net/sonic-linux-kernel#499

Related PRs:
 * sonic-net/sonic-utilities#3989
 * sonic-net#23732
 * sonic-net#23733
 * sonic-net#23734
 * sonic-net#23735
 * sonic-net#23736

Fixes sonic-net#23406

##### Work item tracking

#### How I did it

Worked through production signing steps and hit issues when requiring signing approval with our HSM.  This was the most elegant solution that required the minimal changes.

This has been fully tested.

#### How to verify it

Since there are no provided production signing script examples this would require a vendor with an existing implementation to validate.  Nexthop has validated this internally.

#### Which release branch to backport (provide reason below if selected)

#### Tested branch (Please provide the tested image version)

master as of 20250721

#### Description for the changelog
Secureboot: Production signing enhancement to allow ephemeral kernel module keys

#### Link to config_db schema for YANG module changes
N/A

#### A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Brad House <bhouse@nexthop.ai>
Signed-off-by: Feng Pan <fenpan@microsoft.com>
xwjiang-ms pushed a commit to xwjiang-ms/sonic-buildimage that referenced this pull request Dec 22, 2025
…s and bugfixes) (sonic-net#23405)

#### Why I did it

The current sonic secureboot implementation assumes all assets are signed with the DB Key.  There are roughly 3600 kernel modules that get signed in the process.

Organizations may have varying security policy requirements where any signing requests may require signoff by one or more parties, this may simply be infeasible for that many requests.

Given that the DB key may be a long-lived key, and only RSA 2048 is often used or available, it is a security best-practice to sign as few assets as possible with such a key.

This PR is fully backwards compatible with any pre-existing usages.  It introduces a new `rules/config` setting of `SECURE_UPGRADE_KERNEL_CAFILE` which is a path to a file for all trusted keys to embed into the kernel.  If that setting is not specified, it defaults to `SECURE_UPGRADE_SIGNING_CERT`.  There are no other infrastructure changes to support ephemeral keys for kernel module signing.  It is up to the organization to generate the ephemeral keys and pass them into their own signing scripts as required.

Dependent PR:
 * sonic-net/sonic-linux-kernel#499

Related PRs:
 * sonic-net/sonic-utilities#3989
 * sonic-net#23732
 * sonic-net#23733
 * sonic-net#23734
 * sonic-net#23735
 * sonic-net#23736

Fixes sonic-net#23406

##### Work item tracking

#### How I did it

Worked through production signing steps and hit issues when requiring signing approval with our HSM.  This was the most elegant solution that required the minimal changes.

This has been fully tested.

#### How to verify it

Since there are no provided production signing script examples this would require a vendor with an existing implementation to validate.  Nexthop has validated this internally.

#### Which release branch to backport (provide reason below if selected)

#### Tested branch (Please provide the tested image version)

master as of 20250721

#### Description for the changelog
Secureboot: Production signing enhancement to allow ephemeral kernel module keys

#### Link to config_db schema for YANG module changes
N/A

#### A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Brad House <bhouse@nexthop.ai>
Signed-off-by: xiaweijiang <xiaweijiang@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhancement: Secureboot production signing of kernel modules using ephemeral keys + bug fixes

6 participants