
[KubeSonic] Add docker-gnmi-sidecar for K8s migration #25163

Merged
yxieca merged 8 commits into sonic-net:master from hdwhdw:feature/gnmi-sidecar on Feb 17, 2026

@hdwhdw
Contributor

@hdwhdw hdwhdw commented Jan 22, 2026

Why I did it

Enable migration of the systemd-managed gnmi container to Kubernetes as part of the KubeSonic initiative. This follows the same sidecar pattern established by docker-telemetry-sidecar (PR #2021 design).

The sidecar container runs alongside the K8s-managed gnmi pod and:

  1. Syncs stub scripts to the host via nsenter
  2. Stops and removes the old systemd-managed docker container
  3. Replaces the systemd service with a K8s-aware stub that delegates lifecycle to kubectl

Work item tracking
  • Microsoft ADO (number only): 36510161

How I did it

Created a new docker-gnmi-sidecar container with:

File                          Purpose
Dockerfile.j2                 Container image definition
supervisord.conf              Process supervisor configuration
systemd_stub.py               Main sync orchestrator using shared sidecar_common library
systemd_scripts/gnmi.sh       Stub script delegating to k8s_pod_control.sh
systemd_scripts/gnmi.service  Stub systemd service unit

Files synced to host:

  • gnmi.sh → /usr/local/bin/gnmi.sh
  • gnmi.service → /lib/systemd/system/gnmi.service
  • container_checker → /bin/container_checker
  • k8s_pod_control.sh → /usr/share/sonic/scripts/k8s_pod_control.sh
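A minimal Python sketch of the content-based sync idea, assuming the sidecar only rewrites a host file when its contents differ from the copy shipped in the image. The host targets come from the list above; the SHA-256 comparison, the temporary-file rename, and the helper names (file_digest, sync_file) are illustrative assumptions rather than the sidecar_common API, and the sketch writes to a local path instead of going through nsenter.

```python
import hashlib
import os
import shutil

# Image-side file name -> host destination, as listed in the PR description.
SYNC_ITEMS = {
    "gnmi.sh": "/usr/local/bin/gnmi.sh",
    "gnmi.service": "/lib/systemd/system/gnmi.service",
    "container_checker": "/bin/container_checker",
    "k8s_pod_control.sh": "/usr/share/sonic/scripts/k8s_pod_control.sh",
}


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it is missing."""
    if not os.path.exists(path):
        return None
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def sync_file(src, dst):
    """Copy src to dst only when contents differ; return True if dst changed.

    Hypothetical helper: writes directly to dst instead of via nsenter.
    """
    if file_digest(src) == file_digest(dst):
        return False
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    tmp = dst + ".tmp"
    shutil.copy2(src, tmp)  # copy2 keeps the executable bit on scripts
    os.replace(tmp, dst)    # atomic rename on the same filesystem
    return True
```

Comparing contents rather than timestamps keeps repeated sync passes idempotent. It also suggests how a host-side tool that rewrites a synced file (such as the SSG behavior described in the note further down) could make the sidecar and the host keep overwriting each other.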

Post-copy actions when gnmi.sh is updated:

docker stop gnmi && docker rm gnmi
systemctl daemon-reload && systemctl restart gnmi
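The post-copy step can be sketched as building nsenter command lines that run in the host's namespaces. This assumes the sidecar shares the host PID namespace (the --pid=host run option discussed later in this PR), so PID 1 is the host's init; wrapping each chain in sh -c keeps the && semantics of the commands above. POST_COPY_ACTIONS and post_copy_commands are hypothetical names, and the real sidecar_common library may structure this differently.

```python
# Enter the mount, UTS, IPC, network and PID namespaces of PID 1. With
# --pid=host (assumed), PID 1 is the host's init, so commands run on the host.
NSENTER_PREFIX = ["nsenter", "-t", "1", "-m", "-u", "-i", "-n", "-p", "--"]

# Post-copy actions from the PR description, kept as && chains.
POST_COPY_ACTIONS = [
    "docker stop gnmi && docker rm gnmi",
    "systemctl daemon-reload && systemctl restart gnmi",
]


def host_command(shell_cmd):
    """Return an argv list that runs shell_cmd under sh -c in the host namespaces."""
    return NSENTER_PREFIX + ["sh", "-c", shell_cmd]


def post_copy_commands(changed_files):
    """Return the argv lists to run, or [] when gnmi.sh was not updated."""
    if "gnmi.sh" not in changed_files:
        return []
    return [host_command(cmd) for cmd in POST_COPY_ACTIONS]
```

A caller would hand each argv list to subprocess.run; the sketch only builds the lists so they can be checked without touching the host.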

The sidecar uses the shared sonic_py_common.sidecar_common library for file sync and nsenter operations. No CONFIG_DB reconciliation is included in this initial implementation.

How to verify it

Tested on local minikube cluster with SONiC DUT (vlab-01):

  1. Setup: minikube cluster on dev VM joined to SONiC DUT via kubeadm join
  2. Deployment: gnmi DaemonSet with sidecar container deployed to the cluster
  3. Verification:
    • Pod reaches 2/2 Running state with 0 restarts
    • Sidecar successfully syncs stub scripts to host via nsenter
    • Old docker-managed gnmi container is stopped and removed
    • Systemd gnmi.service is replaced with K8s-aware stub
    • gnmi service restarts correctly and delegates to K8s pod

$ kubectl get pods -l app=gnmi
NAME         READY   STATUS    RESTARTS   AGE
gnmi-h677q   2/2     Running   0          12s
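To script the "2/2 Running" check, a small parser over the table output shown above is enough for a quick test. This is a sketch that assumes the default kubectl column layout (NAME, READY, STATUS, ...) and a pod-name prefix of gnmi-; for anything more robust, kubectl get pods -l app=gnmi -o json is the sturdier input.

```python
def pod_ready(kubectl_output, pod_prefix="gnmi-"):
    """Return True if at least one matching pod exists and all are Running and fully ready."""
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    if not lines:
        return False
    header = lines[0].split()
    name_i = header.index("NAME")
    ready_i = header.index("READY")
    status_i = header.index("STATUS")
    found = False
    for row in lines[1:]:
        cols = row.split()
        if not cols[name_i].startswith(pod_prefix):
            continue
        found = True
        # READY is "<ready>/<total>" containers, e.g. "2/2" for gnmi + sidecar.
        ready, total = cols[ready_i].split("/")
        if cols[status_i] != "Running" or ready != total:
            return False
    return found
```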

Note: On SONiC 202505+ with systemd-sonic-generator (SSG), SSG modifies service files on daemon-reload, which can cause sync loops. This is not an issue on production SONiC versions that lack this SSG behavior.

Which release branch to backport (provide reason below if selected)

  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Tested branch (Please provide the tested image version)

  • SONiC.202505.150218175-aea08fb12a (vlab-01)

Description for the changelog

Add docker-gnmi-sidecar container to enable Kubernetes migration of gnmi service

Copilot AI review requested due to automatic review settings January 22, 2026 20:25
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@hdwhdw hdwhdw changed the title Add docker-gnmi-sidecar for K8s migration [KubeSonic] Add docker-gnmi-sidecar for K8s migration Jan 22, 2026
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a new docker-gnmi-sidecar container that syncs GNMI systemd stubs and helper scripts from the container to the host via nsenter, enabling migration of GNMI from a systemd-managed Docker container to a Kubernetes-managed pod.

Changes:

  • Adds GNMI stub script and systemd unit wiring GNMI lifecycle through shared k8s_pod_control.sh instead of the native Docker-based service management.
  • Defines and wires the docker-gnmi-sidecar image into the build (including runtime options and file sync inputs) using the shared sidecar_common library for host file synchronization and post-copy actions.
  • Adds unit tests for the GNMI sidecar systemd_stub.py to validate sync behavior, environment-driven source selection, and CLI options (e.g., --once, --no-post-actions).

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Summary per file:

  • rules/scripts.mk: Registers gnmi.sh as a systemd helper script to be copied into the image, parallel to the existing telemetry systemd helper.
  • rules/docker-gnmi-sidecar.mk: Defines the docker-gnmi-sidecar image, its dependencies, runtime options, and the files (container_checker, gnmi.sh, k8s_pod_control.sh) made available to its Docker build context.
  • dockers/docker-gnmi-sidecar/systemd_stub.py: Implements the GNMI sidecar logic that selects between v1 and K8s-aware GNMI scripts based on IS_V1_ENABLED, syncs the GNMI unit/script/container_checker to the host, and runs appropriate systemctl/docker post-actions.
  • dockers/docker-gnmi-sidecar/systemd_scripts/gnmi.sh: Provides the thin GNMI wrapper that delegates lifecycle operations to the shared k8s_pod_control.sh script.
  • dockers/docker-gnmi-sidecar/systemd_scripts/gnmi.service: Defines the GNMI systemd unit that now drives GNMI via /usr/local/bin/gnmi.sh (ultimately K8s pod control) rather than a Docker-native service.
  • dockers/docker-gnmi-sidecar/supervisord.conf: Configures supervisord inside the GNMI sidecar to start systemd_stub.py with dependent startup sequencing and to pass through the IS_V1_ENABLED environment knob.
  • dockers/docker-gnmi-sidecar/cli-plugin-tests/test_systemd_stub.py: Adds tests covering sidecar sync fast-path and update behavior, post-copy host actions, failure reporting, CLI flags, and IS_V1_ENABLED-driven selection of gnmi_v1.sh vs the new GNMI stub.
  • dockers/docker-gnmi-sidecar/Dockerfile.j2: Defines a two-stage Docker build for docker-gnmi-sidecar, copying in the GNMI systemd stub/unit plus container_checker and k8s control script, and wiring systemd_stub.py under supervisor as the entrypoint-managed process.

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI review requested due to automatic review settings January 27, 2026 22:50
@hdwhdw hdwhdw force-pushed the feature/gnmi-sidecar branch from 917e05c to cdcb081 Compare January 27, 2026 22:54
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

Add a new sidecar container that enables migration of the systemd-managed
gnmi container to Kubernetes. The sidecar syncs stub scripts to the host
via nsenter, replacing the native systemd service with a K8s-aware stub
that delegates container lifecycle to kubectl.

Files synced to host:
- gnmi.sh -> /usr/local/bin/gnmi.sh (stub calling k8s_pod_control.sh)
- gnmi.service -> /lib/systemd/system/gnmi.service
- container_checker -> /bin/container_checker
- k8s_pod_control.sh -> /usr/share/sonic/scripts/k8s_pod_control.sh

The sidecar uses the shared sidecar_common library for file sync and
nsenter operations. No CONFIG_DB reconciliation is included in this
initial implementation.

Signed-off-by: Dawei Huang <[email protected]>
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

Add rules/docker-gnmi-sidecar.dep to define build dependencies
for the gnmi-sidecar container, matching the pattern used by
docker-telemetry-sidecar.

Signed-off-by: Dawei Huang <[email protected]>
@azure-pipelines

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 4 comments.


# Pass the image_version to container
ENV IMAGE_VERSION=$image_version

Copilot AI Jan 27, 2026

The IS_V1_ENABLED environment variable is set in the builder stage (line 17) but is missing from the final stage. This means the environment variable will not be available at runtime when supervisord tries to pass it to the systemd_stub process via the environment directive in supervisord.conf (line 38).

The restapi-sidecar Dockerfile correctly sets this environment variable in both stages (see dockers/docker-restapi-sidecar/Dockerfile.j2:39). The same pattern should be followed here to ensure IS_V1_ENABLED is available at runtime.

Suggested change
# K8s will override this
ENV IS_V1_ENABLED=false
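A sketch of how a stub might interpret IS_V1_ENABLED when choosing between gnmi_v1.sh and the new gnmi.sh, the selection described in the file summary. The accepted truthy spellings and the "false" default are assumptions for illustration. In this sketch an unset variable silently selects the K8s stub, which illustrates why the reviewer wants the ENV line present in the final image stage and not only in the builder.

```python
import os


def is_v1_enabled(env=None):
    """Interpret IS_V1_ENABLED; anything other than a true-ish value means False."""
    env = os.environ if env is None else env
    return env.get("IS_V1_ENABLED", "false").strip().lower() in ("1", "true", "yes")


def select_gnmi_script(env=None):
    """Pick the host-side gnmi script source per IS_V1_ENABLED (hypothetical helper)."""
    return "gnmi_v1.sh" if is_v1_enabled(env) else "gnmi.sh"
```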

@@ -0,0 +1,94 @@
#!/usr/bin/env python3
Copilot AI Jan 27, 2026

The PR description mentions "This follows the same sidecar pattern established by docker-telemetry-sidecar (PR #2021 design)" but the linked PR #2021 is about "Submodule update: Update sonic-swss. vxlan enhancements" which is unrelated to the sidecar pattern. The PR reference appears to be incorrect and should point to the actual PR that introduced the docker-telemetry-sidecar.

Comment on lines +12 to +14
User=root
ExecStartPre=/usr/local/bin/gnmi.sh start
ExecStart=/usr/local/bin/gnmi.sh wait
Copilot AI Jan 27, 2026

The ExecStartPre, ExecStart, and ExecStop directives would benefit from inline comments explaining their behavior, similar to those found in dockers/docker-telemetry-sidecar/systemd_scripts/telemetry.service lines 13-15. These comments help clarify that start is non-blocking, wait is a long-lived loop observing pod status, and stop won't work after KubeSonic migration since the pod will be auto-deployed via Kubernetes.

Suggested change
User=root
ExecStartPre=/usr/local/bin/gnmi.sh start
ExecStart=/usr/local/bin/gnmi.sh wait
User=root
# Non-blocking: trigger GNMI container/pod startup and return immediately.
ExecStartPre=/usr/local/bin/gnmi.sh start
# Long-lived loop: wait for and monitor GNMI container/pod status.
ExecStart=/usr/local/bin/gnmi.sh wait
# Note: After KubeSonic migration, this stop action will not be effective,
# because the GNMI pod will be auto-deployed and managed by Kubernetes.
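The start/wait split in this unit can be illustrated with a Python sketch of the wait side: a blocking poll that keeps the ExecStart process alive while the pod is healthy and exits once it is not, so systemd (assuming the default Type=simple) tracks the pod through that process. get_status, the poll interval, and the exit-code mapping are hypothetical; in this PR the real logic lives in k8s_pod_control.sh.

```python
import time

# Pod phases that mean "keep waiting" (Kubernetes pod phase names).
ACTIVE_PHASES = ("Pending", "Running")


def wait_for_pod(get_status, poll_interval=5.0, sleep=time.sleep):
    """Block while the pod is active; return a systemd-style exit code.

    get_status is a hypothetical callable returning the pod phase string.
    Returning 0 for Succeeded and 1 otherwise is an illustrative choice.
    """
    while True:
        phase = get_status()
        if phase in ACTIVE_PHASES:
            sleep(poll_interval)
            continue
        return 0 if phase == "Succeeded" else 1
```

Because the process exits when the pod leaves an active phase, systemd sees the unit stop and can apply whatever Restart= policy the unit declares.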

    return 0 if ok else 1
while True:
    time.sleep(args.interval)
    ensure_sync()
Copilot AI Jan 27, 2026

Variable ok is not used.

Suggested change
ensure_sync()
ok = ensure_sync()
if not ok:
    logger.log_warning("Periodic ensure_sync() failed; continuing loop")
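A sketch of the periodic loop with the suggested change applied, using argparse and the standard logging module in place of the sonic_py_common logger (logger.log_warning in the suggestion). --once and --no-post-actions are the flags the PR's tests cover, and args.interval appears in the snippet above; the 900-second default and the max_iterations test hook are assumptions.

```python
import argparse
import logging
import time

logger = logging.getLogger("gnmi-sidecar")


def parse_args(argv=None):
    p = argparse.ArgumentParser(description="Sketch of a sidecar sync loop")
    p.add_argument("--once", action="store_true", help="sync once and exit")
    # The default interval is a hypothetical value chosen for this sketch.
    p.add_argument("--interval", type=float, default=900.0, help="seconds between syncs")
    p.add_argument("--no-post-actions", action="store_true", help="skip host post-copy actions")
    return p.parse_args(argv)


def run(args, ensure_sync, sleep=time.sleep, max_iterations=None):
    """Sync once, then (unless --once) re-sync every args.interval seconds."""
    ok = ensure_sync()
    if args.once:
        return 0 if ok else 1
    n = 0
    while max_iterations is None or n < max_iterations:
        sleep(args.interval)
        # Log and keep going: one failed pass should not kill the sidecar.
        if not ensure_sync():
            logger.warning("Periodic ensure_sync() failed; continuing loop")
        n += 1
    return 0
```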

@hdwhdw hdwhdw force-pushed the feature/gnmi-sidecar branch from cdcb081 to 142a054 Compare January 27, 2026 22:59
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.


sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))

# Add sonic-py-common to path so we can import the real sidecar_common
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../../../../src/sonic-py-common")))
Copilot AI Jan 27, 2026

The path to add the docker-gnmi-sidecar directory to sys.path uses ".." (one level up from cli-plugin-tests/), which correctly reaches the directory containing systemd_stub.py. However, the path to sonic-py-common uses "../../../../src/sonic-py-common" (four levels up), which appears to be one level too many. From cli-plugin-tests/, going up three levels reaches the repository root, so the path should likely be "../../../src/sonic-py-common" instead. This is also inconsistent with the telemetry-sidecar test (which uses "../.." and "../../../../../" respectively), suggesting both might need correction. Verify that the tests run correctly with the current paths, and consider aligning both test files to use the correct relative paths.

Suggested change
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../../../../src/sonic-py-common")))
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../../../src/sonic-py-common")))

SONIC_INSTALL_DOCKER_DBG_IMAGES += $(DOCKER_GNMI_SIDECAR_DBG)


$(DOCKER_GNMI_SIDECAR)_DEPENDS += $(LIBSWSSCOMMON)
Contributor

You may need #25225 as well; otherwise the internal pipeline will fail later.

Contributor Author

Done.

The sidecar uses sonic_py_common.sidecar_common library which needs
to be explicitly installed as a Python wheel. This mirrors the fix
in PR sonic-net#25225 for docker-telemetry-sidecar.

Signed-off-by: Dawei Huang <[email protected]>
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

Copilot AI review requested due to automatic review settings February 5, 2026 23:22
@mssonicbld
Copy link
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.


Comment on lines +32 to +34
$(DOCKER_GNMI_SIDECAR)_RUN_OPT += -t --privileged --pid=host
$(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /lib/systemd/system:/lib/systemd/system:rw
$(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /etc/audit:/etc/audit:rw
Copilot AI Feb 5, 2026

This container is configured to run --privileged with --pid=host and has read-write mounts of /lib/systemd/system and /etc/audit, which effectively gives any code execution inside the sidecar full control over the host’s init system and audit configuration. An attacker who compromises the sidecar (or another container in the same pod, depending on the deployment) can modify or replace systemd units or audit rules to gain persistent root access on the host and hide their activity. Consider removing --privileged and host PID sharing, and restricting host volume mounts to the minimal required paths and permissions (e.g., specific files via more targeted bind-mounts or hostPath settings) while enforcing least-privilege within the container.

Suggested change
$(DOCKER_GNMI_SIDECAR)_RUN_OPT += -t --privileged --pid=host
$(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /lib/systemd/system:/lib/systemd/system:rw
$(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /etc/audit:/etc/audit:rw
$(DOCKER_GNMI_SIDECAR)_RUN_OPT += -t
$(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /lib/systemd/system:/lib/systemd/system:ro
$(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /etc/audit:/etc/audit:ro

$(LIBYANG_PY3)

$(DOCKER_GNMI_SIDECAR)_CONTAINER_NAME = gnmi-sidecar
$(DOCKER_GNMI_SIDECAR)_RUN_OPT += -t --privileged --pid=host
Contributor

We're adding more privileged containers? I thought based on this we wouldn't do that anymore? https://github.com/sonic-net/SONiC/blob/master/doc/Container%20Hardening/SONiC_container_hardening_HLD.md#docker-privileges

Contributor Author

Yeah, this was copied from restapi-sidecar, which we should fix as well. Updated.

Contributor Author

With specific capabilities.

Contributor

Thanks!

Replace the --privileged flag with minimal Linux capabilities for
container hardening, following the principle of least privilege.

Capabilities added:
- CAP_SYS_ADMIN: For system administration operations
- CAP_SYS_PTRACE: For nsenter to access /proc/[pid]/ns/*
- CAP_DAC_OVERRIDE: To bypass file permission checks

Security options added:
- apparmor=unconfined
- seccomp=unconfined

These are required for nsenter to access host namespaces and sync
files to the host filesystem.

Signed-off-by: Dawei Huang <[email protected]>
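The hardening change can be pictured as the docker run arguments it yields. This Python sketch assembles an argument list from the capabilities and security options named in the commit message above; docker's --cap-add takes capability names such as SYS_ADMIN. The exact RUN_OPT lines in rules/docker-gnmi-sidecar.mk are not reproduced here, and keeping -t plus the rw /lib/systemd/system mount is an assumption based on the earlier snippets.

```python
# Capabilities and security options named in the hardening commit message.
CAPABILITIES = ["SYS_ADMIN", "SYS_PTRACE", "DAC_OVERRIDE"]
SECURITY_OPTS = ["apparmor=unconfined", "seccomp=unconfined"]


def sidecar_run_opts(mounts=()):
    """Build docker run options without --privileged (illustrative only).

    mounts is a sequence of (host_path, container_path, mode) tuples.
    """
    opts = ["-t"]
    for cap in CAPABILITIES:
        opts += ["--cap-add", cap]
    for sec in SECURITY_OPTS:
        opts += ["--security-opt", sec]
    for host_path, ctr_path, mode in mounts:
        opts += ["-v", f"{host_path}:{ctr_path}:{mode}"]
    return opts
```

Listing each capability explicitly makes the grant reviewable, in line with the container hardening HLD the reviewer linked, whereas --privileged grants all capabilities and device access at once.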
The path to sonic-py-common had 4 parent levels (../../../../) but
should only have 3 (../../../). The test file is at:
  dockers/docker-gnmi-sidecar/cli-plugin-tests/test_systemd_stub.py

Traversing 3 levels up reaches sonic-buildimage repo root, from where
src/sonic-py-common is accessible.

Signed-off-by: Dawei Huang <[email protected]>
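The corrected relative path can be checked mechanically. This sketch resolves both variants against a hypothetical checkout at /repo using posixpath, so it does not depend on the local filesystem; resolve_from_test_dir is an illustrative helper, not part of the test file.

```python
import posixpath

# Location of the test file relative to the sonic-buildimage repo root.
TEST_FILE = "dockers/docker-gnmi-sidecar/cli-plugin-tests/test_systemd_stub.py"


def resolve_from_test_dir(repo_root, rel):
    """Resolve rel the way os.path.join(os.path.dirname(__file__), rel) would."""
    test_dir = posixpath.dirname(posixpath.join(repo_root, TEST_FILE))
    return posixpath.normpath(posixpath.join(test_dir, rel))
```

With the repo at /repo, three levels up lands on /repo/src/sonic-py-common, while four levels up escapes the checkout to /src/sonic-py-common.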
Copilot AI review requested due to automatic review settings February 10, 2026 20:45
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.


@losha228 losha228 left a comment

LGTM

@yxieca yxieca merged commit 3ca8821 into sonic-net:master Feb 17, 2026
31 of 32 checks passed

[program:rsyslogd]
command=/usr/sbin/rsyslogd -n -iNONE
priority=1
Contributor

I think we need to add rsyslog conf files, e.g. https://github.com/sonic-net/sonic-buildimage/tree/master/dockers/docker-restapi-sidecar/etc; otherwise, syslog will be missing.

$(DOCKER_GNMI_SIDECAR)_RUN_OPT += --security-opt apparmor=unconfined
$(DOCKER_GNMI_SIDECAR)_RUN_OPT += --security-opt seccomp=unconfined
$(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /lib/systemd/system:/lib/systemd/system:rw
$(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /etc/audit:/etc/audit:rw
Contributor

@maipbui maipbui Feb 19, 2026

Is this mount -v /etc/audit:/etc/audit:rw really needed?

FengPan-Frank pushed a commit to FengPan-Frank/sonic-buildimage that referenced this pull request Mar 6, 2026
This PR introduces a new docker-gnmi-sidecar container that syncs GNMI systemd stubs and helper scripts from the container to the host via nsenter, enabling migration of GNMI from a systemd-managed Docker container to a Kubernetes-managed pod.

Changes:

Adds GNMI stub script and systemd unit wiring GNMI lifecycle through shared k8s_pod_control.sh instead of the native Docker-based service management.
Defines and wires the docker-gnmi-sidecar image into the build (including runtime options and file sync inputs) using the shared sidecar_common library for host file synchronization and post-copy actions.
Adds unit tests for the GNMI sidecar systemd_stub.py to validate sync behavior, environment-driven source selection, and CLI options (e.g., --once, --no-post-actions).

Signed-off-by: Dawei Huang <[email protected]>
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Feng Pan <[email protected]>
dprital pushed a commit that referenced this pull request Mar 19, 2026
This PR introduces a new docker-gnmi-sidecar container that syncs GNMI systemd stubs and helper scripts from the container to the host via nsenter, enabling migration of GNMI from a systemd-managed Docker container to a Kubernetes-managed pod.

Changes:

Adds GNMI stub script and systemd unit wiring GNMI lifecycle through shared k8s_pod_control.sh instead of the native Docker-based service management.
Defines and wires the docker-gnmi-sidecar image into the build (including runtime options and file sync inputs) using the shared sidecar_common library for host file synchronization and post-copy actions.
Adds unit tests for the GNMI sidecar systemd_stub.py to validate sync behavior, environment-driven source selection, and CLI options (e.g., --once, --no-post-actions).

Signed-off-by: Dawei Huang <[email protected]>
Co-authored-by: Copilot <[email protected]>
Signed-off-by: dprital <[email protected]>

8 participants