[KubeSonic] Add docker-gnmi-sidecar for K8s migration#25163
[KubeSonic] Add docker-gnmi-sidecar for K8s migration#25163yxieca merged 8 commits intosonic-net:masterfrom
Conversation
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
This PR introduces a new docker-gnmi-sidecar container that syncs GNMI systemd stubs and helper scripts from the container to the host via nsenter, enabling migration of GNMI from a systemd-managed Docker container to a Kubernetes-managed pod.
Changes:
- Adds GNMI stub script and systemd unit wiring GNMI lifecycle through shared
k8s_pod_control.shinstead of the native Docker-based service management. - Defines and wires the
docker-gnmi-sidecarimage into the build (including runtime options and file sync inputs) using the sharedsidecar_commonlibrary for host file synchronization and post-copy actions. - Adds unit tests for the GNMI sidecar
systemd_stub.pyto validate sync behavior, environment-driven source selection, and CLI options (e.g.,--once,--no-post-actions).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| rules/scripts.mk | Registers gnmi.sh as a systemd helper script to be copied into the image, parallel to the existing telemetry systemd helper. |
| rules/docker-gnmi-sidecar.mk | Defines the docker-gnmi-sidecar image, its dependencies, runtime options, and the files (container_checker, gnmi.sh, k8s_pod_control.sh) made available to its Docker build context. |
| dockers/docker-gnmi-sidecar/systemd_stub.py | Implements the GNMI sidecar logic that selects between v1 and K8s-aware GNMI scripts based on IS_V1_ENABLED, syncs the GNMI unit/script/container_checker to the host, and runs appropriate systemctl/docker post-actions. |
| dockers/docker-gnmi-sidecar/systemd_scripts/gnmi.sh | Provides the thin GNMI wrapper that delegates lifecycle operations to the shared k8s_pod_control.sh script. |
| dockers/docker-gnmi-sidecar/systemd_scripts/gnmi.service | Defines the GNMI systemd unit that now drives GNMI via /usr/local/bin/gnmi.sh (ultimately K8s pod control) rather than a Docker-native service. |
| dockers/docker-gnmi-sidecar/supervisord.conf | Configures supervisord inside the GNMI sidecar to start systemd_stub.py with dependent startup sequencing and to pass through the IS_V1_ENABLED environment knob. |
| dockers/docker-gnmi-sidecar/cli-plugin-tests/test_systemd_stub.py | Adds tests covering sidecar sync fast-path and update behavior, post-copy host actions, failure reporting, CLI flags, and IS_V1_ENABLED-driven selection of gnmi_v1.sh vs the new GNMI stub. |
| dockers/docker-gnmi-sidecar/Dockerfile.j2 | Defines a two-stage Docker build for docker-gnmi-sidecar, copying in the GNMI systemd stub/unit plus container_checker and k8s control script, and wiring systemd_stub.py under supervisor as the entrypoint-managed process. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
917e05c to
cdcb081
Compare
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
Add a new sidecar container that enables migration of the systemd-managed gnmi container to Kubernetes. The sidecar syncs stub scripts to the host via nsenter, replacing the native systemd service with a K8s-aware stub that delegates container lifecycle to kubectl. Files synced to host: - gnmi.sh -> /usr/local/bin/gnmi.sh (stub calling k8s_pod_control.sh) - gnmi.service -> /lib/systemd/system/gnmi.service - container_checker -> /bin/container_checker - k8s_pod_control.sh -> /usr/share/sonic/scripts/k8s_pod_control.sh The sidecar uses the shared sidecar_common library for file sync and nsenter operations. No CONFIG_DB reconciliation is included in this initial implementation. Signed-off-by: Dawei Huang <[email protected]>
|
/azp run Azure.sonic-buildimage |
Add rules/docker-gnmi-sidecar.dep to define build dependencies for the gnmi-sidecar container, matching the pattern used by docker-telemetry-sidecar. Signed-off-by: Dawei Huang <[email protected]>
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
Co-authored-by: Copilot <[email protected]> Signed-off-by: Dawei Huang <[email protected]>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 36 out of 37 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| # Pass the image_version to container | ||
| ENV IMAGE_VERSION=$image_version | ||
|
|
There was a problem hiding this comment.
The IS_V1_ENABLED environment variable is set in the builder stage (line 17) but is missing from the final stage. This means the environment variable will not be available at runtime when supervisord tries to pass it to the systemd_stub process via the environment directive in supervisord.conf (line 38).
The restapi-sidecar Dockerfile correctly sets this environment variable in both stages (see dockers/docker-restapi-sidecar/Dockerfile.j2:39). The same pattern should be followed here to ensure IS_V1_ENABLED is available at runtime.
| # K8s will override this | |
| ENV IS_V1_ENABLED=false |
| @@ -0,0 +1,94 @@ | |||
| #!/usr/bin/env python3 | |||
There was a problem hiding this comment.
The PR description mentions "This follows the same sidecar pattern established by docker-telemetry-sidecar (PR #2021 design)" but the linked PR #2021 is about "Submodule update: Update sonic-swss. vxlan enhancements" which is unrelated to the sidecar pattern. The PR reference appears to be incorrect and should point to the actual PR that introduced the docker-telemetry-sidecar.
| User=root | ||
| ExecStartPre=/usr/local/bin/gnmi.sh start | ||
| ExecStart=/usr/local/bin/gnmi.sh wait |
There was a problem hiding this comment.
The ExecStartPre, ExecStart, and ExecStop directives would benefit from inline comments explaining their behavior, similar to those found in dockers/docker-telemetry-sidecar/systemd_scripts/telemetry.service lines 13-15. These comments help clarify that start is non-blocking, wait is a long-lived loop observing pod status, and stop won't work after KubeSonic migration since the pod will be auto-deployed via Kubernetes.
| User=root | |
| ExecStartPre=/usr/local/bin/gnmi.sh start | |
| ExecStart=/usr/local/bin/gnmi.sh wait | |
| User=root | |
| # Non-blocking: trigger GNMI container/pod startup and return immediately. | |
| ExecStartPre=/usr/local/bin/gnmi.sh start | |
| # Long-lived loop: wait for and monitor GNMI container/pod status. | |
| ExecStart=/usr/local/bin/gnmi.sh wait | |
| # Note: After KubeSonic migration, this stop action will not be effective, | |
| # because the GNMI pod will be auto-deployed and managed by Kubernetes. |
| return 0 if ok else 1 | ||
| while True: | ||
| time.sleep(args.interval) | ||
| ensure_sync() |
There was a problem hiding this comment.
Variable ok is not used.
| ensure_sync() | |
| ok = ensure_sync() | |
| if not ok: | |
| logger.log_warning("Periodic ensure_sync() failed; continuing loop") |
cdcb081 to
142a054
Compare
|
/azp run Azure.sonic-buildimage |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))) | ||
|
|
||
| # Add sonic-py-common to path so we can import the real sidecar_common | ||
| sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../../../../src/sonic-py-common"))) |
There was a problem hiding this comment.
The path to add the docker-gnmi-sidecar directory to sys.path uses ".." (one level up from cli-plugin-tests/), which correctly reaches the directory containing systemd_stub.py. However, the path to sonic-py-common uses "../../../../src/sonic-py-common" (four levels up), which appears to be one level too many. From cli-plugin-tests/, going up three levels reaches the repository root, so the path should likely be "../../../src/sonic-py-common" instead. This is also inconsistent with the telemetry-sidecar test (which uses "../.." and "../../../../../" respectively), suggesting both might need correction. Verify that the tests run correctly with the current paths, and consider aligning both test files to use the correct relative paths.
| sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../../../../src/sonic-py-common"))) | |
| sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "../../../src/sonic-py-common"))) |
| SONIC_INSTALL_DOCKER_DBG_IMAGES += $(DOCKER_GNMI_SIDECAR_DBG) | ||
|
|
||
|
|
||
| $(DOCKER_GNMI_SIDECAR)_DEPENDS += $(LIBSWSSCOMMON) |
There was a problem hiding this comment.
you may need #25225 as well otherwise internal pipeline will be failed later.
The sidecar uses sonic_py_common.sidecar_common library which needs to be explicitly installed as a Python wheel. This mirrors the fix in PR sonic-net#25225 for docker-telemetry-sidecar. Signed-off-by: Dawei Huang <[email protected]>
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
Signed-off-by: Dawei Huang <[email protected]>
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
rules/docker-gnmi-sidecar.mk
Outdated
| $(DOCKER_GNMI_SIDECAR)_RUN_OPT += -t --privileged --pid=host | ||
| $(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /lib/systemd/system:/lib/systemd/system:rw | ||
| $(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /etc/audit:/etc/audit:rw |
There was a problem hiding this comment.
This container is configured to run --privileged with --pid=host and has read-write mounts of /lib/systemd/system and /etc/audit, which effectively gives any code execution inside the sidecar full control over the host’s init system and audit configuration. An attacker who compromises the sidecar (or another container in the same pod, depending on the deployment) can modify or replace systemd units or audit rules to gain persistent root access on the host and hide their activity. Consider removing --privileged and host PID sharing, and restricting host volume mounts to the minimal required paths and permissions (e.g., specific files via more targeted bind-mounts or hostPath settings) while enforcing least-privilege within the container.
| $(DOCKER_GNMI_SIDECAR)_RUN_OPT += -t --privileged --pid=host | |
| $(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /lib/systemd/system:/lib/systemd/system:rw | |
| $(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /etc/audit:/etc/audit:rw | |
| $(DOCKER_GNMI_SIDECAR)_RUN_OPT += -t | |
| $(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /lib/systemd/system:/lib/systemd/system:ro | |
| $(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /etc/audit:/etc/audit:ro |
rules/docker-gnmi-sidecar.mk
Outdated
| $(LIBYANG_PY3) | ||
|
|
||
| $(DOCKER_GNMI_SIDECAR)_CONTAINER_NAME = gnmi-sidecar | ||
| $(DOCKER_GNMI_SIDECAR)_RUN_OPT += -t --privileged --pid=host |
There was a problem hiding this comment.
We're adding more privileged containers? I thought based on this we wouldn't do that anymore? https://github.com/sonic-net/SONiC/blob/master/doc/Container%20Hardening/SONiC_container_hardening_HLD.md#docker-privileges
There was a problem hiding this comment.
Yeah this is copy from restapi-sidecar which we should fix as well. Updated.
There was a problem hiding this comment.
With specific capacities.
Replace the --privileged flag with minimal Linux capabilities for container hardening, following the principle of least privilege. Capabilities added: - CAP_SYS_ADMIN: For system administration operations - CAP_SYS_PTRACE: For nsenter to access /proc/[pid]/ns/* - CAP_DAC_OVERRIDE: To bypass file permission checks Security options added: - apparmor=unconfined - seccomp=unconfined These are required for nsenter to access host namespaces and sync files to the host filesystem. Signed-off-by: Dawei Huang <[email protected]>
The path to sonic-py-common had 4 parent levels (../../../../) but should only have 3 (../../../). The test file is at: dockers/docker-gnmi-sidecar/cli-plugin-tests/test_systemd_stub.py Traversing 3 levels up reaches sonic-buildimage repo root, from where src/sonic-py-common is accessible. Signed-off-by: Dawei Huang <[email protected]>
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| [program:rsyslogd] | ||
| command=/usr/sbin/rsyslogd -n -iNONE | ||
| priority=1 |
There was a problem hiding this comment.
I think need add rsyslog conf files, e.g. https://github.com/sonic-net/sonic-buildimage/tree/master/dockers/docker-restapi-sidecar/etc, otherwise, syslog will be missing
| $(DOCKER_GNMI_SIDECAR)_RUN_OPT += --security-opt apparmor=unconfined | ||
| $(DOCKER_GNMI_SIDECAR)_RUN_OPT += --security-opt seccomp=unconfined | ||
| $(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /lib/systemd/system:/lib/systemd/system:rw | ||
| $(DOCKER_GNMI_SIDECAR)_RUN_OPT += -v /etc/audit:/etc/audit:rw |
There was a problem hiding this comment.
is this mount -v /etc/audit:/etc/audit:rw really needed?
his PR introduces a new docker-gnmi-sidecar container that syncs GNMI systemd stubs and helper scripts from the container to the host via nsenter, enabling migration of GNMI from a systemd-managed Docker container to a Kubernetes-managed pod. Changes: Adds GNMI stub script and systemd unit wiring GNMI lifecycle through shared k8s_pod_control.sh instead of the native Docker-based service management. Defines and wires the docker-gnmi-sidecar image into the build (including runtime options and file sync inputs) using the shared sidecar_common library for host file synchronization and post-copy actions. Adds unit tests for the GNMI sidecar systemd_stub.py to validate sync behavior, environment-driven source selection, and CLI options (e.g., --once, --no-post-actions). Signed-off-by: Dawei Huang <[email protected]> Co-authored-by: Copilot <[email protected]> Signed-off-by: Feng Pan <[email protected]>
his PR introduces a new docker-gnmi-sidecar container that syncs GNMI systemd stubs and helper scripts from the container to the host via nsenter, enabling migration of GNMI from a systemd-managed Docker container to a Kubernetes-managed pod. Changes: Adds GNMI stub script and systemd unit wiring GNMI lifecycle through shared k8s_pod_control.sh instead of the native Docker-based service management. Defines and wires the docker-gnmi-sidecar image into the build (including runtime options and file sync inputs) using the shared sidecar_common library for host file synchronization and post-copy actions. Adds unit tests for the GNMI sidecar systemd_stub.py to validate sync behavior, environment-driven source selection, and CLI options (e.g., --once, --no-post-actions). Signed-off-by: Dawei Huang <[email protected]> Co-authored-by: Copilot <[email protected]> Signed-off-by: dprital <[email protected]>
Why I did it
Enable migration of the systemd-managed gnmi container to Kubernetes as part of the KubeSonic initiative. This follows the same sidecar pattern established by docker-telemetry-sidecar (PR #2021 design).
The sidecar container runs alongside the K8s-managed gnmi pod and:
nsenterkubectlWork item tracking
How I did it
Created a new
docker-gnmi-sidecarcontainer with:Dockerfile.j2supervisord.confsystemd_stub.pysidecar_commonlibrarysystemd_scripts/gnmi.shk8s_pod_control.shsystemd_scripts/gnmi.serviceFiles synced to host:
gnmi.sh→/usr/local/bin/gnmi.shgnmi.service→/lib/systemd/system/gnmi.servicecontainer_checker→/bin/container_checkerk8s_pod_control.sh→/usr/share/sonic/scripts/k8s_pod_control.shPost-copy actions when
gnmi.shis updated:The sidecar uses the shared
sonic_py_common.sidecar_commonlibrary for file sync and nsenter operations. No CONFIG_DB reconciliation is included in this initial implementation.How to verify it
Tested on local minikube cluster with SONiC DUT (vlab-01):
kubeadm join2/2 Runningstate with 0 restartsNote: On SONiC 202505+ with systemd-sonic-generator (SSG), SSG modifies service files on daemon-reload which can cause sync loops. This is not an issue on production SONiC versions without this SSG behavior.
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Add docker-gnmi-sidecar container to enable Kubernetes migration of gnmi service