
Add design for migrating systemd-managed docker containers to Kubernetes with resource control#2021

Open
FengPan-Frank wants to merge 36 commits into sonic-net:master from FengPan-Frank:bmpk8s

Conversation

@FengPan-Frank
Contributor

@FengPan-Frank FengPan-Frank commented Jun 24, 2025

Add design for migrating systemd-managed docker containers to Kubernetes with resource control

| Description | Repo | PR link |
| --- | --- | --- |
| sidecar container sample (telemetry) | sonic-buildimage | sonic-net/sonic-buildimage#23936 |
| docker container change sample (telemetry) | sonic-buildimage | sonic-net/sonic-buildimage#23756 |
| watchdog container sample (telemetry) | sonic-buildimage | sonic-net/sonic-buildimage#23724 |

@mssonicbld
Collaborator

/azp run

@azure-pipelines

No pipelines are associated with this pull request.

@FengPan-Frank changed the title from "Add design for migrating systemd-managed docker containers to Kuberne…" to "Add design for migrating systemd-managed docker containers to Kubernetes with resource control" on Jun 24, 2025

@make1980 make1980 self-requested a review July 1, 2025 18:17

@FengPan-Frank FengPan-Frank requested a review from make1980 July 3, 2025 23:11

let response = "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: 32\r\n\r\n{\"message\":\"telemetry disabled\"}";
stream.write_all(response.as_bytes()).ok();
} else {
// normal health check


we should describe what's monitored by the watchdog - maybe just a simple physical path probe to telemetry will be a good starting point.

Contributor Author


added real detection code for the gnmi service listening on 50051
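A minimal probe along those lines might look like this (a sketch; the function name and the use of bash's /dev/tcp pseudo-device are assumptions, not the actual PR code):

```shell
#!/bin/bash
# Hypothetical watchdog probe: report OK only if something is actually
# accepting TCP connections on the gNMI port (50051 by default).
check_gnmi_listening() {
    port="${1:-50051}"
    # bash's /dev/tcp pseudo-device: opening it fails when nothing listens
    if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
        echo "OK"
    else
        echo "FAIL"
        return 1
    fi
}
```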

kill_container() {
    POD_NAME=$(get_pod_name)
    if [[ -n "$POD_NAME" ]]; then
        kubectl exec "$POD_NAME" -n "$NAMESPACE" -- pkill -f telemetry


have you tried kubectl exec? this may not work in our setup - we don't have a kubemaster-to-device channel today. so maybe you should use a docker command to find the container with the pod name in it and restart it through docker instead.

this is again some complexity because we do not put telemetry into its own daemonset/pod.
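A docker-based restart along the lines suggested above might be sketched as follows (the function name is illustrative; it relies on the kubelet convention of naming containers k8s_&lt;container&gt;_&lt;pod&gt;_…):

```shell
#!/bin/bash
# Sketch: restart a feature's k8s-managed container through the local
# docker daemon, avoiding kubectl exec (no kubemaster-to-device channel).
restart_feature_container() {
    feature="$1"
    # kubelet-created containers are named k8s_<container>_<pod>_<ns>_...
    cid=$(docker ps --filter "name=k8s_${feature}_" --format '{{.ID}}' | head -n 1)
    if [ -n "$cid" ]; then
        docker restart "$cid"
    else
        echo "no running container found for feature ${feature}" >&2
        return 1
    fi
}
```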

Contributor Author


Can we add a label like "feature: telemetry" which has the same name as FEATURE table?

- Patch systemd service files.


patch_systemd.sh


i'd prefer to have the sidecar container written in golang so that we don't need a base image for it.

The desired state script will be based on top-of-tree from a specific code branch.

```mermaid
sequenceDiagram


one thing we should probably clarify in this diagram is that we don't intend to change the postupgrade script to use a desired state file (or the same desired state file that we use in the sidecar). if any postupgrade script is doing a diff-based update it will still work that way, but the sidecar will overwrite it.

Contributor Author


updated.

telemetry-watchdog->>k8s: skip detect and return OK
```

### 5.4 After KubeSonic rollout, rollback the container. (v2+ -> v2)


featured seems to use "systemctl show telemetry --property UnitFileState" to get the status of a service - in our current design how do we make sure it will reflect whether telemetry is running or not?


this doesn't seem to work as expected today so we might be fine. the logic is: 1. check the show --property result; if it's enabled, don't do anything and set the state table as enabled. 2. try enable and start; if no error, set the state table as enabled.

Contributor Author


we'll have no change in the current featured monitoring workflow.


syslog.syslog(syslog.LOG_ERR, ERROR_CGROUP_MEMORY_USAGE_NOT_FOUND.format(docker_memory_usage_file_path, container_id))
sys.exit(INTERNAL_ERROR)
```


we should probably just say that after k8s we simplify the restart logic to be OOM-based only - I'm not sure why we tolerate memory usage exceeding the limit several times, as implemented right now.

Contributor Author


I'm not sure about the previous intention for the retry; updated this part as well.

make1980
make1980 previously approved these changes Jul 27, 2025
#### FEATURE Table Snapshot
```json
"telemetry": {
"auto_restart": "enabled",


do we still support auto_restart with systemd stub? maybe delayed is still supported through the stub? not sure about that part...

same for check_up_status. we don't use set_owner anymore with this design, right?

        sleep 1000
    fi
done
exec /usr/local/bin/supervisord


what if the feature is enabled and running in a k8s pod, and then someone disables the feature? I suppose the systemd service may be stopped, but what's the behavior of the container? will it exit? We should make the behavior clear.

Contributor Author


systemd will kill and restart the container; after restart, the container will read the FEATURE table, see that it's disabled, and enter idle status.
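The stub entrypoint flow described above could be sketched as follows (the helper names are illustrative; `sonic-db-cli` reading CONFIG_DB is the standard accessor):

```shell
#!/bin/bash
# Sketch of the entrypoint: on every (re)start the container reads the
# FEATURE table; while disabled it idles instead of exiting, so systemd's
# kill/restart is enough to pick up a state change.
get_feature_state() {
    sonic-db-cli CONFIG_DB hget "FEATURE|${1}" state
}

feature_enabled() {
    [ "$(get_feature_state "$1")" = "enabled" ]
}

idle_until_enabled() {
    until feature_enabled "$1"; do
        sleep 1000   # idle while the feature stays disabled
    done
    exec /usr/local/bin/supervisord
}
```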

#### v1+ Behavior with Kubernetes DaemonSet
- The container is deployed via Kubernetes DaemonSet.
- The systemd service is retained as a **stub**, to avoid breaking automation or tools that query it.
- For start/stop/restart, it will simply kill the container so that Kubernetes relaunches it automatically. One limitation: since Kubernetes takes over the container, systemd stop will not actually STOP the container unless we taint the node, but that would require each container to use a dedicated DaemonSet, which is not preferable either.


for start action, we may return immediately, because kubelet will always start it if it's not running.

Contributor Author


To guarantee correctness we may simply kill/restart the container, so that the latest FEATURE table is read.

- The container is deployed via Kubernetes DaemonSet.
- The systemd service is retained as a **stub**, to avoid breaking automation or tools that query it.
- For start/stop/restart, it will simply kill the container so that Kubernetes relaunches it automatically. One limitation: since Kubernetes takes over the container, systemd stop will not actually STOP the container unless we taint the node, but that would require each container to use a dedicated DaemonSet, which is not preferable either.
- For status, it will return runtime state via kubectl.


how do you get the status for the container via kubectl? If you want to do that, you need to find the corresponding pod and extract the container status from it; there is no clear mapping between a pod and the feature container today, so we may have to return the status based on the feature status.

Contributor Author


Can we add a label like "feature: telemetry" which has the same name as FEATURE table now?

Contributor Author


(screenshot)

we can use a label to map the feature to the container status.
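Assuming such a `feature=<name>` label is added (it is a proposal in this thread, not something the DaemonSet carries yet), the status lookup could be sketched as:

```shell
#!/bin/bash
# Sketch: map a FEATURE table entry to its pod via a label selector and
# report the pod phase (Running, Pending, ...).
feature_pod_status() {
    kubectl get pods -n "${NAMESPACE:-default}" \
        -l "feature=${1}" \
        -o 'jsonpath={.items[0].status.phase}' 2>/dev/null
}
```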

    exit 1
}

get_pod_name() {


it assumes there is only one pod on the node, but it may not be true.

Contributor Author


we can update during implementation.

kill_container() {
    POD_NAME=$(get_pod_name)
    if [[ -n "$POD_NAME" ]]; then
        kubectl exec "$POD_NAME" -n "$NAMESPACE" -- pkill -f telemetry


in the pod, the service is still managed by systemd inside the container, so I think you may call supervisorctl to restart it


please note, we can deploy multiple containers in a pod; we usually use the feature name as the container name, so you will need to consider using '-c ' in kubectl exec.
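An exec wrapper reflecting both review points above might look like this (a sketch; it assumes the container name inside the pod equals the feature name):

```shell
#!/bin/bash
# Sketch: target the feature's container explicitly with -c, since a pod
# may bundle several containers; e.g. restart the service via supervisorctl
# instead of pkill.
exec_in_feature() {
    pod="$1"; feature="$2"; shift 2
    kubectl exec "$pod" -n "${NAMESPACE:-default}" -c "$feature" -- "$@"
}
# example: exec_in_feature "$POD_NAME" telemetry supervisorctl restart telemetry
```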

end
```

### 5.4 After KubeSonic rollout, rollback the container to the image-based version. (v2 -> v0)


what's the scenario for v2 -> v0? In a k8s rollout, if v2 fails, it will roll back to v1 instead of v0, right?

Contributor Author


v1 only has systemd stub files; here I mean rollback to the original SONiC image-based version.


#### v0 Behavior
- Container is fully managed by `systemd` (e.g., `telemetry.service`).
- FEATURE table controls startup (`state: Enabled/Disabled`).
- The actual container is started or stopped via `systemctl` by featured.service.
Contributor


featured also uses other systemctl commands like mask and unmask; how do we support them?


@FengPan-Frank - let's take a look at the featured implementation. It seems that telemetry systemd service doesn't even support mask/unmask/enable/disable stuff but for the containers that do support we should probably just treat them all as restart.

Contributor Author


(screenshot)

Verified on this: mask/unmask will flip the symlink to /dev/null, thus we should be fine with these commands

As the section "Enforce CPU and memory resource constraints natively" mentioned, this monit functionality can be covered by Kubernetes, so we need to remove it from the KubeSonic rolled-out container. However, if during the transition period any case declares that `monit` must be retained temporarily, we can also rewrite /usr/bin/memory_check to use Kubernetes data (e.g., via `kubectl top`) instead of reading Docker or cgroup files like above.
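If memory_check did have to be retained, a kubectl-top based variant could be sketched roughly like this (function name, threshold handling, and Mi-only parsing are all assumptions):

```shell
#!/bin/bash
# Sketch: returns success (0) when the pod's memory usage exceeds the
# given limit in MiB, using kubectl top instead of cgroup files.
memory_exceeds_limit() {
    pod="$1"; limit_mib="$2"
    # kubectl top pod prints: NAME CPU(cores) MEMORY(bytes)
    usage=$(kubectl top pod "$pod" --no-headers 2>/dev/null \
                | awk '{print $3}' | tr -d 'Mi')
    [ -n "$usage" ] && [ "$usage" -gt "$limit_mib" ]
}
```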

---

Contributor


How to support container_checker?

Contributor Author


updated container_checker in the doc as well.

#### v1+ Behavior with Kubernetes DaemonSet
- The container is deployed via Kubernetes DaemonSet.
- The systemd service is retained as a **stub**, to avoid breaking automation or tools that query it.
- For start/stop/restart, it will simply kill the container so that Kubernetes relaunches it automatically. One limitation: since Kubernetes takes over the container, systemd stop will not actually STOP the container unless we taint the node, but that would require each container to use a dedicated DaemonSet, which is not preferable either.
Contributor


We don't really support the container; if the container has a memory issue, this might not be enough.

Contributor Author

@FengPan-Frank FengPan-Frank Aug 1, 2025


do you mean an OOM issue? OOM should be handled via section 6 below

