Add design for migrating systemd-managed docker containers to Kubernetes with resource control #2021

FengPan-Frank wants to merge 36 commits into sonic-net:master
Conversation
/azp run

No pipelines are associated with this pull request.
```rust
let response = "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: 32\r\n\r\n{\"message\":\"telemetry disabled\"}";
stream.write_all(response.as_bytes()).ok();
} else {
    // normal health check
```
We should describe what's monitored by the watchdog. Maybe a simple physical-path probe to telemetry would be a good starting point.

Added real detection code that checks whether the gnmi service is listening on port 50051.
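The detection described here can be a plain TCP connect to the gNMI listen port. A minimal sketch, assuming bash, `timeout`, and gNMI listening locally on 50051 (the port mentioned in this thread); the function name is illustrative, not the PR's actual watchdog code:

```shell
# Hypothetical probe: succeeds only if something accepts a TCP connection
# on the given local port (default 50051, the gNMI port discussed above).
gnmi_port_open() {
    local port="${1:-50051}"
    # bash's /dev/tcp pseudo-device performs the connect; timeout bounds it
    timeout 2 bash -c "exec 3<>/dev/tcp/127.0.0.1/${port}" 2>/dev/null
}
```

The watchdog would report healthy only while the probe succeeds, and skip detection when the feature is disabled.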
```shell
kill_container() {
    POD_NAME=$(get_pod_name)
    if [[ -n "$POD_NAME" ]]; then
        kubectl exec "$POD_NAME" -n "$NAMESPACE" -- pkill -f telemetry
```
Have you tried kubectl exec? It may not work in our setup, since we don't have a kubemaster-to-device channel today. Maybe you should use a docker command to find the container with the pod name in it and restart it through docker instead.

This is again some complexity we take on because we do not put telemetry into its own DaemonSet/pod.

Can we add a label like "feature: telemetry" which has the same name as the FEATURE table?
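Since there is no kubemaster-to-device exec channel, the restart could go through the local docker daemon instead, as suggested. A sketch under the assumption that the runtime is docker and that kubelet's dockershim naming convention (`k8s_<container>_<pod>_<namespace>_...`) applies; the function name is hypothetical:

```shell
# Find the docker container backing the given pod (dockershim naming
# convention assumed) and restart it locally, without kubectl exec.
restart_feature_container() {
    local pod_name="$1"
    local cid
    cid=$(docker ps --format '{{.ID}} {{.Names}}' \
          | awk -v p="_${pod_name}_" '$2 ~ /^k8s_telemetry/ && index($2, p) { print $1; exit }')
    if [ -n "$cid" ]; then
        docker restart "$cid"
    else
        echo "no container found for pod ${pod_name}" >&2
        return 1
    fi
}
```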
- Patch systemd service files.

`patch_systemd.sh`
I'd prefer the sidecar container to be written in Go so that we don't need a base image for it.
The desired state script will be based on top-of-tree from a specific code branch.

```mermaid
sequenceDiagram
```

One thing we should probably clarify in this diagram is that we don't intend to change the postupgrade script to use a desired-state file (or the same desired-state file that we use in the sidecar). If any postupgrade script is doing a diff-based update, it will still work that way, but the sidecar will overwrite it.

```
    telemetry-watchdog->>k8s: skip detect and return OK
```
### 5.4 After KubeSonic rollout, roll back the container. (v2+ -> v2)
featured seems to use `systemctl show telemetry --property UnitFileState` to get the status of a service. In our current design, how do we make sure it reflects whether telemetry is running or not?

This doesn't seem to work as expected today, so we might be fine. The logic is: 1. check the `show --property` result; if it's enabled, do nothing and set the state table to enabled. 2. otherwise try enable and start; if there is no error, set the state table to enabled.

We'll have no change in the current featured monitoring workflow.
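The two-step logic described above can be paraphrased as follows. This is a sketch, not featured's actual code; the STATE_DB key layout and the use of `sonic-db-cli` here are assumptions:

```shell
# Sketch of the described flow: if the unit file already reports
# "enabled", record that; otherwise try to enable and start the unit
# and record "enabled" only when that succeeds.
sync_feature_state() {
    local svc="$1"
    local unit_state
    unit_state=$(systemctl show "$svc" --property UnitFileState --value)
    if [ "$unit_state" = "enabled" ]; then
        sonic-db-cli STATE_DB hset "FEATURE|${svc}" state enabled
    elif systemctl enable --now "$svc"; then
        sonic-db-cli STATE_DB hset "FEATURE|${svc}" state enabled
    fi
}
```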
```python
syslog.syslog(syslog.LOG_ERR, ERROR_CGROUP_MEMORY_USAGE_NOT_FOUND.format(docker_memory_usage_file_path, container_id))
sys.exit(INTERNAL_ERROR)
```
We should probably just say that after k8s we simplify the restart logic to be OOM-based only. I'm not sure why we tolerate memory usage exceeding the limit several times, as implemented right now.

I'm not sure about the previous intention for the retry; I updated this part as well.
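An OOM-only check could simply ask the runtime whether the container was OOM-killed instead of sampling cgroup usage repeatedly. A sketch, assuming the docker runtime; the function name is illustrative:

```shell
# True only if the runtime reports the container was killed by the
# kernel OOM killer (no repeated usage sampling needed).
was_oom_killed() {
    [ "$(docker inspect --format '{{.State.OOMKilled}}' "$1")" = "true" ]
}
```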
#### FEATURE Table Snapshot

```json
"telemetry": {
    "auto_restart": "enabled",
```
Do we still support auto_restart with the systemd stub? Maybe delayed restart is still supported through the stub? Not sure about that part... The same question applies to check_up_status. And we don't use set_owner anymore with this design, right?
```shell
        sleep 1000
    fi
done
exec /usr/local/bin/supervisord
```
What if the feature is enabled and running in a k8s pod, and then someone disables the feature? I suppose the systemd service may be stopped, but what is the behavior of the container? Will it exit? We should make the behavior clear.

systemd will kill and restart the container; after the restart, the container will read from the FEATURE table, see that it is disabled, and enter idle status.
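The behavior described above (container restarts, re-reads the FEATURE table, then idles while disabled) could look like the sketch below in the entrypoint. The CONFIG_DB key layout is an assumption for illustration:

```shell
# Block until the feature is enabled in the FEATURE table; while it is
# disabled the container just idles instead of exiting.
wait_until_enabled() {
    while true; do
        state=$(sonic-db-cli CONFIG_DB hget 'FEATURE|telemetry' state)
        [ "$state" = "enabled" ] && return 0
        sleep 1000  # idle: feature disabled
    done
}
# the real entrypoint would then continue with: exec /usr/local/bin/supervisord
```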
#### v1+ Behavior with Kubernetes DaemonSet
- The container is deployed via Kubernetes DaemonSet.
- The systemd service is retained as a **stub**, to avoid breaking automation or tools that query it.
- For start/stop/restart, the stub will simply kill the container so that Kubernetes relaunches it automatically. There is a limitation: since Kubernetes takes over the container, a systemd stop will not really STOP the container unless we taint the node, which would require each container to use a dedicated DaemonSet, which is not preferable either.
For the start action, we may return immediately, because kubelet will always start the container if it is not running.

To guarantee correctness, we may simply kill/restart the container, so that the latest FEATURE table is read.
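Put together, the stub's dispatch suggested in this thread might look like the sketch below; `kill_container` is the helper from the script excerpts elsewhere in this review, and the function name is hypothetical:

```shell
# start: no-op, kubelet keeps the container running anyway.
# stop/restart: kill the container so the relaunched instance reads
# the latest FEATURE table.
stub_action() {
    case "$1" in
        start)        ;;
        stop|restart) kill_container ;;
        *)            echo "unsupported action: $1" >&2; return 1 ;;
    esac
}
```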
- For start/stop/restart, it will simply kill the container so that Kubernetes relaunches it automatically.
- For status, it will return runtime state via kubectl.
How do you get the status for the container via kubectl? To do that, you need to find the corresponding pod and extract the container status from it, but there is no clear mapping between a pod and the feature container, so we may have to return the status based on feature status.

Can we add a label like "feature: telemetry" which has the same name as the FEATURE table?
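With a `feature: <name>` pod label mirroring the FEATURE table, the stub could resolve status without guessing pod names. A sketch; the label key and the use of the node's hostname as `spec.nodeName` are assumptions:

```shell
# Report the phase of the feature's pod on this node, selected by the
# proposed "feature" label rather than by pod name.
feature_status() {
    kubectl get pods -l "feature=$1" \
        --field-selector "spec.nodeName=$(hostname)" \
        -o jsonpath='{.items[0].status.phase}'
}
```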
```shell
    exit 1
}

get_pod_name() {
```
This assumes there is only one pod on the node, but that may not be true.

We can update this during implementation.
```shell
kill_container() {
    POD_NAME=$(get_pod_name)
    if [[ -n "$POD_NAME" ]]; then
        kubectl exec "$POD_NAME" -n "$NAMESPACE" -- pkill -f telemetry
```
In the pod, the service is still managed by supervisord inside the container, so I think you may call supervisorctl to restart it.

Please note, we can deploy multiple containers in a pod, and we usually use the feature name as the container name, so you will need to consider using `-c` in kubectl exec.
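Combining both points above, the restart could target the feature's container by name with `-c` and go through supervisorctl instead of pkill. A sketch; `NAMESPACE` and the naming of containers after features are assumptions taken from this thread:

```shell
# Restart the feature's process via supervisorctl, addressing the
# right container in a multi-container pod with -c.
restart_in_pod() {
    local pod="$1" feature="$2"
    kubectl exec "$pod" -n "$NAMESPACE" -c "$feature" -- \
        supervisorctl restart "$feature"
}
```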
```
    end
```

### 5.4 After KubeSonic rollout, roll back the container to the image-based version. (v2 -> v0)
What's the scenario for v2 -> v0? In a k8s rollout, if v2 fails, it will roll back to v1 instead of v0, right?

v1 only has the systemd stub files; here I mean rolling back to the original SONiC image-based version.
#### v0 Behavior
- Container is fully managed by `systemd` (e.g., `telemetry.service`).
- FEATURE table controls startup (`state: Enabled/Disabled`).
- Actual container starts or stops via `systemctl`, driven by featured.service.
featured uses other systemctl commands like mask and unmask; how do we support them?

@FengPan-Frank - let's take a look at the featured implementation. It seems the telemetry systemd service doesn't even support mask/unmask/enable/disable, but for the containers that do support them we should probably just treat them all as restart.
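The conclusion above ("treat them all as restart") could be captured by a small verb map in the stub. A sketch; all names are illustrative:

```shell
# Collapse systemctl verbs the k8s-managed container cannot honor
# individually into a plain restart, as suggested above.
map_stub_verb() {
    case "$1" in
        mask|unmask|enable|disable|restart) echo restart ;;
        start) echo start ;;
        stop)  echo stop ;;
        *)     return 1 ;;
    esac
}
```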
As the section "Enforce CPU and memory resource constraints natively" mentions, this monit functionality can be covered by Kubernetes, so we need to remove monit from the KubeSonic rolled-out container. However, if any case during the transition period declares that `monit` must be retained temporarily, we can also rewrite /usr/bin/memory_check to use Kubernetes data (e.g., via `kubectl top`) instead of reading Docker or cgroup files as above.
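If `monit` must be retained temporarily, the memory_check rewrite mentioned above could read pod memory from `kubectl top`. A sketch; the label selector and the Mi-denominated output parsing are assumptions:

```shell
# Extract the pod's memory usage in MiB from "kubectl top pod" output
# ("NAME CPU(cores) MEMORY(bytes)", e.g. "telemetry-pod 5m 120Mi").
pod_memory_mib() {
    kubectl top pod -l "feature=$1" --no-headers \
        | awk '{ gsub(/Mi$/, "", $3); print $3 }'
}
```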
---

How do we support container_checker?

Updated container_checker in the doc as well.
#### v1+ Behavior with Kubernetes DaemonSet
- The container is deployed via Kubernetes DaemonSet.
- The systemd service is retained as a **stub**, to avoid breaking automation or tools that query it.
- For start/stop/restart, the stub will simply kill the container so that Kubernetes relaunches it automatically. There is a limitation: since Kubernetes takes over the container, a systemd stop will not really STOP the container unless we taint the node, which would require each container to use a dedicated DaemonSet, which is not preferable either.
We don't really support the container here; if the container has a memory issue, this might not be enough.

Do you mean an OOM issue? OOM should be handled via section 6 below.


Add design for migrating systemd-managed docker containers to Kubernetes with resource control