Add telemetry-watchdog container implementation for KubeSonic rollout #23724

Merged
qiluo-msft merged 9 commits into sonic-net:master from FengPan-Frank:telemetry_watchdog
Sep 4, 2025

Conversation

@FengPan-Frank (Contributor) commented Aug 15, 2025

Why I did it

To implement the KubeSonic design: https://github.com/sonic-net/SONiC/blob/ce3ffda18399add1435cb18299c267733dcc2b38/doc/kubernetes/k8s_migration_design.md

Work item tracking
  • Microsoft ADO (number only): 34506286

How I did it

Implemented the watchdog in Rust.

How to verify it

image

Which release branch to backport (provide reason below if selected)

  • 202205
  • 202211
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@mssonicbld (Collaborator)

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@FengPan-Frank changed the title from "Add telemetry-watchdog implementation for KubeSonic rollout" to "Add telemetry-watchdog container implementation for KubeSonic rollout" on Aug 20, 2025

@lixiaoyuner (Contributor) left a comment

LGTM

@@ -0,0 +1,50 @@
{% from "dockers/dockerfile-macros.j2" import install_debian_packages, install_python_wheels, copy_files %}
ARG BASE=docker-config-engine-bookworm-{{DOCKER_USERNAME}}:{{DOCKER_USERTAG}}

This still uses the SONiC base image, so the image size won't be small. Can we avoid it?

If not, let's make sure the watchdog uses the same base as telemetry.

Contributor Author

image

The Docker image for telemetry-watchdog is not large.

This doesn't show the shared image size. 270M is huge.

We expect all containers that use SONiC as the base to share that base, which is roughly 200M to 300M, while each container itself uses less than 100M.

Why not use the same base as telemetry?

Contributor Author

From the screenshot, it seems all the current watchdogs have the same size, for example auditd-watchdog and gnmi-watchdog.

Let's have a chat about this today when you're available.

Contributor Author

image

I double-checked: the unique size of telemetry-watchdog is only 1MB; most of it is shared storage.
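
The shared-versus-unique distinction in this thread can be illustrated with a small sketch. It assumes a simplified model in which an image is a list of layers and a layer counts as shared when another image references the same digest. The `Image` type, the digests, and the sizes below are hypothetical illustrations, not measurements from this PR.

```rust
use std::collections::HashMap;

/// An image described by its layers as (digest, size in MB).
/// Hypothetical model: each digest appears at most once per image.
struct Image<'a> {
    name: &'a str,
    layers: Vec<(&'a str, u64)>,
}

/// Returns (shared_mb, unique_mb) for `target`. A layer is "shared" when
/// more than one image in `all` (which includes `target`) references it.
fn shared_and_unique(target: &Image, all: &[Image]) -> (u64, u64) {
    // Count how many images reference each layer digest.
    let mut refs: HashMap<&str, usize> = HashMap::new();
    for img in all {
        for (digest, _) in &img.layers {
            *refs.entry(*digest).or_insert(0) += 1;
        }
    }
    let (mut shared, mut unique) = (0, 0);
    for (digest, size) in &target.layers {
        if refs.get(*digest).copied().unwrap_or(0) > 1 {
            shared += *size;
        } else {
            unique += *size;
        }
    }
    (shared, unique)
}

fn main() {
    // Made-up numbers: one large base layer reused by every image.
    let base = ("sha256:base", 269);
    let images = vec![
        Image { name: "telemetry-watchdog", layers: vec![base, ("sha256:tw", 1)] },
        Image { name: "gnmi-watchdog", layers: vec![base, ("sha256:gw", 1)] },
    ];
    for img in &images {
        let (shared, unique) = shared_and_unique(img, &images);
        println!("{}: shared {shared} MB, unique {unique} MB", img.name);
    }
}
```

Under this model, the reported total size of an image stays large even when nearly all of it is a base layer already present on the device.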

    Ok(c) => c,
    Err(e) => {
        eprintln!("Redis client error (port): {e}");
        return 50051;

@make1980 commented Aug 27, 2025

50051

Please define this as a constant. #Closed

Contributor Author

Updated.
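
For readers without the final diff at hand, the requested change could look roughly like the sketch below. The constant name `DEFAULT_GNMI_PORT` and the helper `port_or_default` are assumptions made for illustration, and the `Result<String, String>` argument stands in for the Redis lookup; only the value 50051 and the error message come from the snippet under review.

```rust
/// Fallback gNMI port used when the configured value cannot be read.
/// The name is hypothetical; the value 50051 is taken from the reviewed code.
const DEFAULT_GNMI_PORT: u16 = 50051;

/// Turns the result of a config lookup into a port number, falling back to
/// the default on a lookup error or an unparsable value.
/// `lookup` is a stand-in for the Redis read in the real watchdog.
fn port_or_default(lookup: Result<String, String>) -> u16 {
    match lookup {
        Ok(raw) => raw.trim().parse::<u16>().unwrap_or_else(|e| {
            eprintln!("Invalid gNMI port {raw:?}: {e}");
            DEFAULT_GNMI_PORT
        }),
        Err(e) => {
            eprintln!("Redis client error (port): {e}");
            DEFAULT_GNMI_PORT
        }
    }
}

fn main() {
    println!("{}", port_or_default(Ok("8080".to_string())));
    println!("{}", port_or_default(Err("connection refused".to_string())));
}
```

Naming the default once keeps every fallback path in agreement if the value ever changes.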

@make1980 left a comment

:shipit:

@mssonicbld (Collaborator)

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

}

fn get_gnmi_port() -> u16 {
    let client = match redis::Client::open("redis://127.0.0.1:6379/4") {

@qiluo-msft (Collaborator) commented Aug 27, 2025

redis

Could you use the Rust swss-common crate? DBConnector and ConfigDBConnector are available there. #Pending

Contributor Author

Is any project using that now? Could you please give a pointer?

Contributor Author

I created issue #23821 to track this change later. Could we merge this PR first for internal image generation?
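
One way to keep the follow-up in #23821 small is to hide the config read behind a trait, so the backend can be swapped without touching the port logic. The sketch below is only an illustration of that idea: the `ConfigSource` trait, its `hget` method, the `InMemory` stand-in, and the key `GNMI|gnmi` with field `port` are all assumptions, and no swss-common API is shown or implied.

```rust
use std::collections::HashMap;

/// Hypothetical default, matching the value in the reviewed snippet.
const DEFAULT_GNMI_PORT: u16 = 50051;

/// Hypothetical abstraction over the config store. A Redis-backed or
/// swss-common-backed type could implement this later.
trait ConfigSource {
    fn hget(&self, key: &str, field: &str) -> Result<Option<String>, String>;
}

/// In-memory stand-in so the sketch runs without a database.
struct InMemory(HashMap<(String, String), String>);

impl ConfigSource for InMemory {
    fn hget(&self, key: &str, field: &str) -> Result<Option<String>, String> {
        Ok(self.0.get(&(key.to_string(), field.to_string())).cloned())
    }
}

/// Reads the gNMI port through the abstraction, falling back to the default
/// when the field is missing, unparsable, or the lookup fails.
/// The key and field names are assumptions, not confirmed by the PR.
fn get_gnmi_port(src: &dyn ConfigSource) -> u16 {
    match src.hget("GNMI|gnmi", "port") {
        Ok(Some(raw)) => raw.trim().parse().unwrap_or(DEFAULT_GNMI_PORT),
        Ok(None) => DEFAULT_GNMI_PORT,
        Err(e) => {
            eprintln!("Config read error (port): {e}");
            DEFAULT_GNMI_PORT
        }
    }
}

fn main() {
    let mut table = HashMap::new();
    table.insert(("GNMI|gnmi".to_string(), "port".to_string()), "8080".to_string());
    println!("{}", get_gnmi_port(&InMemory(table)));
}
```

With this shape, the in-memory source also doubles as a test fixture for the fallback paths.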

@mssonicbld (Collaborator)

Cherry-pick PR to msft-202412: Azure/sonic-buildimage-msft#1687

@r12f (Contributor) commented Oct 7, 2025

Manual 202412 PR here: Azure/sonic-buildimage-msft#1697

6 participants