Improve service checker for gnmi/telemetry container#19153
Improve service checker for gnmi/telemetry container#19153StormLiangMS merged 3 commits intosonic-net:masterfrom
Conversation
|
/azpw ms_conflict |
|
@qiluo-msft , can you please review this PR ? an if it is approved, can you please merge ? |
|
@qiluo-msft kindly reminder to review. it should go to 202311 and 202405 |
| """ | ||
| try: | ||
| DOCKER_CLIENT = docker.DockerClient(base_url='unix://var/run/docker.sock') | ||
| DOCKER_CLIENT.images.get(image_name) |
There was a problem hiding this comment.
Is there an assumption that images.get function will drop an exception when image is not there? Should we also check what it returns other than depend the logic of images.get function? For example, if the images.get refined to return a null for a search miss, it would still return True by check_docker_image. @ganglyu
There was a problem hiding this comment.
This function will raise exception if the image does not exist:
https://docker-py.readthedocs.io/en/stable/images.html#docker.models.images.ImageCollection.get
Raises:
docker.errors.ImageNotFound – If the image does not exist.
docker.errors.APIError – If the server returns an error.
StormLiangMS
left a comment
There was a problem hiding this comment.
this change looks like more a work around, but I'm ok with it to deal with the difficulty of warm reboot case.
@ganglyu could you help to update the description about the details of difficulty of warm reboot case?
|
@qiluo-msft to take a look before merge. |
|
@qiluo-msft , Can you please approve and merge ? |
Why I did it Fix sonic-net#19081 We have used gnmi container to replace telemetry container, and telemetry is still enabled after upgrade. service_checker script reads from features table and check if the container is running, telemetry is enabled but there's no telemetry container. It's difficult to disable telemetry in feature table for warm reboot and cold reboot, we need to check docker image in db migrator and minigraph.py. When we use warm reboot to upgrade from 202305 to 202311, config_db still has telemetry configuration, and we can't simply remove related configuration. Work item tracking Microsoft ADO (number only): How I did it I modify service_checker script: If there's docker-sonic-telemetry image, check telemetry container. If there's no docker-sonic-telemetry image, check gnmi container instead. If there's no docker-sonic-telemetry image and docker-sonic-gnmi image, do not check telemetry. How to verify it Run unit test and end to end test.
|
Cherry-pick PR to 202405: #19332 |
Why I did it Fix #19081 We have used gnmi container to replace telemetry container, and telemetry is still enabled after upgrade. service_checker script reads from features table and check if the container is running, telemetry is enabled but there's no telemetry container. It's difficult to disable telemetry in feature table for warm reboot and cold reboot, we need to check docker image in db migrator and minigraph.py. When we use warm reboot to upgrade from 202305 to 202311, config_db still has telemetry configuration, and we can't simply remove related configuration. Work item tracking Microsoft ADO (number only): How I did it I modify service_checker script: If there's docker-sonic-telemetry image, check telemetry container. If there's no docker-sonic-telemetry image, check gnmi container instead. If there's no docker-sonic-telemetry image and docker-sonic-gnmi image, do not check telemetry. How to verify it Run unit test and end to end test.
|
@yxieca , Can you please cherry pick to 202311 ? |
|
@ganglyu can you provide the ADO number? |
Why I did it Fix sonic-net#19081 We have used gnmi container to replace telemetry container, and telemetry is still enabled after upgrade. service_checker script reads from features table and check if the container is running, telemetry is enabled but there's no telemetry container. It's difficult to disable telemetry in feature table for warm reboot and cold reboot, we need to check docker image in db migrator and minigraph.py. When we use warm reboot to upgrade from 202305 to 202311, config_db still has telemetry configuration, and we can't simply remove related configuration. Work item tracking Microsoft ADO (number only): How I did it I modify service_checker script: If there's docker-sonic-telemetry image, check telemetry container. If there's no docker-sonic-telemetry image, check gnmi container instead. If there's no docker-sonic-telemetry image and docker-sonic-gnmi image, do not check telemetry. How to verify it Run unit test and end to end test.
|
Cherry-pick PR to 202311: #19360 |
Why I did it Fix #19081 We have used gnmi container to replace telemetry container, and telemetry is still enabled after upgrade. service_checker script reads from features table and check if the container is running, telemetry is enabled but there's no telemetry container. It's difficult to disable telemetry in feature table for warm reboot and cold reboot, we need to check docker image in db migrator and minigraph.py. When we use warm reboot to upgrade from 202305 to 202311, config_db still has telemetry configuration, and we can't simply remove related configuration. Work item tracking Microsoft ADO (number only): How I did it I modify service_checker script: If there's docker-sonic-telemetry image, check telemetry container. If there's no docker-sonic-telemetry image, check gnmi container instead. If there's no docker-sonic-telemetry image and docker-sonic-gnmi image, do not check telemetry. How to verify it Run unit test and end to end test. Co-authored-by: ganglv <[email protected]>
Why I did it
Fix #19081
We have used gnmi container to replace telemetry container, and telemetry is still enabled after upgrade.
service_checker script reads from features table and check if the container is running, telemetry is enabled but there's no telemetry container.
It's difficult to disable telemetry in feature table for warm reboot and cold reboot, we need to check docker image in db migrator and minigraph.py.
When we use warm reboot to upgrade from 202305 to 202311, config_db still has telemetry configuration, and we can't simply remove related configuration.
Work item tracking
How I did it
I modify service_checker script:
If there's docker-sonic-telemetry image, check telemetry container.
If there's no docker-sonic-telemetry image, check gnmi container instead.
If there's no docker-sonic-telemetry image and docker-sonic-gnmi image, do not check telemetry.
How to verify it
Run unit test and end to end test.
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)