Skip to content

ansible: support pulling ceos_image from docker registry#22794

Merged
wangxinbot merged 2 commits intosonic-net:masterfrom
wangxin:ceos-image-registry
Mar 13, 2026
Merged

ansible: support pulling ceos_image from docker registry#22794
wangxinbot merged 2 commits intosonic-net:masterfrom
wangxin:ceos-image-registry

Conversation

@wangxin
Copy link
Collaborator

@wangxin wangxin commented Mar 7, 2026

Description of PR

Summary:
Add support for an optional docker registry (e.g. ACR, MCR) as an image source for ceos_image, so that deployments with strict registry-only policies (e.g. Microsoft internal) can pull the image directly from a registry without ever creating a locally-built image.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

In environments that enforce registry-only container image policies (e.g. Microsoft requiring all images to originate from ACR or MCR), the previous approach of building ceos_image locally or re-tagging a pulled image with a local name would trigger security alerts. This PR allows ceos_image to be sourced directly from a user-configured registry, so containers always reference the registry-prefixed image name and no locally-built alias is created.

How did you do it?

Three files are changed:

ansible/group_vars/vm_host/ceos.yml

  • Added three commented-out placeholder variables: ceos_registry, ceos_registry_username, ceos_registry_password.
  • When ceos_registry is uncommented and set, the new registry logic is activated. Credentials are optional and only needed for authenticated registries.

ansible/roles/vm_set/tasks/add_ceos_list.yml

  • When ceos_registry is defined, check whether <registry>/<ceos_image> is already cached locally (a prior pull). If found, it is used directly — no docker tag is performed, so no local alias is created.
  • If not cached and ceos_registry is defined, optionally log in (only when credentials are provided) and docker pull the image.
  • Only falls back to the existing download-ceos_image_orig-and-build path when registry is not configured or the pull fails.
  • ceos_image_found logic is registry-aware: when ceos_registry is defined, a plain local ceos_image does not count as "found", ensuring the registry path is always taken.

ansible/roles/eos/tasks/ceos.yml and ceos_ensure_reachable.yml

  • After loading group_vars/vm_host/ceos.yml, a docker_image_info check determines whether the registry image is available locally.
  • ceos_effective_image is set to <registry>/<ceos_image> when the registry image is present, or plain ceos_image otherwise.
  • All docker_container tasks use ceos_effective_image instead of ceos_image, so containers always reference the registry image in registry-enabled environments.

How did you verify/test it?

  • Tested "add-topo" by specifying ceos_registry. The ceos image from the registry is used successfully.
  • Tested "add-topo" without specifying ceos_registry. The existing logic of using a local ceos image or preparing ceos image from ceos_image_orig works as expected.

Any platform specific information?

N/A

Supported testbed topology if it's a new test case?

N/A

Documentation

No documentation update needed. The new variables are self-documented with inline comments in ansible/group_vars/vm_host/ceos.yml.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Add support for an optional docker registry as an additional source for
ceos_image, before falling back to the existing download-and-build path.
When a registry is configured, containers always reference the registry-
prefixed image name so no locally-built alias is created.

Changes in ansible/group_vars/vm_host/ceos.yml:
- Add commented-out placeholder variables ceos_registry,
  ceos_registry_username, and ceos_registry_password with instructions.
  Username/password are optional and only needed when the registry
  requires authentication.

Changes in ansible/roles/vm_set/tasks/add_ceos_list.yml:
- When ceos_registry is defined, check if the registry-prefixed image is
  already cached locally. If found, use it directly without re-tagging.
- If not cached, login to the registry (only when credentials are
  provided) and attempt to pull the image.
- ceos_image_found is registry-aware: when ceos_registry is defined, a
  plain local ceos_image is not counted as found, ensuring the registry
  path is always taken.
- Only fall back to the download-ceos_image_orig / build-ceos_image
  path when no registry is configured or the registry pull failed.

Changes in ansible/roles/eos/tasks/ceos.yml and ceos_ensure_reachable.yml:
- After loading ceos vars, check whether the registry image is available
  locally and set ceos_effective_image accordingly.
- All docker_container tasks use ceos_effective_image instead of
  ceos_image, so containers always reference the registry image in
  registry-enabled environments without creating a local alias.

Signed-off-by: Xin Wang <[email protected]>
Co-authored-by: Copilot <[email protected]>
@wangxin wangxin force-pushed the ceos-image-registry branch from abdb30a to 8f859a6 Compare March 7, 2026 14:17
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Collaborator

@StormLiangMS StormLiangMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

Overall clean PR — the approach of optional registry support with download-and-build fallback is sound. A few concerns:

Issues

1. Credentials in plain text: ceos_registry_password is stored as a plain-text variable in group_vars. This could get committed to the repo or inventory. Consider using Ansible Vault, environment variables, or at minimum adding a comment warning users not to commit credentials in plain text.

2. ceos_effective_image undefined in ceos_ensure_reachable.yml: This file references {{ ceos_effective_image }} but only ceos.yml sets it via set_fact. If ceos_ensure_reachable.yml is invoked independently (not through the main ceos.yml flow), ceos_effective_image will be undefined. It needs the same include_vars + docker_image_info + set_fact logic, or the variable needs to be set at a higher scope.

3. ceos_image_found is a string, not a boolean: The set_fact with Jinja2 {{ ... }} produces a string "True"/"False", not a Python boolean. The when: not ceos_image_found check at the bottom might not behave as expected — in Ansible, the string "False" is truthy. Should use:

ceos_image_found: "{{ ... | bool }}"

4. No docker logout: After docker_login, there's no corresponding cleanup to remove credentials from ~/.docker/config.json on the host.

5. Registry pull failure is silently ignored: ignore_errors: yes on the pull task means a registry misconfiguration (wrong URL, bad credentials) silently falls through to the download-and-build path. Consider adding a debug message when the pull fails, so users know why the fallback was triggered.

Minor

  • The when condition at the bottom could be simplified — ceos_registry_pull_result is defined and ceos_registry_pull_result is failed could use | default(false) pattern.
  • Good that pull: no is kept on the container create tasks.

Fix three issues raised in code review:

Issue sonic-net#2 - ceos_effective_image undefined in ceos_ensure_reachable.yml:
Add a defensive set_fact at the top of ceos_ensure_reachable.yml that
sets ceos_effective_image to ceos_image as a fallback, guarded by
'when: ceos_effective_image is not defined'. In the normal flow the
variable is already set by the caller (ceos.yml in the eos role) so
this task is a no-op. It prevents an undefined variable error if the
file is ever invoked independently.

Issue sonic-net#3 - ceos_image_found stored as string instead of boolean:
The set_fact used a >- block scalar which causes Jinja2 to produce the
string 'True'/'False'. In Ansible the string 'False' is truthy, so
'when: not ceos_image_found' would never enter the prepare block when
the image is absent. Fixed by collapsing to a single quoted expression
and appending '| bool' to ensure Python boolean storage.

Issue sonic-net#5 - registry pull failure silently ignored:
Added a debug task immediately after the pull task that emits a WARNING
message when ceos_registry_pull_result is failed. Users can now see in
the Ansible output why the fallback to download-and-build was triggered.

Signed-off-by: Xin Wang <[email protected]>
Co-authored-by: Copilot <[email protected]>
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@wangxin
Copy link
Collaborator Author

wangxin commented Mar 9, 2026

Thanks @StormLiangMS for the thorough review! Here is a response to each issue:

Issue #1 - Credentials in plain text: Acknowledged. In our automation environment, secrets are downloaded at runtime from a vault as a JSON file, so ceos_registry_password can reference that file rather than a hardcoded value. The placeholder in ceos.yml is commented out, so no credentials are committed to the repo. The comment in the file already advises users not to commit real values. No code change needed.

Issue #2 - ceos_effective_image undefined in ceos_ensure_reachable.yml: Fixed. Added a defensive set_fact at the top of ceos_ensure_reachable.yml guarded by when: ceos_effective_image is not defined. In the normal flow (always called via include_tasks from ceos.yml) the variable is already set and this task is a no-op. If the file is ever invoked independently it falls back safely to ceos_image.

Issue #3 - ceos_image_found is a string, not a boolean: Fixed. The >- block scalar was causing the fact to be stored as the string "False", which is truthy in Python/Jinja2, meaning when: not ceos_image_found would never have entered the prepare block. Replaced with a single-line quoted expression and appended | bool to ensure a proper Python boolean is stored.

Issue #4 - No docker logout: Intentionally left as-is. The registry credentials are scoped to the testbed VM host and managed by our automation, so leaving the session open is acceptable in this environment.

Issue #5 - Registry pull failure silently ignored: Fixed. Added a debug task immediately after the pull that emits a WARNING message including the registry and image name, and a hint to check ceos_registry and credentials. Users will now see clearly in the Ansible output why the fallback to download-and-build was triggered.

Issues #2, #3, and #5 are addressed in the follow-up commit 4a2f6e0.

Copy link
Collaborator

@StormLiangMS StormLiangMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the review feedback.

Addressed ✅:

  • #2: ceos_effective_image fallback in ceos_ensure_reachable.yml — good defensive approach
  • #3: | bool filter on ceos_image_found — correct fix
  • #5: Debug warning on registry pull failure — helpful for troubleshooting

Remaining (minor, non-blocking):

  • #1 (credentials): Acceptable since they're commented-out placeholders. Maybe add a note like # WARNING: Do not commit real credentials here. Use Ansible Vault or environment variables.
  • #4 (docker logout): Low risk since the credentials persist only on the VM host, but worth a follow-up if security-sensitive registries are used.

LGTM overall. The code is clean, well-commented, and the fallback logic is solid.

@wangxin
Copy link
Collaborator Author

wangxin commented Mar 12, 2026

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@wangxinbot wangxinbot merged commit 69e034d into sonic-net:master Mar 13, 2026
18 of 22 checks passed
selldinesh pushed a commit to selldinesh/sonic-mgmt that referenced this pull request Mar 16, 2026
…2794)

What is the motivation for this PR?
In environments that enforce registry-only container image policies (e.g. Microsoft requiring all images to originate from ACR or MCR), the previous approach of building ceos_image locally or re-tagging a pulled image with a local name would trigger security alerts. This PR allows ceos_image to be sourced directly from a user-configured registry, so containers always reference the registry-prefixed image name and no locally-built alias is created.

How did you do it?
Three files are changed:

ansible/group_vars/vm_host/ceos.yml

Added three commented-out placeholder variables: ceos_registry, ceos_registry_username, ceos_registry_password.
When ceos_registry is uncommented and set, the new registry logic is activated. Credentials are optional and only needed for authenticated registries.
ansible/roles/vm_set/tasks/add_ceos_list.yml

When ceos_registry is defined, check whether <registry>/<ceos_image> is already cached locally (a prior pull). If found, it is used directly — no docker tag is performed, so no local alias is created.
If not cached and ceos_registry is defined, optionally log in (only when credentials are provided) and docker pull the image.
Only falls back to the existing download-ceos_image_orig-and-build path when registry is not configured or the pull fails.
ceos_image_found logic is registry-aware: when ceos_registry is defined, a plain local ceos_image does not count as "found", ensuring the registry path is always taken.
ansible/roles/eos/tasks/ceos.yml and ceos_ensure_reachable.yml

After loading group_vars/vm_host/ceos.yml, a docker_image_info check determines whether the registry image is available locally.
ceos_effective_image is set to <registry>/<ceos_image> when the registry image is present, or plain ceos_image otherwise.
All docker_container tasks use ceos_effective_image instead of ceos_image, so containers always reference the registry image in registry-enabled environments.
How did you verify/test it?
Tested "add-topo" by specifying ceos_registry. The ceos image from the registry is used successfully.
Tested "add-topo" without specifying ceos_registry. The existing logic of using a local ceos image or preparing ceos image from ceos_image_orig works as expected.

Signed-off-by: Xin Wang <[email protected]>
Co-authored-by: Copilot <[email protected]>
Signed-off-by: selldinesh <[email protected]>
abhishek-nexthop pushed a commit to nexthop-ai/sonic-mgmt that referenced this pull request Mar 17, 2026
…2794)

What is the motivation for this PR?
In environments that enforce registry-only container image policies (e.g. Microsoft requiring all images to originate from ACR or MCR), the previous approach of building ceos_image locally or re-tagging a pulled image with a local name would trigger security alerts. This PR allows ceos_image to be sourced directly from a user-configured registry, so containers always reference the registry-prefixed image name and no locally-built alias is created.

How did you do it?
Three files are changed:

ansible/group_vars/vm_host/ceos.yml

Added three commented-out placeholder variables: ceos_registry, ceos_registry_username, ceos_registry_password.
When ceos_registry is uncommented and set, the new registry logic is activated. Credentials are optional and only needed for authenticated registries.
ansible/roles/vm_set/tasks/add_ceos_list.yml

When ceos_registry is defined, check whether <registry>/<ceos_image> is already cached locally (a prior pull). If found, it is used directly — no docker tag is performed, so no local alias is created.
If not cached and ceos_registry is defined, optionally log in (only when credentials are provided) and docker pull the image.
Only falls back to the existing download-ceos_image_orig-and-build path when registry is not configured or the pull fails.
ceos_image_found logic is registry-aware: when ceos_registry is defined, a plain local ceos_image does not count as "found", ensuring the registry path is always taken.
ansible/roles/eos/tasks/ceos.yml and ceos_ensure_reachable.yml

After loading group_vars/vm_host/ceos.yml, a docker_image_info check determines whether the registry image is available locally.
ceos_effective_image is set to <registry>/<ceos_image> when the registry image is present, or plain ceos_image otherwise.
All docker_container tasks use ceos_effective_image instead of ceos_image, so containers always reference the registry image in registry-enabled environments.
How did you verify/test it?
Tested "add-topo" by specifying ceos_registry. The ceos image from the registry is used successfully.
Tested "add-topo" without specifying ceos_registry. The existing logic of using a local ceos image or preparing ceos image from ceos_image_orig works as expected.

Signed-off-by: Xin Wang <[email protected]>
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Abhishek <[email protected]>
vrajeshe pushed a commit to vrajeshe/sonic-mgmt that referenced this pull request Mar 23, 2026
…2794)

What is the motivation for this PR?
In environments that enforce registry-only container image policies (e.g. Microsoft requiring all images to originate from ACR or MCR), the previous approach of building ceos_image locally or re-tagging a pulled image with a local name would trigger security alerts. This PR allows ceos_image to be sourced directly from a user-configured registry, so containers always reference the registry-prefixed image name and no locally-built alias is created.

How did you do it?
Three files are changed:

ansible/group_vars/vm_host/ceos.yml

Added three commented-out placeholder variables: ceos_registry, ceos_registry_username, ceos_registry_password.
When ceos_registry is uncommented and set, the new registry logic is activated. Credentials are optional and only needed for authenticated registries.
ansible/roles/vm_set/tasks/add_ceos_list.yml

When ceos_registry is defined, check whether <registry>/<ceos_image> is already cached locally (a prior pull). If found, it is used directly — no docker tag is performed, so no local alias is created.
If not cached and ceos_registry is defined, optionally log in (only when credentials are provided) and docker pull the image.
Only falls back to the existing download-ceos_image_orig-and-build path when registry is not configured or the pull fails.
ceos_image_found logic is registry-aware: when ceos_registry is defined, a plain local ceos_image does not count as "found", ensuring the registry path is always taken.
ansible/roles/eos/tasks/ceos.yml and ceos_ensure_reachable.yml

After loading group_vars/vm_host/ceos.yml, a docker_image_info check determines whether the registry image is available locally.
ceos_effective_image is set to <registry>/<ceos_image> when the registry image is present, or plain ceos_image otherwise.
All docker_container tasks use ceos_effective_image instead of ceos_image, so containers always reference the registry image in registry-enabled environments.
How did you verify/test it?
Tested "add-topo" by specifying ceos_registry. The ceos image from the registry is used successfully.
Tested "add-topo" without specifying ceos_registry. The existing logic of using a local ceos image or preparing ceos image from ceos_image_orig works as expected.

Signed-off-by: Xin Wang <[email protected]>
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Venkata Gouri Rajesh Etla <[email protected]>
@mssonicbld
Copy link
Collaborator

@wangxin PR conflicts with 202505 branch

@mssonicbld
Copy link
Collaborator

@wangxin PR conflicts with 202411 branch

mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Mar 24, 2026
…2794)

What is the motivation for this PR?
In environments that enforce registry-only container image policies (e.g. Microsoft requiring all images to originate from ACR or MCR), the previous approach of building ceos_image locally or re-tagging a pulled image with a local name would trigger security alerts. This PR allows ceos_image to be sourced directly from a user-configured registry, so containers always reference the registry-prefixed image name and no locally-built alias is created.

How did you do it?
Three files are changed:

ansible/group_vars/vm_host/ceos.yml

Added three commented-out placeholder variables: ceos_registry, ceos_registry_username, ceos_registry_password.
When ceos_registry is uncommented and set, the new registry logic is activated. Credentials are optional and only needed for authenticated registries.
ansible/roles/vm_set/tasks/add_ceos_list.yml

When ceos_registry is defined, check whether <registry>/<ceos_image> is already cached locally (a prior pull). If found, it is used directly — no docker tag is performed, so no local alias is created.
If not cached and ceos_registry is defined, optionally log in (only when credentials are provided) and docker pull the image.
Only falls back to the existing download-ceos_image_orig-and-build path when registry is not configured or the pull fails.
ceos_image_found logic is registry-aware: when ceos_registry is defined, a plain local ceos_image does not count as "found", ensuring the registry path is always taken.
ansible/roles/eos/tasks/ceos.yml and ceos_ensure_reachable.yml

After loading group_vars/vm_host/ceos.yml, a docker_image_info check determines whether the registry image is available locally.
ceos_effective_image is set to <registry>/<ceos_image> when the registry image is present, or plain ceos_image otherwise.
All docker_container tasks use ceos_effective_image instead of ceos_image, so containers always reference the registry image in registry-enabled environments.
How did you verify/test it?
Tested "add-topo" by specifying ceos_registry. The ceos image from the registry is used successfully.
Tested "add-topo" without specifying ceos_registry. The existing logic of using a local ceos image or preparing ceos image from ceos_image_orig works as expected.

Signed-off-by: Xin Wang <[email protected]>
Co-authored-by: Copilot <[email protected]>
Signed-off-by: mssonicbld <[email protected]>
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202511: #23252

wangxinbot pushed a commit to wangxinbot/sonic-mgmt that referenced this pull request Mar 24, 2026
…2794)

What is the motivation for this PR?
In environments that enforce registry-only container image policies (e.g. Microsoft requiring all images to originate from ACR or MCR), the previous approach of building ceos_image locally or re-tagging a pulled image with a local name would trigger security alerts. This PR allows ceos_image to be sourced directly from a user-configured registry, so containers always reference the registry-prefixed image name and no locally-built alias is created.

How did you do it?
Three files are changed:

ansible/group_vars/vm_host/ceos.yml

Added three commented-out placeholder variables: ceos_registry, ceos_registry_username, ceos_registry_password.
When ceos_registry is uncommented and set, the new registry logic is activated. Credentials are optional and only needed for authenticated registries.
ansible/roles/vm_set/tasks/add_ceos_list.yml

When ceos_registry is defined, check whether <registry>/<ceos_image> is already cached locally (a prior pull). If found, it is used directly — no docker tag is performed, so no local alias is created.
If not cached and ceos_registry is defined, optionally log in (only when credentials are provided) and docker pull the image.
Only falls back to the existing download-ceos_image_orig-and-build path when registry is not configured or the pull fails.
ceos_image_found logic is registry-aware: when ceos_registry is defined, a plain local ceos_image does not count as "found", ensuring the registry path is always taken.
ansible/roles/eos/tasks/ceos.yml and ceos_ensure_reachable.yml

After loading group_vars/vm_host/ceos.yml, a docker_image_info check determines whether the registry image is available locally.
ceos_effective_image is set to <registry>/<ceos_image> when the registry image is present, or plain ceos_image otherwise.
All docker_container tasks use ceos_effective_image instead of ceos_image, so containers always reference the registry image in registry-enabled environments.
How did you verify/test it?
Tested "add-topo" by specifying ceos_registry. The ceos image from the registry is used successfully.
Tested "add-topo" without specifying ceos_registry. The existing logic of using a local ceos image or preparing ceos image from ceos_image_orig works as expected.

Signed-off-by: Xin Wang <[email protected]>
Co-authored-by: Copilot <[email protected]>
wangxinbot pushed a commit to wangxinbot/sonic-mgmt that referenced this pull request Mar 24, 2026
…2794)

What is the motivation for this PR?
In environments that enforce registry-only container image policies (e.g. Microsoft requiring all images to originate from ACR or MCR), the previous approach of building ceos_image locally or re-tagging a pulled image with a local name would trigger security alerts. This PR allows ceos_image to be sourced directly from a user-configured registry, so containers always reference the registry-prefixed image name and no locally-built alias is created.

How did you do it?
Three files are changed:

ansible/group_vars/vm_host/ceos.yml

Added three commented-out placeholder variables: ceos_registry, ceos_registry_username, ceos_registry_password.
When ceos_registry is uncommented and set, the new registry logic is activated. Credentials are optional and only needed for authenticated registries.
ansible/roles/vm_set/tasks/add_ceos_list.yml

When ceos_registry is defined, check whether <registry>/<ceos_image> is already cached locally (a prior pull). If found, it is used directly — no docker tag is performed, so no local alias is created.
If not cached and ceos_registry is defined, optionally log in (only when credentials are provided) and docker pull the image.
Only falls back to the existing download-ceos_image_orig-and-build path when registry is not configured or the pull fails.
ceos_image_found logic is registry-aware: when ceos_registry is defined, a plain local ceos_image does not count as "found", ensuring the registry path is always taken.
ansible/roles/eos/tasks/ceos.yml and ceos_ensure_reachable.yml

After loading group_vars/vm_host/ceos.yml, a docker_image_info check determines whether the registry image is available locally.
ceos_effective_image is set to <registry>/<ceos_image> when the registry image is present, or plain ceos_image otherwise.
All docker_container tasks use ceos_effective_image instead of ceos_image, so containers always reference the registry image in registry-enabled environments.
How did you verify/test it?
Tested "add-topo" by specifying ceos_registry. The ceos image from the registry is used successfully.
Tested "add-topo" without specifying ceos_registry. The existing logic of using a local ceos image or preparing ceos image from ceos_image_orig works as expected.

Signed-off-by: Xin Wang <[email protected]>
Co-authored-by: Copilot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment