[upgrade_image] Fix python interpreter mismatch after upgrade #21967

Merged
StormLiangMS merged 1 commit into sonic-net:master from lolyu:fix_upgrade_image_202511
Jan 19, 2026

Conversation

Collaborator

@lolyu lolyu commented Jan 19, 2026

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Approach

What is the motivation for this PR?

202511 nightly upgrade has the following image upgrade issue:

2026-01-19 04:47:52,053 ansible_hosts.py#502 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#237: "localhost" -> AnsibleModule::pause | Results =>{"hostname": "localhost", "reachable": true, "failed": false, "changed": false, "rc": 0, "stderr": "", "stdout": "Paused for 60.0 seconds", "start": "2026-01-19 04:46:52.047977", "stop": "2026-01-19 04:47:52.049217", "delta": 60, "echo": true, "user_input": "", "_ansible_no_log": false}
2026-01-19 04:47:52,054 ansible_hosts.py#426 DEBUG - ===== ['bjw3-can-7260-13', 'bjw3-can-7260-14'] -> shell ================================================================
2026-01-19 04:47:52,054 ansible_hosts.py#444 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::shell, {"module_name": "shell", "args": ["sed -i \"s/^ClientAliveInterval [0-9].*/ClientAliveInterval 900/g\" /etc/ssh/sshd_config && systemctl restart sshd"], "kwargs": {}, "module_attrs": {"become": true}}
2026-01-19 04:47:53,211 ansible_hosts.py#502 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::shell | Results =>{"bjw3-can-7260-13": {"hostname": "bjw3-can-7260-13", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added '10.150.238.70' (RSA) to the list of known hosts.\r\nDebian GNU/Linux 13 \\n \\l\n\n/bin/sh: 1: /usr/bin/python3.11: not found\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}, "bjw3-can-7260-14": {"hostname": "bjw3-can-7260-14", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added '10.150.238.72' (RSA) to the list of known hosts.\r\nDebian GNU/Linux 13 \\n \\l\n\n/bin/sh: 1: /usr/bin/python3.11: not found\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}}
2026-01-19 04:47:53,212 sonic.py#265 ERROR - Post upgrade actions failed, devices: ['bjw3-can-7260-13', 'bjw3-can-7260-14'], error: RunAnsibleModuleFailed('/var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::"shell" failed, Results => {"bjw3-can-7260-13": {"hostname": "bjw3-can-7260-13", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added \'10.150.238.70\' (RSA) to the list of known hosts.\\r\\nDebian GNU/Linux 13 \\\\n \\\\l\\n\\n/bin/sh: 1: /usr/bin/python3.11: not found\\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}, "bjw3-can-7260-14": {"hostname": "bjw3-can-7260-14", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added \'10.150.238.72\' (RSA) to the list of known hosts.\\r\\nDebian GNU/Linux 13 \\\\n \\\\l\\n\\n/bin/sh: 1: /usr/bin/python3.11: not found\\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}}')
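
The failure signature in the log above is a return code of 127 together with a shell "not found" message for a Python path in `module_stderr`. As a rough illustration only, the sketch below checks one host's result dict for that signature. The function name `missing_interpreter` and the regular expression are assumptions made for this example and are not part of sonic-mgmt or Ansible.

```python
import re

# Matches the shell error seen in module_stderr when the interpreter path is missing,
# for example "/bin/sh: 1: /usr/bin/python3.11: not found".
_MISSING_PY = re.compile(r"(/\S*python[\d.]*): not found")


def missing_interpreter(result):
    """Return the missing interpreter path from one host's result dict, or None.

    Hypothetical helper: it only looks at the "rc" and "module_stderr" keys
    that appear in the log above.
    """
    if result.get("rc") != 127:
        return None
    match = _MISSING_PY.search(result.get("module_stderr", ""))
    return match.group(1) if match else None


# Trimmed copy of the per-host result from the failing run.
sample = {
    "failed": True,
    "rc": 127,
    "module_stderr": "Debian GNU/Linux 13 \\n \\l\n\n/bin/sh: 1: /usr/bin/python3.11: not found\n",
}

print(missing_interpreter(sample))
```

A check like this could help distinguish an interpreter mismatch from other post-upgrade failures, though matching on error text is inherently fragile.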

The failure occurs because the PREV image and the upgrade-to image ship different Python interpreter versions. The first time Ansible runs against a target device, it caches the Python interpreter path in memory. After the device boots into the upgrade-to image, Ansible still uses the path cached from the PREV image (/usr/bin/python3.11 in the log above), which no longer exists on the new image, so the module fails to execute.
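
The stale-cache behavior described above can be reproduced with a toy model. This is a minimal sketch and not Ansible's implementation: `InterpreterCache`, `run_module`, and the host name `dut` are hypothetical, and the set of "installed" paths stands in for the filesystem of each image.

```python
class InterpreterCache:
    """Toy in-memory cache of interpreter paths, keyed by host name (hypothetical)."""

    def __init__(self):
        self._paths = {}

    def resolve(self, host, discover):
        # Discovery only runs the first time a host is seen.
        if host not in self._paths:
            self._paths[host] = discover()
        return self._paths[host]


def run_module(host, cache, installed, discover):
    """Pretend to run a module: return 127 if the resolved path is not installed."""
    path = cache.resolve(host, discover)
    return 0 if path in installed else 127


cache = InterpreterCache()
host = "dut"

# PREV image: Python 3.11 is installed, so discovery finds and caches it.
prev_installed = {"/usr/bin/python3.11"}
rc_before = run_module(host, cache, prev_installed, lambda: "/usr/bin/python3.11")

# Upgrade-to image: only Python 3.13 exists, but the cached 3.11 path is reused
# because discovery is skipped for a host that is already in the cache.
new_installed = {"/usr/bin/python3.13"}
rc_after = run_module(host, cache, new_installed, lambda: "/usr/bin/python3.13")

print(rc_before, rc_after)  # prints: 0 127
```

The second call never invokes its discovery callback, which mirrors why the run on the new image fails with rc 127 even though a working interpreter is present.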

This was observed on the nightly run that upgrades to 20251110.03:

admin@bjw3-can-7260-13:~$ show version | head -n 5

SONiC Software Version: SONiC.20251110.02
SONiC OS Version: 12
Distribution: Debian 12.12
Kernel: 6.1.0-29-2-amd64
admin@bjw3-can-7260-13:~$ python --version
Python 3.11.2

admin@bjw3-can-7260-13:~$ show version | head -n 5

SONiC Software Version: SONiC.20251110.03
SONiC OS Version: 13
Distribution: Debian 13.2
Kernel: 6.12.41+deb13-sonic-amd64
admin@bjw3-can-7260-13:~$ python --version
Python 3.13.5

Signed-off-by: Longxiang <lolv@microsoft.com>

How did you do it?

Reset the facts cache in the post-upgrade step before running any Ansible modules on the new image.
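
The idea of the fix can be sketched as follows. This is not the code changed by this PR: the plain dict used as a cache, `discover_interpreter`, `run_with_cache`, and `post_upgrade_actions_sketch` are all hypothetical stand-ins chosen to show the ordering (drop cached entries first, then run modules).

```python
def discover_interpreter(host, installed):
    """Stand-in for discovery: pick the highest-sorted interpreter path on the host."""
    return sorted(installed[host])[-1]


def run_with_cache(host, cache, installed):
    """Pretend to run a module, reusing a cached interpreter path when one exists."""
    if host not in cache:
        cache[host] = discover_interpreter(host, installed)
    return 0 if cache[host] in installed[host] else 127


def post_upgrade_actions_sketch(hosts, cache, installed):
    """Drop cached entries for the upgraded hosts, then run the post-upgrade module."""
    for host in hosts:
        cache.pop(host, None)  # forces rediscovery on the new image
    return {host: run_with_cache(host, cache, installed) for host in hosts}


hosts = ["dut-1", "dut-2"]
# State left over from the PREV image.
cache = {h: "/usr/bin/python3.11" for h in hosts}
# What the upgrade-to image actually provides.
installed = {h: {"/usr/bin/python3.13"} for h in hosts}

# Without a reset (run against a copy of the stale cache) every host fails.
stale = {h: run_with_cache(h, dict(cache), installed) for h in hosts}
# With the reset, discovery runs again and finds the new interpreter.
fresh = post_upgrade_actions_sketch(hosts, cache, installed)

print(stale)
print(fresh)
```

Resetting only the upgraded hosts keeps cached state for any other hosts in the run, which is a design choice of this sketch rather than something stated in the PR.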

How did you verify/test it?

2026-01-19 06:29:24,312 upgrade_image.py#187 INFO - SONiC host bjw3-can-7260-13 current version 20251110.03
2026-01-19 06:29:24,312 upgrade_image.py#187 INFO - SONiC host bjw3-can-7260-14 current version 20251110.03
2026-01-19 06:29:24,312 upgrade_image.py#202 INFO - Skip enabling FIPS
2026-01-19 06:29:24,313 upgrade_image.py#220 INFO - Use default docker folder size
2026-01-19 06:29:24,313 upgrade_image.py#232 INFO - ===== UPGRADE IMAGE DONE =====

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

Signed-off-by: Longxiang <lolv@microsoft.com>
@lolyu lolyu requested review from wangxin and yxieca as code owners January 19, 2026 06:41
@mssonicbld
Collaborator

/azp run

@lolyu lolyu added the "Request for 202511 branch" label (request to backport a change to the 202511 branch) Jan 19, 2026
@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@StormLiangMS StormLiangMS merged commit 27a47c8 into sonic-net:master Jan 19, 2026
22 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Jan 19, 2026
…net#21967)

@mssonicbld
Collaborator

Cherry-pick PR to 202511: #21969

PriyanshTratiya pushed a commit to PriyanshTratiya/sonic-mgmt that referenced this pull request Jan 21, 2026
…net#21967)


Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
vmittal-msft pushed a commit that referenced this pull request Jan 22, 2026
#21969)


Co-authored-by: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
saravanan-nexthop pushed a commit to nexthop-ai/sonic-mgmt that referenced this pull request Jan 22, 2026
…net#21967)


Signed-off-by: Saravanan Sellappa <saravanan@nexthop.ai>
justin-oliver pushed a commit to justin-oliver/sonic-mgmt that referenced this pull request Jan 26, 2026
…net#21967)

What is the motivation for this PR?
202511 nightly upgrade has the following image upgrade issue:

2026-01-19 04:47:52,053 ansible_hosts.py#502 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#237: "localhost" -> AnsibleModule::pause | Results =>{"hostname": "localhost", "reachable": true, "failed": false, "changed": false, "rc": 0, "stderr": "", "stdout": "Paused for 60.0 seconds", "start": "2026-01-19 04:46:52.047977", "stop": "2026-01-19 04:47:52.049217", "delta": 60, "echo": true, "user_input": "", "_ansible_no_log": false}
2026-01-19 04:47:52,054 ansible_hosts.py#426 DEBUG - ===== ['bjw3-can-7260-13', 'bjw3-can-7260-14'] -> shell ================================================================
2026-01-19 04:47:52,054 ansible_hosts.py#444 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::shell, {"module_name": "shell", "args": ["sed -i \"s/^ClientAliveInterval [0-9].*/ClientAliveInterval 900/g\" /etc/ssh/sshd_config && systemctl restart sshd"], "kwargs": {}, "module_attrs": {"become": true}}
2026-01-19 04:47:53,211 ansible_hosts.py#502 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::shell | Results =>{"bjw3-can-7260-13": {"hostname": "bjw3-can-7260-13", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added '10.150.238.70' (RSA) to the list of known hosts.\r\nDebian GNU/Linux 13 \\n \\l\n\n/bin/sh: 1: /usr/bin/python3.11: not found\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}, "bjw3-can-7260-14": {"hostname": "bjw3-can-7260-14", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added '10.150.238.72' (RSA) to the list of known hosts.\r\nDebian GNU/Linux 13 \\n \\l\n\n/bin/sh: 1: /usr/bin/python3.11: not found\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}}
2026-01-19 04:47:53,212 sonic.py#265 ERROR - Post upgrade actions failed, devices: ['bjw3-can-7260-13', 'bjw3-can-7260-14'], error: RunAnsibleModuleFailed('/var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::"shell" failed, Results => {"bjw3-can-7260-13": {"hostname": "bjw3-can-7260-13", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added \'10.150.238.70\' (RSA) to the list of known hosts.\\r\\nDebian GNU/Linux 13 \\\\n \\\\l\\n\\n/bin/sh: 1: /usr/bin/python3.11: not found\\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}, "bjw3-can-7260-14": {"hostname": "bjw3-can-7260-14", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added \'10.150.238.72\' (RSA) to the list of known hosts.\\r\\nDebian GNU/Linux 13 \\\\n \\\\l\\n\\n/bin/sh: 1: /usr/bin/python3.11: not found\\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}}')
The failure occurs because the PREV image and the upgrade-to image ship different Python interpreter versions. The first time Ansible runs on a target device, it caches the Python interpreter path in memory. After the device boots into the upgrade-to image, Ansible still uses the path cached from the PREV image (`/usr/bin/python3.11` in the log above), which no longer exists there, so the module fails with rc 127.

This was observed on a nightly run that upgrades from 20251110.02 to 20251110.03:

admin@bjw3-can-7260-13:~$ show version | head -n 5

SONiC Software Version: SONiC.20251110.02
SONiC OS Version: 12
Distribution: Debian 12.12
Kernel: 6.1.0-29-2-amd64
admin@bjw3-can-7260-13:~$ python --version
Python 3.11.2

admin@bjw3-can-7260-13:~$ show version | head -n 5

SONiC Software Version: SONiC.20251110.03
SONiC OS Version: 13
Distribution: Debian 13.2
Kernel: 6.12.41+deb13-sonic-amd64
admin@bjw3-can-7260-13:~$ python --version
Python 3.13.5
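
The two `python --version` outputs above differ in their minor version, which is what changes a versioned interpreter path. A minimal sketch of that comparison, assuming the conventional `/usr/bin/pythonX.Y` layout (only `/usr/bin/python3.11` is confirmed by the log; `versioned_interpreter` is a hypothetical helper, not part of sonic-mgmt):

```python
import re


def versioned_interpreter(version_output: str) -> str:
    """Map `python --version` output to an assumed /usr/bin/pythonX.Y path."""
    match = re.fullmatch(r"Python (\d+)\.(\d+)\.\d+", version_output.strip())
    if match is None:
        raise ValueError(f"unexpected version output: {version_output!r}")
    major, minor = match.groups()
    return f"/usr/bin/python{major}.{minor}"


# Version strings copied from the two sessions above.
prev_path = versioned_interpreter("Python 3.11.2")
new_path = versioned_interpreter("Python 3.13.5")

# A path cached on the PREV image goes stale when the minor version changes.
path_changed = prev_path != new_path
```
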
Signed-off-by: Longxiang lolv@microsoft.com

How did you do it?
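Reset the facts cache in the post-upgrade actions before running any Ansible modules on the new image, so that the Python interpreter path is discovered again instead of being reused from the PREV image.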
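
As a rough illustration of why the reset helps, the sketch below models a per-host interpreter cache in plain Python. It is not the sonic-mgmt or Ansible code: `InterpreterCache`, `discover`, and `run_module` are hypothetical names, and `/usr/bin/python3.13` is an assumed path for the new image (only `/usr/bin/python3.11` appears in the log).

```python
from typing import Callable, Dict, Set


class InterpreterCache:
    """Toy model of a controller-side cache of per-host interpreter paths."""

    def __init__(self, discover: Callable[[str], str]) -> None:
        self._discover = discover          # hypothetical discovery callback
        self._paths: Dict[str, str] = {}   # host -> cached interpreter path

    def get(self, host: str) -> str:
        # Discover once, then reuse the cached path on later calls.
        if host not in self._paths:
            self._paths[host] = self._discover(host)
        return self._paths[host]

    def reset(self, host: str) -> None:
        # Drop the cached entry so the next get() rediscovers the path.
        self._paths.pop(host, None)


def run_module(cache: InterpreterCache, host: str, installed: Set[str]) -> int:
    """Return 0 if the cached interpreter exists on the host, else 127."""
    return 0 if cache.get(host) in installed else 127


# Interpreter paths present on the device; replaced when the new image boots.
installed = {"/usr/bin/python3.11"}
cache = InterpreterCache(lambda host: sorted(installed)[-1])
host = "bjw3-can-7260-13"

rc_before_upgrade = run_module(cache, host, installed)

installed.clear()
installed.add("/usr/bin/python3.13")   # assumed path on the new image

rc_stale = run_module(cache, host, installed)   # cached 3.11 path is gone
cache.reset(host)                               # the post-upgrade reset
rc_after_reset = run_module(cache, host, installed)
```

In this model the stale lookup returns 127, matching the `rc` in the failing log, and the call after `reset` succeeds because the path is discovered again on the new image.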

How did you verify/test it?
2026-01-19 06:29:24,312 upgrade_image.py#187 INFO - SONiC host bjw3-can-7260-13 current version 20251110.03
2026-01-19 06:29:24,312 upgrade_image.py#187 INFO - SONiC host bjw3-can-7260-14 current version 20251110.03
2026-01-19 06:29:24,312 upgrade_image.py#202 INFO - Skip enabling FIPS
2026-01-19 06:29:24,313 upgrade_image.py#220 INFO - Use default docker folder size
2026-01-19 06:29:24,313 upgrade_image.py#232 INFO - ===== UPGRADE IMAGE DONE =====
ytzur1 pushed a commit to ytzur1/sonic-mgmt that referenced this pull request Feb 2, 2026
…net#21967)

Signed-off-by: Yael Tzur <ytzur@nvidia.com>
abhishek-nexthop pushed a commit to nexthop-ai/sonic-mgmt that referenced this pull request Feb 6, 2026
…net#21967)

Anirudh-nokia pushed a commit to Anirudh-nokia/sonic-mgmt-fork that referenced this pull request Feb 6, 2026
…net#21967)

Signed-off-by: ayya <anirudh.ayya@nokia.com>
lakshmi-nexthop pushed a commit to lakshmi-nexthop/sonic-mgmt that referenced this pull request Feb 11, 2026
…net#21967) (sonic-net#21969)

Co-authored-by: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
Signed-off-by: Lakshmi Yarramaneni <lakshmi@nexthop.ai>
nnelluri-cisco pushed a commit to nnelluri-cisco/sonic-mgmt that referenced this pull request Feb 12, 2026
…net#21967)

Signed-off-by: nnelluri-cisco <nnelluri@cisco.com>
rraghav-cisco pushed a commit to rraghav-cisco/sonic-mgmt that referenced this pull request Feb 13, 2026
…net#21967)

What is the motivation for this PR?
202511 nightly upgrade has the following image upgrade issue:

2026-01-19 04:47:52,053 ansible_hosts.py#502 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#237: "localhost" -> AnsibleModule::pause | Results =>{"hostname": "localhost", "reachable": true, "failed": false, "changed": false, "rc": 0, "stderr": "", "stdout": "Paused for 60.0 seconds", "start": "2026-01-19 04:46:52.047977", "stop": "2026-01-19 04:47:52.049217", "delta": 60, "echo": true, "user_input": "", "_ansible_no_log": false}
2026-01-19 04:47:52,054 ansible_hosts.py#426 DEBUG - ===== ['bjw3-can-7260-13', 'bjw3-can-7260-14'] -> shell ================================================================
2026-01-19 04:47:52,054 ansible_hosts.py#444 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::shell, {"module_name": "shell", "args": ["sed -i \"s/^ClientAliveInterval [0-9].*/ClientAliveInterval 900/g\" /etc/ssh/sshd_config && systemctl restart sshd"], "kwargs": {}, "module_attrs": {"become": true}}
2026-01-19 04:47:53,211 ansible_hosts.py#502 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::shell | Results =>{"bjw3-can-7260-13": {"hostname": "bjw3-can-7260-13", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added '10.150.238.70' (RSA) to the list of known hosts.\r\nDebian GNU/Linux 13 \\n \\l\n\n/bin/sh: 1: /usr/bin/python3.11: not found\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}, "bjw3-can-7260-14": {"hostname": "bjw3-can-7260-14", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added '10.150.238.72' (RSA) to the list of known hosts.\r\nDebian GNU/Linux 13 \\n \\l\n\n/bin/sh: 1: /usr/bin/python3.11: not found\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}}
2026-01-19 04:47:53,212 sonic.py#265 ERROR - Post upgrade actions failed, devices: ['bjw3-can-7260-13', 'bjw3-can-7260-14'], error: RunAnsibleModuleFailed('/var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::"shell" failed, Results => {"bjw3-can-7260-13": {"hostname": "bjw3-can-7260-13", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added \'10.150.238.70\' (RSA) to the list of known hosts.\\r\\nDebian GNU/Linux 13 \\\\n \\\\l\\n\\n/bin/sh: 1: /usr/bin/python3.11: not found\\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}, "bjw3-can-7260-14": {"hostname": "bjw3-can-7260-14", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added \'10.150.238.72\' (RSA) to the list of known hosts.\\r\\nDebian GNU/Linux 13 \\\\n \\\\l\\n\\n/bin/sh: 1: /usr/bin/python3.11: not found\\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}}')
The failure occurs because the PREV image and the upgrade-to image ship different Python interpreter versions. The first time Ansible runs against a target device, it caches the discovered Python interpreter path in memory. After the device boots into the upgrade-to image, Ansible reuses the path cached from the PREV image (/usr/bin/python3.11), which no longer exists on the new image, so the module fails with "not found".

This was observed on a nightly run that upgrades from 20251110.02 to 20251110.03:

admin@bjw3-can-7260-13:~$ show version | head -n 5

SONiC Software Version: SONiC.20251110.02
SONiC OS Version: 12
Distribution: Debian 12.12
Kernel: 6.1.0-29-2-amd64
admin@bjw3-can-7260-13:~$ python --version
Python 3.11.2

admin@bjw3-can-7260-13:~$ show version | head -n 5

SONiC Software Version: SONiC.20251110.03
SONiC OS Version: 13
Distribution: Debian 13.2
Kernel: 6.12.41+deb13-sonic-amd64
admin@bjw3-can-7260-13:~$ python --version
Python 3.13.5
Signed-off-by: Longxiang lolv@microsoft.com

How did you do it?
Reset the Ansible facts cache in the post-upgrade actions, before running any Ansible modules on the new image.
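
To illustrate why the reset helps, here is a minimal, self-contained sketch of a stale in-memory cache and its invalidation. It is a toy model, not Ansible's internals and not the code changed by this PR: the InterpreterCache class, the "dut" hostname, and the /usr/bin/python3.13 path are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict


class InterpreterCache:
    """Toy in-memory cache mapping hostname -> discovered interpreter path.

    Hypothetical stand-in for whatever remembers the interpreter per host.
    """

    def __init__(self, discover: Callable[[str], str]) -> None:
        self._discover = discover
        self._paths: Dict[str, str] = {}

    def get(self, host: str) -> str:
        # Discover only on first use; later calls reuse the cached path.
        if host not in self._paths:
            self._paths[host] = self._discover(host)
        return self._paths[host]

    def reset(self, host: str) -> None:
        # Forget the cached path so the next get() re-discovers it.
        self._paths.pop(host, None)


# Hypothetical device state: the interpreter currently installed on each host.
installed = {"dut": "/usr/bin/python3.11"}
cache = InterpreterCache(lambda host: installed[host])

before = cache.get("dut")                 # cached while the PREV image is running
installed["dut"] = "/usr/bin/python3.13"  # device reboots into the upgrade-to image
stale = cache.get("dut")                  # still the PREV path: the failure mode
cache.reset("dut")                        # the reset performed in post-upgrade
fresh = cache.get("dut")                  # re-discovered against the new image
print(before, stale, fresh)
```

The point of the sketch is the ordering: the reset has to happen after the reboot and before the first module call on the new image, otherwise the stale path is used once more.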

How did you verify/test it?
2026-01-19 06:29:24,312 upgrade_image.py#187 INFO - SONiC host bjw3-can-7260-13 current version 20251110.03
2026-01-19 06:29:24,312 upgrade_image.py#187 INFO - SONiC host bjw3-can-7260-14 current version 20251110.03
2026-01-19 06:29:24,312 upgrade_image.py#202 INFO - Skip enabling FIPS
2026-01-19 06:29:24,313 upgrade_image.py#220 INFO - Use default docker folder size
2026-01-19 06:29:24,313 upgrade_image.py#232 INFO - ===== UPGRADE IMAGE DONE =====

Signed-off-by: Raghavendran Ramanathan <rraghav@cisco.com>
anilal-amd pushed a commit to anilal-amd/anilal-forked-sonic-mgmt that referenced this pull request Feb 19, 2026
…net#21967)

Signed-off-by: Zhuohui Tan <zhuohui.tan@amd.com>
abhishek-nexthop pushed a commit to nexthop-ai/sonic-mgmt that referenced this pull request Mar 17, 2026
…net#21967)

Signed-off-by: Abhishek <abhishek@nexthop.ai>
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Mar 27, 2026
…net#21967)
