[upgrade_image] Fix python interpreter mismatch after upgrade #21967
Merged
StormLiangMS merged 1 commit into sonic-net:master on Jan 19, 2026
Signed-off-by: Longxiang <lolv@microsoft.com>
Collaborator
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
wangxin approved these changes Jan 19, 2026
lizhijianrd approved these changes Jan 19, 2026
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Jan 19, 2026
…net#21967)

What is the motivation for this PR?

The 202511 nightly upgrade hits the following image upgrade issue:

2026-01-19 04:47:52,053 ansible_hosts.py#502 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#237: "localhost" -> AnsibleModule::pause | Results =>{"hostname": "localhost", "reachable": true, "failed": false, "changed": false, "rc": 0, "stderr": "", "stdout": "Paused for 60.0 seconds", "start": "2026-01-19 04:46:52.047977", "stop": "2026-01-19 04:47:52.049217", "delta": 60, "echo": true, "user_input": "", "_ansible_no_log": false}
2026-01-19 04:47:52,054 ansible_hosts.py#426 DEBUG - ===== ['bjw3-can-7260-13', 'bjw3-can-7260-14'] -> shell ================================================================
2026-01-19 04:47:52,054 ansible_hosts.py#444 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::shell, {"module_name": "shell", "args": ["sed -i \"s/^ClientAliveInterval [0-9].*/ClientAliveInterval 900/g\" /etc/ssh/sshd_config && systemctl restart sshd"], "kwargs": {}, "module_attrs": {"become": true}}
2026-01-19 04:47:53,211 ansible_hosts.py#502 DEBUG - /var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::shell | Results =>{"bjw3-can-7260-13": {"hostname": "bjw3-can-7260-13", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added '10.150.238.70' (RSA) to the list of known hosts.\r\nDebian GNU/Linux 13 \\n \\l\n\n/bin/sh: 1: /usr/bin/python3.11: not found\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}, "bjw3-can-7260-14": {"hostname": "bjw3-can-7260-14", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added '10.150.238.72' (RSA) to the list of known hosts.\r\nDebian GNU/Linux 13 \\n \\l\n\n/bin/sh: 1: /usr/bin/python3.11: not found\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}}
2026-01-19 04:47:53,212 sonic.py#265 ERROR - Post upgrade actions failed, devices: ['bjw3-can-7260-13', 'bjw3-can-7260-14'], error: RunAnsibleModuleFailed('/var/src/sonic-mgmt_testbed-bjw3-can-dual-t0-7260-1/ansible/devutil/devices/sonic.py::post_upgrade_actions#242: ["bjw3-can-7260-13", "bjw3-can-7260-14"] -> AnsibleModule::"shell" failed, Results => {"bjw3-can-7260-13": {"hostname": "bjw3-can-7260-13", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added \'10.150.238.70\' (RSA) to the list of known hosts.\\r\\nDebian GNU/Linux 13 \\\\n \\\\l\\n\\n/bin/sh: 1: /usr/bin/python3.11: not found\\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}, "bjw3-can-7260-14": {"hostname": "bjw3-can-7260-14", "reachable": true, "failed": true, "module_stdout": "", "module_stderr": "Warning: Permanently added \'10.150.238.72\' (RSA) to the list of known hosts.\\r\\nDebian GNU/Linux 13 \\\\n \\\\l\\n\\n/bin/sh: 1: /usr/bin/python3.11: not found\\n", "msg": "The module failed to execute correctly, you probably need to set the interpreter.\\nSee stdout/stderr for the exact error", "rc": 127, "_ansible_no_log": false, "changed": false}}')

The failure occurs because the PREV image and the upgrade-to image ship different Python interpreter versions. When Ansible first runs on a target device, it caches the Python interpreter path in memory. After the device boots into the upgrade-to image, Ansible still uses the path cached from the PREV image, and that path no longer exists.

This was observed on a nightly run that tries to upgrade to 20251110.03:

admin@bjw3-can-7260-13:~$ show version | head -n 5
SONiC Software Version: SONiC.20251110.02
SONiC OS Version: 12
Distribution: Debian 12.12
Kernel: 6.1.0-29-2-amd64
admin@bjw3-can-7260-13:~$ python --version
Python 3.11.2

admin@bjw3-can-7260-13:~$ show version | head -n 5
SONiC Software Version: SONiC.20251110.03
SONiC OS Version: 13
Distribution: Debian 13.2
Kernel: 6.12.41+deb13-sonic-amd64
admin@bjw3-can-7260-13:~$ python --version
Python 3.13.5

Signed-off-by: Longxiang <lolv@microsoft.com>

How did you do it?

Reset the facts cache in the post-upgrade actions before running any Ansible modules on the new image.

How did you verify/test it?

2026-01-19 06:29:24,312 upgrade_image.py#187 INFO - SONiC host bjw3-can-7260-13 current version 20251110.03
2026-01-19 06:29:24,312 upgrade_image.py#187 INFO - SONiC host bjw3-can-7260-14 current version 20251110.03
2026-01-19 06:29:24,312 upgrade_image.py#202 INFO - Skip enabling FIPS
2026-01-19 06:29:24,313 upgrade_image.py#220 INFO - Use default docker folder size
2026-01-19 06:29:24,313 upgrade_image.py#232 INFO - ===== UPGRADE IMAGE DONE =====
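The failure mode and the fix described in this commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions: `FakeDevice`, `FakeRunner`, `reset_facts_cache` and `demo` are hypothetical names made up for illustration, not Ansible or sonic-mgmt APIs, and `/usr/bin/python3.13` is an assumed path inferred from the `Python 3.13.5` output rather than taken from the logs.

```python
"""Toy model of a per-host interpreter cache going stale across an image upgrade.

All names here are hypothetical; this mimics the behaviour described in the
commit message and is not Ansible's or sonic-mgmt's actual code.
"""


class InterpreterNotFound(Exception):
    """Raised when the cached interpreter path no longer exists on the device."""


class FakeDevice:
    """Stand-in for a DUT: a hostname plus the one Python path its image ships."""

    def __init__(self, hostname, interpreter):
        self.hostname = hostname
        self.interpreter = interpreter


class FakeRunner:
    """Discovers the interpreter once per host and keeps it in an in-memory cache."""

    def __init__(self):
        self.facts_cache = {}

    def run_module(self, device):
        path = self.facts_cache.get(device.hostname)
        if path is None:
            # First contact with this host: discover and cache the interpreter.
            path = device.interpreter
            self.facts_cache[device.hostname] = path
        if path != device.interpreter:
            # The cached path points at an interpreter the new image lacks.
            raise InterpreterNotFound(f"/bin/sh: 1: {path}: not found")
        return path

    def reset_facts_cache(self, hostname):
        # Forget what was learned about the host so discovery runs again.
        self.facts_cache.pop(hostname, None)


def demo():
    dut = FakeDevice("bjw3-can-7260-13", "/usr/bin/python3.11")
    runner = FakeRunner()
    before = runner.run_module(dut)  # caches the PREV image's interpreter

    # The device reboots into the upgrade-to image (assumed interpreter path).
    dut.interpreter = "/usr/bin/python3.13"
    try:
        runner.run_module(dut)
        stale_failed = False
    except InterpreterNotFound:
        stale_failed = True

    # The fix: reset the cache before running anything on the new image.
    runner.reset_facts_cache(dut.hostname)
    after = runner.run_module(dut)
    return before, stale_failed, after


if __name__ == "__main__":
    print(demo())
```

In this model the second `run_module` call fails only because the cache outlives the reboot. Clearing the entry for the host forces rediscovery against the new image, which is the same idea as resetting the facts cache in the post-upgrade step.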
Collaborator
Cherry-pick PR to 202511: #21969
PriyanshTratiya pushed a commit to PriyanshTratiya/sonic-mgmt that referenced this pull request Jan 21, 2026
…net#21967)
Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
vmittal-msft pushed a commit that referenced this pull request Jan 22, 2026
#21969)
Co-authored-by: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
saravanan-nexthop pushed a commit to nexthop-ai/sonic-mgmt that referenced this pull request Jan 22, 2026
…net#21967)
Signed-off-by: Saravanan Sellappa <saravanan@nexthop.ai>
justin-oliver pushed a commit to justin-oliver/sonic-mgmt that referenced this pull request Jan 26, 2026
…net#21967)
ytzur1 pushed a commit to ytzur1/sonic-mgmt that referenced this pull request Feb 2, 2026
…net#21967)
Signed-off-by: Yael Tzur <ytzur@nvidia.com>
abhishek-nexthop
pushed a commit
to nexthop-ai/sonic-mgmt
that referenced
this pull request
Feb 6, 2026
…net#21967)
Anirudh-nokia
pushed a commit
to Anirudh-nokia/sonic-mgmt-fork
that referenced
this pull request
Feb 6, 2026
…net#21967)
Signed-off-by: ayya <anirudh.ayya@nokia.com>
lakshmi-nexthop
pushed a commit
to lakshmi-nexthop/sonic-mgmt
that referenced
this pull request
Feb 11, 2026
…net#21967) (sonic-net#21969)
Co-authored-by: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
Signed-off-by: Lakshmi Yarramaneni <lakshmi@nexthop.ai>
nnelluri-cisco
pushed a commit
to nnelluri-cisco/sonic-mgmt
that referenced
this pull request
Feb 12, 2026
…net#21967)
Signed-off-by: nnelluri-cisco <nnelluri@cisco.com>
rraghav-cisco
pushed a commit
to rraghav-cisco/sonic-mgmt
that referenced
this pull request
Feb 13, 2026
…net#21967)
Signed-off-by: Raghavendran Ramanathan <rraghav@cisco.com>
anilal-amd
pushed a commit
to anilal-amd/anilal-forked-sonic-mgmt
that referenced
this pull request
Feb 19, 2026
…net#21967)
Signed-off-by: Zhuohui Tan <zhuohui.tan@amd.com>
abhishek-nexthop
pushed a commit
to nexthop-ai/sonic-mgmt
that referenced
this pull request
Mar 17, 2026
…net#21967) Signed-off-by: Abhishek <abhishek@nexthop.ai>
venu-nexthop
pushed a commit
to venu-nexthop/sonic-mgmt
that referenced
this pull request
Mar 27, 2026
…net#21967)
Description of PR
Summary:
Fixes # (issue)
Type of change
Back port request
Approach
What is the motivation for this PR?
The 202511 nightly upgrade fails in the post-upgrade actions: the Ansible shell module returns rc 127 on both DUTs with "/bin/sh: 1: /usr/bin/python3.11: not found".
The failure occurs because the PREV image and the upgrade-to image ship different Python interpreter versions. The first time Ansible runs on a target device, it caches the Python interpreter path in memory. After the device boots into the upgrade-to image, Ansible still uses the path cached from the PREV image, and that path no longer exists.
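The caching behaviour described above can be illustrated with a toy model. This is only a sketch, not Ansible's implementation: `FakeDevice`, `CachingRunner`, the host name `dut-1`, and the path `/usr/bin/python3.13` are hypothetical names chosen for illustration.

```python
"""Toy model of a per-host interpreter cache going stale across an image upgrade."""


class FakeDevice:
    """Hypothetical stand-in for a DUT that has exactly one Python interpreter."""

    def __init__(self, interpreter):
        self.interpreter = interpreter

    def run_with(self, path):
        # A missing interpreter yields the shell's "not found" status, 127.
        if path != self.interpreter:
            return 127, f"/bin/sh: 1: {path}: not found"
        return 0, ""


class CachingRunner:
    """Hypothetical runner that discovers the interpreter once per host name."""

    def __init__(self):
        self._interpreter_cache = {}

    def run_module(self, hostname, device):
        path = self._interpreter_cache.get(hostname)
        if path is None:
            # "Discovery": ask the device, then remember the answer.
            path = device.interpreter
            self._interpreter_cache[hostname] = path
        return device.run_with(path)


runner = CachingRunner()
host = "dut-1"  # hypothetical host name

# First run against the PREV image populates the cache.
before = runner.run_module(host, FakeDevice("/usr/bin/python3.11"))

# Same host name, but the device has rebooted into an image with a newer Python.
after = runner.run_module(host, FakeDevice("/usr/bin/python3.13"))

print(before)
print(after)
```

The second call fails because the cache is keyed by host name only, so nothing tells the runner that the image underneath has changed.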
This was observed on a nightly run that upgrades from 20251110.02 (Debian 12.12, Python 3.11.2) to 20251110.03 (Debian 13.2, Python 3.13.5).
Signed-off-by: Longxiang <lolv@microsoft.com>
How did you do it?
Reset the facts cache in the post-upgrade actions before running any Ansible modules on the new image.
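The idea of resetting cached facts for the upgraded hosts might look roughly like the sketch below. The actual change in this PR is not shown on this page, so everything here is an assumption: the plain dict stands in for whatever cache the framework holds, and `reset_interpreter_cache` is a hypothetical helper. The key `discovered_interpreter_python` is the fact name Ansible uses for a discovered interpreter.

```python
"""Sketch: drop cached facts for upgraded hosts so discovery runs again."""


def reset_interpreter_cache(facts_cache, hostnames):
    """Remove cached facts for the given hosts; other hosts are left untouched."""
    for hostname in hostnames:
        # pop() with a default avoids a KeyError for hosts that were never cached.
        facts_cache.pop(hostname, None)


# Hypothetical cache contents after running modules against the PREV image.
facts_cache = {
    "bjw3-can-7260-13": {"discovered_interpreter_python": "/usr/bin/python3.11"},
    "bjw3-can-7260-14": {"discovered_interpreter_python": "/usr/bin/python3.11"},
    "other-host": {"discovered_interpreter_python": "/usr/bin/python3.11"},
}

# Only the devices that were upgraded need their entries dropped.
reset_interpreter_cache(facts_cache, ["bjw3-can-7260-13", "bjw3-can-7260-14"])

print(sorted(facts_cache))
```

The reset has to happen after the reboot and before the first module call on the new image; otherwise the stale path is used once more. In playbook-driven flows, Ansible's `meta: clear_facts` task serves a similar purpose, though whether it applies to this code path is not confirmed here.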
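How did you verify/test it?
With the fix, upgrade_image.py reports both bjw3-can-7260-13 and bjw3-can-7260-14 at version 20251110.03 and finishes with "===== UPGRADE IMAGE DONE =====".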
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation