feat: new SafeThreadPoolExecutor for Ubuntu 24.04 upgrade #19263

Merged
yejianquan merged 1 commit into sonic-net:master from cyw233:change-to-new-multi-thread-utils
Jul 14, 2025

@cyw233 (Contributor) commented Jun 30, 2025

Description of PR

We will soon upgrade the sonic-mgmt docker image to Ubuntu 24.04, which ships with Python 3.12 and Ansible 2.18.6. Because Python 3.12 enforces more rigorous checks around fork() in multi-threaded programs, the ThreadPoolExecutor from concurrent.futures.thread will start triggering the exception ansible.errors.AnsibleError: A worker was found in a dead state.

To mitigate this issue, we re-implemented the SafeThreadPoolExecutor class on top of the traditional ThreadPool from multiprocessing.pool for multithreaded operations.
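
As a rough illustration of that idea (not the code merged in this PR), an executor-style wrapper can be layered over `multiprocessing.pool.ThreadPool`. The class name `SketchThreadPoolExecutor`, its `submit`/`shutdown` methods, and the `square` helper are hypothetical and only mirror the general shape of the `concurrent.futures` interface.

```python
from multiprocessing.pool import ThreadPool


class SketchThreadPoolExecutor:
    """Hypothetical executor-style wrapper around multiprocessing.pool.ThreadPool."""

    def __init__(self, max_workers):
        self._pool = ThreadPool(processes=max_workers)
        self._results = []

    def submit(self, fn, *args, **kwargs):
        # apply_async returns an ApplyResult, not a concurrent.futures.Future.
        result = self._pool.apply_async(fn, args, kwargs)
        self._results.append(result)
        return result

    def shutdown(self):
        # Stop accepting work, wait for the workers, then surface the first task error.
        self._pool.close()
        self._pool.join()
        for result in self._results:
            result.get()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.shutdown()
        return False


def square(x):
    return x * x


with SketchThreadPoolExecutor(max_workers=4) as executor:
    pending = [executor.submit(square, n) for n in range(5)]

squares = [p.get() for p in pending]
```

Leaving the `with` block waits for every submitted task, so callers that only need side effects do not have to collect the results themselves.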

Summary:
Fixes # (issue) Microsoft ADO 33039693

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505


@mssonicbld (Collaborator)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@cyw233 changed the title from "Switch to new SafeThreadPoolExecutor to prepare for Ubuntu 24.04 upgrade" to "feat: use new SafeThreadPoolExecutor for Ubuntu 24.04 upgrade" Jun 30, 2025
@cyw233 marked this pull request as ready for review July 11, 2025 01:31
@cyw233 force-pushed the change-to-new-multi-thread-utils branch from 876fd04 to 32020b1 July 11, 2025 11:08
@cyw233 changed the title from "feat: use new SafeThreadPoolExecutor for Ubuntu 24.04 upgrade" to "feat: new SafeThreadPoolExecutor for Ubuntu 24.04 upgrade" Jul 11, 2025
@mssonicbld (Collaborator)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

MAX_WORKER: $(INSTANCE_NUMBER)
KVM_IMAGE_BRANCH: $(BUILD_BRANCH)
MGMT_BRANCH: $(BUILD_BRANCH)
COMMON_EXTRA_PARAMS: "--disable_sai_validation "

@cyw233 (Contributor, Author) Jul 12, 2025

Temporarily disable SAI validation, as it will not be compatible with Ubuntu 24.04 due to its use of concurrent.futures. We will refactor the SAI validation and re-enable it later. Microsoft ADO item tracking the progress: 33758029

@yejianquan yejianquan merged commit 86c73fa into sonic-net:master Jul 14, 2025
18 checks passed
StormLiangMS pushed a commit that referenced this pull request Jul 15, 2025
…untu 24.04 upgrade (#19599)

What is the motivation for this PR?
#19263 added the new SafeThreadPoolExecutor for the Ubuntu 24.04 upgrade and a common param in the PR test, but the baseline test needs the param as well.

How did you do it?
Add the common param to the baseline test to support SafeThreadPoolExecutor for the Ubuntu 24.04 upgrade.

@liuh-80 (Contributor) commented Jul 15, 2025

@cyw233, all sonic-buildimage master branch PR validations failed with the following message. I guess it is related to this PR:

File "/var/src/sonic-mgmt/tests/common/helpers/multi_thread_utils.py", line 47, in _wrapper
raise RuntimeError("Thread worker aborted: " + repr(be))
RuntimeError: Thread worker aborted: AttributeError("'ApplyResult' object has no attribute '_condition'")

@cyw233 (Contributor, Author) commented Jul 15, 2025

Hey @liuh-80, yes, we need to temporarily disable SAI validation because it is not compatible with this change due to its use of concurrent.futures. We will refactor the SAI validation and re-enable it later. Microsoft ADO item tracking the progress: 33758029

Therefore, I added the --disable_sai_validation option to the Azure pipeline definition: sonic-net/sonic-buildimage#23346
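
The `AttributeError` in the traceback reported above is consistent with a `multiprocessing.pool` result being handed to `concurrent.futures` helpers that expect a real `Future`. The snippet below is only an illustration of that mismatch in CPython, not the SAI validation code, and its variable names are made up for the example.

```python
from concurrent.futures import Future
from multiprocessing.pool import ThreadPool

# Run one trivial task through a ThreadPool and keep its result handle.
with ThreadPool(processes=1) as pool:
    async_result = pool.apply_async(sum, ([1, 2, 3],))
    value = async_result.get(timeout=10)

future = Future()

# In CPython, concurrent.futures helpers such as wait() and as_completed()
# acquire the private Future._condition attribute; ApplyResult does not define it.
future_has_condition = hasattr(future, "_condition")
result_has_condition = hasattr(async_result, "_condition")
result_type_name = type(async_result).__name__
```

Code that calls `concurrent.futures.wait()` or `as_completed()` on such objects would therefore need to switch to `ApplyResult.get()` or `ApplyResult.wait()` instead.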

yejianquan pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Jul 16, 2025
Why I did it
Temporarily disable SAI validation, as it is not compatible with the change in sonic-net/sonic-mgmt#19263 due to its use of concurrent.futures. We will refactor the SAI validation and re-enable it later. Microsoft ADO item tracking the progress: 33758029.

Work item tracking
Microsoft ADO (number only): 33039693

signed-off-by: jianquanye@microsoft.com
nissampa pushed a commit to nissampa/sonic-mgmt_dpu_test that referenced this pull request Aug 7, 2025
nissampa pushed a commit to nissampa/sonic-mgmt_dpu_test that referenced this pull request Aug 7, 2025
ashutosh-agrawal pushed a commit to ashutosh-agrawal/sonic-mgmt that referenced this pull request Aug 14, 2025
ashutosh-agrawal pushed a commit to ashutosh-agrawal/sonic-mgmt that referenced this pull request Aug 14, 2025
vidyac86 pushed a commit to vidyac86/sonic-mgmt that referenced this pull request Oct 23, 2025
vidyac86 pushed a commit to vidyac86/sonic-mgmt that referenced this pull request Oct 23, 2025
mssonicbld pushed a commit that referenced this pull request Dec 16, 2025
According to #19263, Python 3.12 enforces more rigorous checks around fork() in multi-threaded programs.
After the docker-sonic-mgmt image was upgraded to Ubuntu 24.04, Python and Ansible were upgraded as well. With Python 3.12 and Ansible 2.18 in the new docker-sonic-mgmt image, the nbrhosts fixture, which depends on concurrent.futures, may fail with an error like the one below:
```
self = <ansible.plugins.strategy.linear.StrategyModule object at 0x7596c07986e0>
iterator = <ansible.executor.play_iterator.PlayIterator object at 0x7596c09b2a80>

    def _wait_on_pending_results(self, iterator):
        '''
        Wait for the shared counter to drop to zero, using a short sleep
        between checks to ensure we don't spin lock
        '''

        ret_results = []

        display.debug("waiting for pending results...")
        while self._pending_results > 0 and not self._tqm._terminated:

            if self._tqm.has_dead_workers():
>               raise AnsibleError("A worker was found in a dead state")
E               ansible.errors.AnsibleError: A worker was found in a dead state
```

PR #21407 introduced a threading lock as a temporary workaround for the issue.
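
What PR #21407 changed in detail is not shown here. The following is only a generic sketch of the pattern that sentence describes, serializing a sensitive step behind a `threading.Lock` so that worker threads cannot run it at the same time. All names (`_fork_sensitive_lock`, `run_fork_sensitive_step`, the `vm*` labels) are hypothetical.

```python
import threading
from multiprocessing.pool import ThreadPool

# Hypothetical module-level lock; a real change may scope it differently.
_fork_sensitive_lock = threading.Lock()
call_log = []


def run_fork_sensitive_step(name):
    """Stand-in for a call that must not overlap with the same call in another thread."""
    with _fork_sensitive_lock:
        call_log.append(("enter", name))
        call_log.append(("exit", name))
    return name


with ThreadPool(processes=4) as pool:
    finished = pool.map(run_fork_sensitive_step, ["vm0", "vm1", "vm2", "vm3"])
```

The cost of this approach is that the guarded step no longer runs in parallel, which is one reason to treat it as a stopgap.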

A better fix is to use the SafeThreadPoolExecutor updated in #19263 to initialize the `nbrhosts` objects.

This change reverts the threading lock from PR #21407 and updates the `nbrhosts` fixture to use the new SafeThreadPoolExecutor.

Signed-off-by: Xin Wang <xiwang5@microsoft.com>
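
To make the `nbrhosts` change described in this commit message more concrete, here is a hedged sketch of building one host object per neighbor in parallel on a `multiprocessing.pool.ThreadPool`. It is not the sonic-mgmt fixture: `FakeNeighborHost`, `build_neighbor`, `init_neighbors`, and the neighbor names are placeholders invented for the example.

```python
from multiprocessing.pool import ThreadPool


class FakeNeighborHost:
    """Hypothetical stand-in for a neighbor host object; not a sonic-mgmt class."""

    def __init__(self, name):
        self.name = name


def build_neighbor(name):
    # A real fixture would open a connection to the neighbor VM here.
    return name, FakeNeighborHost(name)


def init_neighbors(names, max_workers=8):
    """Construct one host object per name using a thread pool."""
    workers = min(max_workers, max(1, len(names)))
    with ThreadPool(processes=workers) as pool:
        # map() preserves input order and blocks until every task has finished.
        return dict(pool.map(build_neighbor, names))


nbrhosts_sketch = init_neighbors(["ARISTA01T1", "ARISTA02T1", "ARISTA03T1"])
```

Because `pool.map` re-raises the first exception from a worker, a failure to construct any neighbor surfaces in the caller instead of being silently dropped.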
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 16, 2025
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 16, 2025
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 16, 2025
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
AharonMalkin pushed a commit to AharonMalkin/sonic-mgmt that referenced this pull request Dec 16, 2025
Signed-off-by: Aharon Malkin <amalkin@nvidia.com>
AharonMalkin pushed a commit to AharonMalkin/sonic-mgmt that referenced this pull request Dec 16, 2025
Signed-off-by: Aharon Malkin <amalkin@nvidia.com>
AharonMalkin pushed a commit to AharonMalkin/sonic-mgmt that referenced this pull request Dec 16, 2025
Signed-off-by: Aharon Malkin <amalkin@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 21, 2025
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 21, 2025
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 21, 2025
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Jan 13, 2026
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Jan 13, 2026
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Jan 13, 2026
yifan-nexthop pushed a commit to nexthop-ai/sonic-mgmt that referenced this pull request Jan 14, 2026
Signed-off-by: YiFan Wang <yifan@nexthop.ai>
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Jan 20, 2026
mssonicbld pushed a commit that referenced this pull request Jan 20, 2026
PriyanshTratiya pushed a commit to PriyanshTratiya/sonic-mgmt that referenced this pull request Jan 21, 2026
Signed-off-by: Priyansh Tratiya <ptratiya@microsoft.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Jan 26, 2026
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Jan 26, 2026
Signed-off-by: Guy Shemesh <gshemesh@nvidia.com>
lakshmi-nexthop pushed a commit to lakshmi-nexthop/sonic-mgmt that referenced this pull request Jan 28, 2026
Signed-off-by: Lakshmi Yarramaneni <lakshmi@nexthop.ai>
ytzur1 pushed a commit to ytzur1/sonic-mgmt that referenced this pull request Jan 29, 2026
According to sonic-net#19263, python 3.12 enforces more rigorous check around fork() in multiple-threaded programs.
After the docker-sonic-mgmt image is upgraded to Ubuntu 24.04. python and ansible are upgraded too. With python 3.12 and ansible 2.18 in new docker-sonic-mgmt, the nbrhosts fixture depends on concurrent.futures may fail with error like below:
```
self = <ansible.plugins.strategy.linear.StrategyModule object at 0x7596c07986e0>
iterator = <ansible.executor.play_iterator.PlayIterator object at 0x7596c09b2a80>

    def _wait_on_pending_results(self, iterator):
        '''
        Wait for the shared counter to drop to zero, using a short sleep
        between checks to ensure we don't spin lock
        '''

        ret_results = []

        display.debug("waiting for pending results...")
        while self._pending_results > 0 and not self._tqm._terminated:

            if self._tqm.has_dead_workers():
>               raise AnsibleError("A worker was found in a dead state")
E               ansible.errors.AnsibleError: A worker was found in a dead state
```

PR sonic-net#21407 introduced a threading lock to temporarily work around the issue.

A better fix is to use the SafeThreadPoolExecutor updated in sonic-net#19263 to initialize the `nbrhosts` objects.

This change reverts the threading lock of PR sonic-net#21407 and updates the `nbrhosts` fixture to use the new SafeThreadPoolExecutor.
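
For illustration only, the pattern of initializing neighbor objects in parallel on a thread pool might look like the sketch below. It uses `multiprocessing.pool.ThreadPool` directly rather than the real SafeThreadPoolExecutor, and `make_neighbor`, the VM names, and the addresses are hypothetical placeholders, not taken from the sonic-mgmt fixture.

```python
from multiprocessing.pool import ThreadPool


def make_neighbor(name, vm_info):
    # Hypothetical placeholder for building one neighbor host object.
    return {"name": name, "mgmt_ip": vm_info["mgmt_ip"]}


# Hypothetical inventory; the real fixture derives this from testbed data.
vms = {
    "VM0100": {"mgmt_ip": "192.0.2.10"},
    "VM0101": {"mgmt_ip": "192.0.2.11"},
    "VM0102": {"mgmt_ip": "192.0.2.12"},
}

with ThreadPool(processes=4) as pool:
    # Schedule one initialization task per neighbor.
    pending = {
        name: pool.apply_async(make_neighbor, (name, info))
        for name, info in vms.items()
    }
    # get() blocks until the task finishes and re-raises any task exception.
    nbrhosts = {name: result.get() for name, result in pending.items()}
```

Results are collected inside the `with` block because leaving it terminates the pool.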

Signed-off-by: Xin Wang <xiwang5@microsoft.com>
ytzur1 pushed a commit to ytzur1/sonic-mgmt that referenced this pull request Feb 2, 2026
…19263)

Description of PR
We will soon upgrade sonic-mgmt docker to Ubuntu 24.04 which comes with Python 3.12 + Ansible 2.18.6. Since Python 3.12 enforces more rigorous checks around fork() in multi‐threaded programs, we will start getting the ansible.errors.AnsibleError: A worker was found in a dead state exception due to ThreadPoolExecutor from concurrent.futures.thread.

To mitigate this issue, we re-implemented the SafeThreadPoolExecutor class with the traditional ThreadPool from multiprocessing.pool for multithreading operations.
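
As a rough sketch of the idea (not the actual sonic-mgmt implementation), an executor-style wrapper over `multiprocessing.pool.ThreadPool` could look like the following. The class name `ThreadPoolBackedExecutor`, its `submit` signature, and the `square` helper are assumptions made for illustration.

```python
from multiprocessing.pool import ThreadPool


class ThreadPoolBackedExecutor:
    """Hypothetical stand-in; not the sonic-mgmt SafeThreadPoolExecutor."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPool(processes=max_workers)
        self._results = []

    def submit(self, fn, *args, **kwargs):
        # apply_async returns an AsyncResult; its get() re-raises task errors.
        result = self._pool.apply_async(fn, args, kwargs)
        self._results.append(result)
        return result

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Stop accepting work, then wait for the worker threads to finish.
        self._pool.close()
        self._pool.join()
        if exc_type is None:
            # Surface any exception raised inside a submitted task.
            for result in self._results:
                result.get()
        return False


def square(x):
    return x * x


with ThreadPoolBackedExecutor(max_workers=2) as executor:
    pending = [executor.submit(square, n) for n in range(5)]

squares = [p.get() for p in pending]
```

The context manager waits for all submitted tasks on exit, so callers can keep a `with ... as executor:` structure similar to the one used with `concurrent.futures`.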

Summary:
Fixes # (issue) Microsoft ADO 33039693

signed-off-by: jianquanye@microsoft.com
Signed-off-by: Yael Tzur <ytzur@nvidia.com>
ytzur1 pushed a commit to ytzur1/sonic-mgmt that referenced this pull request Feb 2, 2026
…untu 24.04 upgrade (sonic-net#19599)

What is the motivation for this PR?
sonic-net#19263 added the new SafeThreadPoolExecutor for the Ubuntu 24.04 upgrade, along with a common param in the PR test, but the same param is also needed in the baseline test.

How did you do it?
Add the common param to the baseline test to support SafeThreadPoolExecutor for the Ubuntu 24.04 upgrade.

Signed-off-by: Yael Tzur <ytzur@nvidia.com>
ytzur1 pushed a commit to ytzur1/sonic-mgmt that referenced this pull request Feb 2, 2026
Signed-off-by: Xin Wang <xiwang5@microsoft.com>
Signed-off-by: Yael Tzur <ytzur@nvidia.com>
abhishek-nexthop pushed a commit to nexthop-ai/sonic-mgmt that referenced this pull request Feb 6, 2026
Signed-off-by: Xin Wang <xiwang5@microsoft.com>
lakshmi-nexthop pushed a commit to lakshmi-nexthop/sonic-mgmt that referenced this pull request Feb 11, 2026
Signed-off-by: Xin Wang <xiwang5@microsoft.com>
Signed-off-by: Lakshmi Yarramaneni <lakshmi@nexthop.ai>
lakshmi-nexthop pushed a commit to lakshmi-nexthop/sonic-mgmt that referenced this pull request Feb 11, 2026
Signed-off-by: Xin Wang <xiwang5@microsoft.com>
Signed-off-by: Lakshmi Yarramaneni <lakshmi@nexthop.ai>
rraghav-cisco pushed a commit to rraghav-cisco/sonic-mgmt that referenced this pull request Feb 13, 2026
Signed-off-by: Xin Wang <xiwang5@microsoft.com>
Signed-off-by: Raghavendran Ramanathan <rraghav@cisco.com>
rraghav-cisco pushed a commit to rraghav-cisco/sonic-mgmt that referenced this pull request Feb 18, 2026
Signed-off-by: Xin Wang <xiwang5@microsoft.com>
Signed-off-by: Raghavendran Ramanathan <rraghav@cisco.com>