feat: new SafeThreadPoolExecutor for Ubuntu 24.04 upgrade #19263
yejianquan merged 1 commit into sonic-net:master
MAX_WORKER: $(INSTANCE_NUMBER)
KVM_IMAGE_BRANCH: $(BUILD_BRANCH)
MGMT_BRANCH: $(BUILD_BRANCH)
COMMON_EXTRA_PARAMS: "--disable_sai_validation "
Temporarily disable SAI validation, as it will not be compatible with Ubuntu 24.04 due to its usage of concurrent.futures. We will refactor the SAI validation and re-enable it later. Microsoft ADO to track the progress: 33758029
…untu 24.04 upgrade (#19599)
What is the motivation for this PR?
#19263 added a new SafeThreadPoolExecutor for the Ubuntu 24.04 upgrade and a common param in the PR test, but we also need the param in the baseline test.
How did you do it?
Add the common param in the baseline test to support SafeThreadPoolExecutor for the Ubuntu 24.04 upgrade.
@cyw233, all sonic-buildimage master branch PR validations failed with the following message; I guess it's related to this PR: File "/var/src/sonic-mgmt/tests/common/helpers/multi_thread_utils.py", line 47, in _wrapper
Hey @liuh-80, yeah, we need to temporarily disable SAI validation, as it will not be compatible with the change due to the usage of concurrent.futures. We will refactor the SAI validation and re-enable it later. Microsoft ADO to track the progress: 33758029. Therefore, I added the …
Why I did it
Temporarily disable SAI validation, as it is not compatible with the change in sonic-net/sonic-mgmt#19263 due to the usage of concurrent.futures. We will refactor the SAI validation and re-enable it later. Microsoft ADO to track the progress: 33758029.
Work item tracking
Microsoft ADO (number only): 33039693
signed-off-by: jianquanye@microsoft.com
…19263)
Description of PR
We will soon upgrade the sonic-mgmt docker to Ubuntu 24.04, which comes with Python 3.12 + Ansible 2.18.6. Since Python 3.12 enforces more rigorous checks around fork() in multi-threaded programs, we will start getting the ansible.errors.AnsibleError: A worker was found in a dead state exception due to ThreadPoolExecutor from concurrent.futures.thread.
To mitigate this issue, we re-implemented the SafeThreadPoolExecutor class with the traditional ThreadPool from multiprocessing.pool for multithreading operations.
Summary:
Fixes # (issue) Microsoft ADO 33039693
signed-off-by: jianquanye@microsoft.com
According to #19263, Python 3.12 enforces more rigorous checks around fork() in multi-threaded programs. After the docker-sonic-mgmt image is upgraded to Ubuntu 24.04, Python and Ansible are upgraded too. With Python 3.12 and Ansible 2.18 in the new docker-sonic-mgmt, the `nbrhosts` fixture, which depends on concurrent.futures, may fail with an error like the one below:
```
self = <ansible.plugins.strategy.linear.StrategyModule object at 0x7596c07986e0>
iterator = <ansible.executor.play_iterator.PlayIterator object at 0x7596c09b2a80>

    def _wait_on_pending_results(self, iterator):
        '''
        Wait for the shared counter to drop to zero, using a short sleep
        between checks to ensure we don't spin lock
        '''
        ret_results = []
        display.debug("waiting for pending results...")
        while self._pending_results > 0 and not self._tqm._terminated:
            if self._tqm.has_dead_workers():
>               raise AnsibleError("A worker was found in a dead state")
E               ansible.errors.AnsibleError: A worker was found in a dead state
```
PR #21407 introduced a threading lock as a temporary workaround for the issue. A better fix is to use the SafeThreadPoolExecutor updated in #19263 to initialize the `nbrhosts` objects. This change reverted the threading lock of PR #21407 and updated the `nbrhosts` fixture to use the new SafeThreadPoolExecutor.
Signed-off-by: Xin Wang <xiwang5@microsoft.com>
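As a rough illustration of initializing several neighbor host objects in parallel with a thread pool rather than guarding them with a lock, the sketch below uses `ThreadPool` from `multiprocessing.pool` directly. `FakeNeighborHost`, `build_host`, and the neighbor names are hypothetical placeholders; the real `nbrhosts` fixture and the `SafeThreadPoolExecutor` API are not shown in this thread and may differ.

```python
from multiprocessing.pool import ThreadPool


class FakeNeighborHost:
    """Hypothetical placeholder for a neighbor host object (assumption)."""

    def __init__(self, name):
        self.name = name
        self.ready = True


def build_host(name):
    # Return a (key, value) pair so the results can be turned into a dict.
    return name, FakeNeighborHost(name)


# Hypothetical neighbor names, chosen only for illustration.
neighbor_names = ["ARISTA01T1", "ARISTA02T1", "ARISTA03T1"]

# pool.map blocks until every host has been built, so no explicit lock
# is needed around the shared result dictionary.
with ThreadPool(processes=len(neighbor_names)) as pool:
    hosts = dict(pool.map(build_host, neighbor_names))
```

Each worker returns its own result and the dictionary is assembled afterwards in the calling thread, which is one way to avoid shared mutable state during initialization.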
Description of PR
We will soon upgrade the sonic-mgmt docker to Ubuntu 24.04, which comes with Python 3.12 + Ansible 2.18.6. Since Python 3.12 enforces more rigorous checks around `fork()` in multi-threaded programs, we will start getting the `ansible.errors.AnsibleError: A worker was found in a dead state` exception due to `ThreadPoolExecutor` from `concurrent.futures.thread`.
To mitigate this issue, we re-implemented the `SafeThreadPoolExecutor` class with the traditional `ThreadPool` from `multiprocessing.pool` for multithreading operations.
Summary:
Fixes # (issue) Microsoft ADO 33039693
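To make the idea in the description concrete, here is a minimal sketch of an executor-style wrapper built on `ThreadPool` from `multiprocessing.pool`. It is not the class from this PR: `SketchThreadPoolExecutor`, its `submit` signature, and the choice to re-raise worker errors on exit are all assumptions made for illustration.

```python
from multiprocessing.pool import ThreadPool


class SketchThreadPoolExecutor:
    """Hypothetical stand-in, not the SafeThreadPoolExecutor from the PR."""

    def __init__(self, max_workers):
        self._pool = ThreadPool(processes=max_workers)
        self._results = []

    def submit(self, fn, *args, **kwargs):
        # apply_async returns an AsyncResult; its get() re-raises any
        # exception raised inside the worker thread.
        result = self._pool.apply_async(fn, args, kwargs)
        self._results.append(result)
        return result

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Stop accepting work and wait for all submitted tasks to finish.
        self._pool.close()
        self._pool.join()
        if exc_type is None:
            # Surface worker failures instead of silently dropping them.
            for result in self._results:
                result.get()
        return False


def square(x):
    return x * x


with SketchThreadPoolExecutor(max_workers=4) as executor:
    pending = [executor.submit(square, n) for n in range(5)]

squares = [p.get() for p in pending]
```

The wrapper keeps the familiar `with ... as executor: executor.submit(...)` calling pattern, so call sites written against `concurrent.futures.ThreadPoolExecutor` would need few changes under this assumed interface.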