Skip to content

Conversation

@brennanmk
Copy link
Contributor

Description

The MultiThreadedExecutor contains a possible race condition. If _spin_once_impl is called more than once, it is possible that multiple threads may end up passing the condition on line 1008 and try to remove the same future on line 1009.

This solution proposes locking self._futures. This also has the benefit of removing the need to make a shallow copy of self._futures (#1129).

Replaces #1129
Fixes #1393

Is this user-facing behavior change?

No

Did you use Generative AI?

No

Additional Information

An alternative solution might involve placing a lock to prevent multiple threads from entering _spin_once_impl.

Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the change looks good to me to exclude the racy condition.

IMO, the right thing to do here is to add the spinning status for the Executor to check if it is already spinning just like rclcpp::Executor. looks like Executor of rclpy does not have this status at all.

@fujitatomoya
Copy link
Collaborator

i would like to have 2nd review for this before starting CI.

@cst0
Copy link

cst0 commented Jun 25, 2025

@brennanmk Is there a minimum reproducible example for this?

@brennanmk
Copy link
Contributor Author

brennanmk commented Jun 26, 2025

@cst0 it is relatively straight forward to reproduce the issue. If you call spin_once on a MultiThreadedExecutor a bunch of times in rapid succession (in different threads) eventually it will show up.

#1393 shows what the issue looks like when it appears.

@cst0
Copy link

cst0 commented Jun 26, 2025

Ah, had some trouble but just had to be more persistent. Looks good to me!

@brennanmk
Copy link
Contributor Author

Actually, since I removed the creation of a shallow copy in the loop I think it makes sense to also lock the self._futures append call on line 1005. Otherwise it is possible that one thread appends to self._futures while another is in the loop. I do not think this would cause any observable errors, but it is still probably best practice. Alternatively I could add the creation of a shallow copy back, but this seems unnecessary.

It might make sense to do what @fujitatomoya discussed above and instead create a "spinning" variable like in rclcpp. Although this would not be quite as clean as in cpp because we would still need to add a lock to make the spinning variable atomic. It might make sense if we want to raise an exception (or warning) if _spin_once_impl is called multiple times. Happy to make this change if it would be beneficial.

@fujitatomoya
Copy link
Collaborator

Pulls: #1477
Gist: https://gist.githubusercontent.com/fujitatomoya/36b8e8c5c643de0b9572cd3eb716871a/raw/5452a7922c32aed9ca24965b37db679c3e299622/ros2.repos
BUILD args: --packages-above-and-dependencies rclpy
TEST args: --packages-above rclpy
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/16321

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@fujitatomoya
Copy link
Collaborator

@brennanmk thanks for the effort and explanation.

for adding the spinning check, probably it would be better to have consensus to implement that. besides, this PR does make sense. i say this is good to go with 2nd approval from other maintainers.

@szobov
Copy link

szobov commented Jul 16, 2025

Hey there,
Thanks a lot for the effort.

What's needed to have this change merged?
Please let me know if I can assist with the review.

@szobov
Copy link

szobov commented Sep 18, 2025

Hey @fujitatomoya
What's needed to make it merged?
We face this issue quite often in our system.
Let me know if I can help bring it into upstream in any way.

@fujitatomoya
Copy link
Collaborator

@Mergifyio rebase

@mergify
Copy link
Contributor

mergify bot commented Sep 18, 2025

rebase

✅ Branch has been successfully rebased

@fujitatomoya
Copy link
Collaborator

@szobov waiting for the another approval from maintainers.

@fujitatomoya
Copy link
Collaborator

@mjcarroll friendly ping.

@fujitatomoya
Copy link
Collaborator

Pulls: #1477
Gist: https://gist.githubusercontent.com/fujitatomoya/41f61146dde3a7c0029ae055a946b18b/raw/5452a7922c32aed9ca24965b37db679c3e299622/ros2.repos
BUILD args: --packages-above-and-dependencies rclpy
TEST args: --packages-above rclpy
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/17022

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@ahcorde
Copy link
Contributor

ahcorde commented Oct 1, 2025

Pulls: #1477
Gist: https://gist.githubusercontent.com/ahcorde/2dd35e155d331a60943382b0048bebd2/raw/5452a7922c32aed9ca24965b37db679c3e299622/ros2.repos
BUILD args: --packages-above-and-dependencies rclpy
TEST args: --packages-above rclpy
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/17109

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@ahcorde ahcorde merged commit bcbd489 into ros2:rolling Oct 2, 2025
3 checks passed
InvincibleRMC pushed a commit to InvincibleRMC/rclpy that referenced this pull request Nov 20, 2025
InvincibleRMC pushed a commit to InvincibleRMC/rclpy that referenced this pull request Jan 12, 2026
InvincibleRMC added a commit that referenced this pull request Jan 21, 2026
)

* Fix warnings from gcc. (#1501)

Signed-off-by: Michael Carlstrom <[email protected]>

* Update type_support to use new abcs

Signed-off-by: Michael Carlstrom <[email protected]>

* Cleanup old test cases to use new automatic inference

Signed-off-by: Michael Carlstrom <[email protected]>

* Add content-filtered-topic interfaces (#1506)

Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* Added lock to protect futures for multithreaded executor (#1477)

Signed-off-by: brennanmk <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* EventsExecutor: Handle async callbacks for services and subscriptions (#1478)

Closes #1473

Signed-off-by: Brad Martin <[email protected]>
Co-authored-by: Brad Martin <[email protected]>
Co-authored-by: Alejandro Hernandez Cordero <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* add spinning state for the Executor classes. (#1510)

Signed-off-by: Tomoya.Fujita <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* Fixes Action.*_async futures never complete (#1308)

Per rclpy:1123 If two seperate client server actions are running in seperate executors the future given to the ActionClient will never complete due to a race condition
This fixes  the calls to rcl handles potentially leading to deadlock scenarios by adding locks to there references
Co-authored-by: Aditya Agarwal <[email protected]>
Co-authored-by: Jonathan Blixt <[email protected]>
Signed-off-by: Jonathan Blixt <[email protected]>

Co-authored-by: Alejandro Hernandez Cordero <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* remove unused 'param_type' (#1524)

'param_type' is set but never used

Signed-off-by: Christian Rauch <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* Changelog

Signed-off-by: Alejandro Hernandez Cordero <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* 10.0.1

Signed-off-by: Michael Carlstrom <[email protected]>

* Remove duplicate future handling from send_goal_async (#1532)

A recent change intended to move this logic into a lock context, but
actually ended up duplicating it instead. This fixes that by removing
the duplicated logic outside of the lock. It also preserves the explicit
typing annotation on the future.

Signed-off-by: Nathan Wiebe Neufeldt <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* fix(test_events_executor): destroy all nodes before shutdown (#1538)

Signed-off-by: yuanyuyuan <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* add BaseImpl

Signed-off-by: Michael Carlstrom <[email protected]>

* Add ImplT Support

Signed-off-by: Michael Carlstrom <[email protected]>

* fix changelong

Signed-off-by: Michael Carlstrom <[email protected]>

* Remove accidental tuple (#1542)

Signed-off-by: Michael Carlstrom <[email protected]>

* Allow action servers without execute callback (#1219)

Signed-off-by: Tim Clephas <[email protected]>

* add : get clients, servers info (#1307)

Signed-off-by: Minju, Lee <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* 10.0.2

Signed-off-by: Michael Carlstrom <[email protected]>

* update tests

Signed-off-by: Michael Carlstrom <[email protected]>

* ParameterEventHandler support ContentFiltering (#1531)

* ParameterEventHandler support ContentFiltering

Signed-off-by: Barry Xu <[email protected]>

* Address review comments

Signed-off-by: Barry Xu <[email protected]>

---------

Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* Fix issues with resuming async tasks awaiting a future (#1469)

Signed-off-by: Błażej Sowa <[email protected]>
Signed-off-by: Nadav Elkabets <[email protected]>
Co-authored-by: Nadav Elkabets <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* 10.0.3

Signed-off-by: Michael Carroll <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* Increase clock accuracy (#1564)

Signed-off-by: Florian Vahl <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* Use unconditional wait when possible. (#1563)

Previously the max() value of the steady time was used as the default deadline. In some environments this results in overflows in the underlying pthread_cond_timedwait call, which waits for the conditional variable in the events queue implementation. Consequently, this lead to freezes in the executor. Reducing the deadline significantly helped, but using `cv.wait` instead of `cv_.wait_until` seems to be the cleaner solution.

Signed-off-by: Florian Vahl <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* Remove default from switch with enum, so that compiler warns. (#1566)

Signed-off-by: Tomoya Fujita <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* Fix parameter parsing for unspecified target nodes (#1552)

Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* Improve the compatibility of processing YAML parameter files (#1548)

Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* Improve wildcard parsing and optimize the logic for parsing YAML para… (#1571)

Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* Expose action graph functions as Node class methods. (#1574)

* Expose action graph functions as Node class methods.

Signed-off-by: Tomoya Fujita <[email protected]>

* address review comments to keep the warning consistent.

Signed-off-by: Tomoya.Fujita <[email protected]>

---------

Signed-off-by: Tomoya Fujita <[email protected]>
Signed-off-by: Tomoya.Fujita <[email protected]>

* Fix performance bug in MultiThreadedExecutor (hopefully) (#1547)

Signed-off-by: Michael Tandy <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* Changelog

Signed-off-by: Alejandro Hernandez Cordero <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

* 10.0.4

Signed-off-by: Michael Carlstrom <[email protected]>

* use Msg over BaseMessage

Signed-off-by: Michael Carlstrom <[email protected]>

* Use Srv over BaseService

Signed-off-by: Michael Carlstrom <[email protected]>

* Use Action over BaseAction

Signed-off-by: Michael Carlstrom <[email protected]>

* lint

Signed-off-by: Michael Carlstrom <[email protected]>

* Update rclpy/rclpy/type_support.py

Co-authored-by: Christophe Bedard <[email protected]>
Signed-off-by: Michael Carlstrom <[email protected]>

---------

Signed-off-by: Michael Carlstrom <[email protected]>
Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: brennanmk <[email protected]>
Signed-off-by: Brad Martin <[email protected]>
Signed-off-by: Tomoya.Fujita <[email protected]>
Signed-off-by: Christian Rauch <[email protected]>
Signed-off-by: Alejandro Hernandez Cordero <[email protected]>
Signed-off-by: Nathan Wiebe Neufeldt <[email protected]>
Signed-off-by: yuanyuyuan <[email protected]>
Signed-off-by: Tim Clephas <[email protected]>
Signed-off-by: Minju, Lee <[email protected]>
Signed-off-by: Barry Xu <[email protected]>
Signed-off-by: Błażej Sowa <[email protected]>
Signed-off-by: Nadav Elkabets <[email protected]>
Signed-off-by: Michael Carroll <[email protected]>
Signed-off-by: Florian Vahl <[email protected]>
Signed-off-by: Florian Vahl <[email protected]>
Signed-off-by: Tomoya Fujita <[email protected]>
Signed-off-by: Tomoya.Fujita <[email protected]>
Signed-off-by: Michael Tandy <[email protected]>
Co-authored-by: Chris Lalancette <[email protected]>
Co-authored-by: Barry Xu <[email protected]>
Co-authored-by: Brennan Miller-Klugman <[email protected]>
Co-authored-by: Brad Martin <[email protected]>
Co-authored-by: Brad Martin <[email protected]>
Co-authored-by: Alejandro Hernandez Cordero <[email protected]>
Co-authored-by: Tomoya Fujita <[email protected]>
Co-authored-by: Jonathan <[email protected]>
Co-authored-by: Christian Rauch <[email protected]>
Co-authored-by: Nathan Wiebe Neufeldt <[email protected]>
Co-authored-by: Yuyuan Yuan <[email protected]>
Co-authored-by: Tim Clephas <[email protected]>
Co-authored-by: Minju, Lee <[email protected]>
Co-authored-by: Błażej Sowa <[email protected]>
Co-authored-by: Nadav Elkabets <[email protected]>
Co-authored-by: Michael Carroll <[email protected]>
Co-authored-by: Florian Vahl <[email protected]>
Co-authored-by: Michael Tandy <[email protected]>
Co-authored-by: Christophe Bedard <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Race condition in MultiThreadedExecutor

6 participants