Fix issue with MultiThreadedExecutor by adding lock to protect _futures #1477
fujitatomoya
left a comment
The change looks good to me for eliminating the race condition.
IMO, the right thing to do here is to add a spinning status to the Executor so it can check whether it is already spinning, just like rclcpp::Executor. It looks like the rclpy Executor does not have this status at all.
i would like to have 2nd review for this before starting CI.
@brennanmk Is there a minimum reproducible example for this?
Ah, had some trouble but just had to be more persistent. Looks good to me!
Actually, since I removed the creation of a shallow copy in the loop, I think it makes sense to also lock the self._futures append call on line 1005. Otherwise it is possible for one thread to append to self._futures while another is iterating over it in the loop. I do not think this would cause any observable errors, but it is still probably best practice. Alternatively, I could add the shallow copy back, but this seems unnecessary. It might make sense to do what @fujitatomoya discussed above and instead create a "spinning" variable like in rclcpp, although this would not be quite as clean as in C++ because we would still need a lock to make updates to the spinning variable atomic. That approach might make sense if we want to raise an exception (or warning) when _spin_once_impl is called multiple times. Happy to make this change if it would be beneficial.
Pulls: #1477
@brennanmk thanks for the effort and explanation. For adding the spinning check, it would probably be better to reach consensus before implementing that. Besides, this PR does make sense. I say this is good to go with a 2nd approval from other maintainers.
Hey there, what's needed to have this change merged?
Hey @fujitatomoya
@Mergifyio rebase
Signed-off-by: brennanmk <[email protected]>
✅ Branch has been successfully rebased
34e4063 to 699452f
@szobov waiting for another approval from maintainers.
@mjcarroll friendly ping.
Pulls: #1477
Signed-off-by: brennanmk <[email protected]> Signed-off-by: Michael Carlstrom <[email protected]>
Description
The MultiThreadedExecutor contains a possible race condition. If _spin_once_impl is called from more than one thread at the same time, multiple threads may pass the condition on line 1008 for the same future and then each try to remove that future on line 1009.
This PR fixes the race by protecting self._futures with a lock. This also has the benefit of removing the need to make a shallow copy of self._futures (#1129).
Replaces #1129
Fixes #1393
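For readers without the diff open, the idea can be sketched in isolation. This is a minimal illustration and not the rclpy code: FutureTracker, add, reap_done and pending_count are hypothetical names, and it uses concurrent.futures.Future instead of rclpy's own future type. Where the PR iterates self._futures under the lock, this sketch rebuilds the list in a single pass under the lock, so each finished future is handed to exactly one caller.

```python
import threading
from concurrent.futures import Future


class FutureTracker:
    """Hypothetical stand-in for an executor's list of pending futures."""

    def __init__(self):
        self._futures = []
        # Every read or write of self._futures happens under this lock.
        self._futures_lock = threading.Lock()

    def add(self, future):
        # Appends are locked too, so they cannot interleave with reaping.
        with self._futures_lock:
            self._futures.append(future)

    def reap_done(self):
        """Remove finished futures and return the ones this call removed."""
        done, pending = [], []
        with self._futures_lock:
            # Single pass: each future lands in exactly one of the two lists,
            # so the check and the removal cannot be split across threads.
            for future in self._futures:
                (done if future.done() else pending).append(future)
            self._futures = pending
        # Outside the lock: surface any exception stored in a finished future.
        for future in done:
            future.result()
        return done

    def pending_count(self):
        with self._futures_lock:
            return len(self._futures)
```

Without the lock, two threads could both see done() return True for the same future and both try to remove it from the list, which is the failure described above.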
Is this user-facing behavior change?
No
Did you use Generative AI?
No
Additional Information
An alternative solution might involve placing a lock to prevent multiple threads from entering _spin_once_impl.
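One way that alternative could look is sketched below. This is a hedged illustration only: SpinGuard and is_spinning are hypothetical names, not part of rclpy, and raising RuntimeError on re-entry is an assumption about the desired behavior (a warning, or blocking until the first caller finishes, would be equally valid choices).

```python
import threading


class SpinGuard:
    """Hypothetical guard that allows only one thread inside a region at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._is_spinning = False

    @property
    def is_spinning(self):
        with self._lock:
            return self._is_spinning

    def __enter__(self):
        # The flag is checked and set under the lock so the test-and-set is atomic.
        with self._lock:
            if self._is_spinning:
                raise RuntimeError('already spinning')
            self._is_spinning = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        with self._lock:
            self._is_spinning = False
        return False  # do not swallow exceptions raised inside the region
```

An executor could wrap the body of its spin method in "with self._spin_guard:" (again a hypothetical attribute), so that a second concurrent caller fails fast instead of racing on shared state.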