Skip to content

Shutdown regressions in signal handler (from #848)#854

Draft
j-rivero wants to merge 1 commit into
mainfrom
jrivero/tsan_problems_fix_signal_handler
Draft

Shutdown regressions in signal handler (from #848)#854
j-rivero wants to merge 1 commit into
mainfrom
jrivero/tsan_problems_fix_signal_handler

Conversation

@j-rivero
Copy link
Copy Markdown
Contributor

🦟 Bug fix

Summary

While reviewing #848 one of my testing agents running TSAN was complaining about different problems with races. See full report in: https://gist.github.com/j-rivero/2132044f9d4d0083f1e116729b23c0ed. Summarized of what I understand this are real problems:

  • src/WaitHelpers.cc:61,74,91 - g_shutdownPipe is shared unsafely between first-call initialization and the signal handler; TSan reported a race between pipe() setup and handler write().

  • src/WaitHelpers.cc:104 - one signal writes one byte, so only one blocked waiter wakes. A 2-waiter probe left one thread blocked, regressing the old notify_all semantics.

  • src/WaitHelpers.cc:70-82 - the signal handler should save and restore errno, otherwise it can clobber interrupted code's error state.

I've asked the agent to create tests that demonstrate the problem failing hard.

  • Add POSIX-only waitForShutdown regression tests that fail in a normal build when one signal only wakes a single waiter,
  • Repeated signals block on the shutdown pipe
  • The signal handler clobbers errno.

In draft by now until we confirm that these are real scenarios that can happen and prepare a fix in this PR for them.

Checklist

  • Signed all commits for DCO
  • Added a screen capture or video to the PR description that demonstrates the fix (as needed)
  • Added tests
  • Updated documentation (as needed)
  • Updated migration guide (as needed)
  • Consider updating Python bindings (if the library has them)
  • codecheck passed (See contributing)
  • All tests passed (See test coverage)
  • Updated Bazel files (if adding new files). Created an issue otherwise.
  • While waiting for a review on your PR, please help review another open pull request to support the maintainers
  • Was GenAI used to generate this PR? If so, make sure to add "Generated-by" to your commits. (See this policy for more info.)

Generated-by: Claude Sonnet 4.6 + GPT-5.4 (Copilot)

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by and Generated-by messages.

Backports: If this is a backport, please use Rebase and Merge instead.

Add POSIX-only waitForShutdown regression tests that fail in a normal
build when one signal only wakes a single waiter, repeated signals block
on the shutdown pipe, or the signal handler clobbers errno.

These checks run as threadsafe death tests so they fail hard in the
regular UNIT_WaitHelpers_TEST binary without depending on TSan
diagnostics.

Generated-by: GPT-5.4 (Copilot)
Signed-off-by: Jose Luis Rivero <jrivero@honurobotics.com>
@j-rivero j-rivero requested a review from caguero as a code owner April 16, 2026 10:25
@j-rivero j-rivero marked this pull request as draft April 16, 2026 10:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Inbox

Development

Successfully merging this pull request may close these issues.

2 participants