Skip to content

Fix lock order inversion in ioqueue write and connect callbacks#4806

Merged
sauwming merged 5 commits intomasterfrom
copilot/fix-lock-order-inversion-issue
Feb 19, 2026
Merged

Fix lock order inversion in ioqueue write and connect callbacks#4806
sauwming merged 5 commits intomasterfrom
copilot/fix-lock-order-inversion-issue

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Feb 19, 2026

Description

Lock order inversion detected by TSan: ioqueue key lock (M1) acquired while holding grp_lock (M0) in one path, reversed in callback chains. Extended existing PJ_IOQUEUE_CALLBACK_NO_LOCK pattern from read callbacks to both write and connect callbacks in ioqueue_winnt.c and ioqueue_common_abs.c.

Changes:

  • Added write_callback_thread and write_cb_list to pj_ioqueue_key_t for write callback tracking and queuing in both Windows and common implementations
  • Modified write operation handling to release ioqueue key lock before invoking callbacks when concurrency disabled
  • Modified connect operation handling to release ioqueue key lock before invoking callbacks when PJ_IOQUEUE_CALLBACK_NO_LOCK is enabled
  • Created ioqueue_dispatch_write_event_no_lock() to process queued write callbacks in both implementations
  • Added write callback cleanup in cancel_all_pending_op() for Windows and key unregister for common implementations
  • Updated PJ_IOQUEUE_CALLBACK_NO_LOCK documentation to reflect extended scope (now covers read, write, and connect callbacks)
  • Removed TSan suppression entry for deadlock:stateless_send_transport_cb from tests/sanitizers/tsan.supp

Verification:

  • ioqueue_common_abs.c now applies the PJ_IOQUEUE_CALLBACK_NO_LOCK pattern to read, write, and connect callbacks
  • ioqueue_winnt.c applies the pattern to read and write callbacks; connect callbacks already release locks before invocation

Motivation and Context

Thread Sanitizer detects lock order inversion between ioqueue key lock and grp_lock during connect/write operations. This can cause deadlocks in multi-threaded scenarios. The fix applies the same re-entrancy prevention mechanism already used for read callbacks to both write and connect callbacks across all ioqueue implementations.

How Has This Been Tested?

  • Build verification on Linux x86_64
  • Basic PJSUA functional test passes
  • CodeQL security scan shows no issues
  • TSan suppression removed as the underlying issue is now fixed

Screenshots (if appropriate):

N/A

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the CODING STYLE of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
Original prompt

This section details on the original issue you should resolve

<issue_title>Lock order inversion issue in ioqueue</issue_title>
<issue_description>### Describe the bug

Lock order inversion involving ioqueue.

Mutex M1 acquired here while holding mutex M0 in main thread:
  pjsip/pjproject#5 pj_ioqueue_lock_key ../src/pj/ioqueue_common_abs.c:1502
  pjsip/pjproject#6 pj_ioqueue_connect ../src/pj/ioqueue_common_abs.c:1329
  pjsip/pjproject#7 pj_activesock_start_connect ../src/pj/activesock.c:961
  pjsip/pjproject#8 lis_create_transport ../src/pjsip/sip_transport_tcp.c:1063
  pjsip/pjproject#9 pjsip_tpmgr_acquire_transport2 ../src/pjsip/sip_transport.c:2800
  pjsip/pjproject#10 pjsip_endpt_acquire_transport2 ../src/pjsip/sip_endpoint.c:1288
  pjsip/pjproject#11 stateless_send_transport_cb ../src/pjsip/sip_util.c:1208
  pjsip/pjproject#12 stateless_send_resolver_callback ../src/pjsip/sip_util.c:1444
  pjsip/pjproject#13 pjsip_resolve ../src/pjsip/sip_resolve.c:399
  pjsip/pjproject#14 pjsip_endpt_resolve ../src/pjsip/sip_endpoint.c:1243
  pjsip/pjproject#15 pjsip_endpt_send_request_stateless ../src/pjsip/sip_util.c:1513
  pjsip/pjproject#16 tsx_send_msg ../src/pjsip/sip_transaction.c:2599
  pjsip/pjproject#17 tsx_on_state_null ../src/pjsip/sip_transaction.c:2871
  pjsip/pjproject#18 pjsip_tsx_send_msg ../src/pjsip/sip_transaction.c:2023
  pjsip/pjproject#19 perform_tsx_test ../src/test/tsx_uac_test.c:1156

Mutex M0 acquired here while holding mutex M1 in main thread:
  pjsip/pjproject#4 pj_grp_lock_acquire ../src/pj/lock.c:486
  pjsip/pjproject#5 send_msg_callback ../src/pjsip/sip_transaction.c:2096
  pjsip/pjproject#6 stateless_send_transport_cb ../src/pjsip/sip_util.c:1162
  pjsip/pjproject#7 transport_send_callback ../src/pjsip/sip_transport.c:952
  pjsip/pjproject#8 on_data_sent ../src/pjsip/sip_transport_tcp.c:1254
  pjsip/pjproject#9 tcp_flush_pending_tx ../src/pjsip/sip_transport_tcp.c:799
  pjsip/pjproject#10 on_connect_complete ../src/pjsip/sip_transport_tcp.c:1573
  pjsip/pjproject#11 ioqueue_on_connect_complete ../src/pj/activesock.c:976

One possible solution is to avoid holding the lock in ioq callback by applying a similar mechanism as in #4569 and #4790 to the connect/write callback as well.

Steps to reproduce

Run pjsip test under TSan

PJSIP version

master

Context

N/A

Log, call stack, etc

N/A
```</issue_description>

<agent_instructions>also check `ioqueue_winnt`</agent_instructions>

## Comments on the Issue (you are @copilot in this section)

<comments>
</comments>

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI changed the title [WIP] Fix lock order inversion issue in ioqueue Fix lock order inversion in ioqueue write callbacks Feb 19, 2026
Copilot AI requested a review from sauwming February 19, 2026 03:25
@sauwming
Copy link
Copy Markdown
Member

@copilot: modify ioqueue_common_abs as well, and remove the suppression entry in tsan.supp

…on_abs and remove tsan suppression

Co-authored-by: sauwming <[email protected]>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Feb 19, 2026

@copilot: modify ioqueue_common_abs as well, and remove the suppression entry in tsan.supp

Done. Applied the PJ_IOQUEUE_CALLBACK_NO_LOCK pattern to write operations in ioqueue_common_abs and removed the tsan suppression entry in commit 3854bb1.

@sauwming
Copy link
Copy Markdown
Member

@copilot: Do not hold ioq lock before calling on_connect_complete if PJ_IOQUEUE_CALLBACK_NO_LOCK is enabled.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Feb 19, 2026

@copilot: Do not hold ioq lock before calling on_connect_complete if PJ_IOQUEUE_CALLBACK_NO_LOCK is enabled.

Done. Modified ioqueue_common_abs.c to release the ioqueue key lock before calling on_connect_complete when PJ_IOQUEUE_CALLBACK_NO_LOCK is enabled in commit 8ddd2e5.

Copilot AI changed the title Fix lock order inversion in ioqueue write callbacks Fix lock order inversion in ioqueue write and connect callbacks Feb 19, 2026
@sauwming sauwming marked this pull request as ready for review February 19, 2026 07:26
@sauwming sauwming requested a review from nanangizz February 19, 2026 07:27
@sauwming
Copy link
Copy Markdown
Member

sauwming commented Feb 19, 2026

Using the mechanism of PJ_IOQUEUE_CALLBACK_NO_LOCK seems to work to solve the lock order inversion (which is natural because we no longer hold the lock in connect cb).

So we have three options:

  1. Use the approach in this PR
    Con: the riskiest, prone to race condition. Pro: avoid future deadlock issue involving ioq write events as well
  2. Release the lock only before calling connect cb (the write events will be left untouched)
    Pro: less risky
  3. Use suppression file (Lock order inversion issue in ioqueue #4773)
    Pro: least risky. Con: rely on suppression, which may also filter other issues

@nanangizz
Copy link
Copy Markdown
Member

1. Use the approach in this PR
   Con: the riskiest, prone to race condition. Pro: avoid future deadlock issue involving ioq write events as well

I agree this is the riskiest of the three approaches, given the rather aggressive changes involved.
However, I don't see how this is "prone to race condition", could you describe a bit more detail or an example?

@sauwming
Copy link
Copy Markdown
Member

Now connect and write cb can be called at the same time, similar to accept and read.

@sauwming sauwming merged commit 79b304f into master Feb 19, 2026
69 of 72 checks passed
@sauwming sauwming deleted the copilot/fix-lock-order-inversion-issue branch February 19, 2026 23:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Lock order inversion issue in ioqueue

3 participants