comm: reimplement nonblocking contextid allocation using MPIX Async #7648

hzhou · 2025-10-28T23:27:39Z

Pull Request Description

The nonblocking contextid allocation algorithm is currently implemented using Sched. It requires a few hacks and is very difficult to debug. Re-implement it using the MPIX Async API instead.

NOTE: Hopefully, this will resolve the outstanding test xfails. Now that I understand the algorithm better, if we still encounter lock contention issues, we can try inserting a heavy yield when we know we are not getting the masks.

[skip warnings]

Author Checklist

Provide Description
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits Follow Good Practice
Commits are self-contained and do not do two things at once.
Commit message is of the form: module: short description
Commit message explains what's in the commit.
Passes All Tests
Whitespace checker. Warnings test. Additional tests via comments.
Contribution Agreement
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.

hzhou · 2025-11-03T16:35:16Z

test:mpich/ch3/most
test:mpich/ch4/most

✔️

Since we need to enter the thread CS before calling async poll functions, we may hit a recursive situation when a callback makes blocking MPI calls and invokes MPI progress within the poll function. To allow that, we need to skip async progress when re-entering progress.
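A minimal sketch of the re-entrancy guard described above, assuming a single-threaded model in which a flag marks that async callbacks are currently running. All names here (`sketch_progress`, `sketch_async_add`, `sketch_nested_cb`) are hypothetical stand-ins for illustration, not MPICH symbols, and the real progress engine does considerably more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT MPICH symbols. */
typedef int (*sketch_poll_fn)(void *state);

#define SKETCH_MAX_TASKS 8

static sketch_poll_fn sketch_tasks[SKETCH_MAX_TASKS];
static void *sketch_states[SKETCH_MAX_TASKS];
static int sketch_num_tasks = 0;
static int sketch_in_async_poll = 0;    /* nonzero while callbacks are running */

/* Register a callback. Returns 0 on success, -1 if the table is full. */
int sketch_async_add(sketch_poll_fn fn, void *state)
{
    if (sketch_num_tasks >= SKETCH_MAX_TASKS)
        return -1;
    sketch_tasks[sketch_num_tasks] = fn;
    sketch_states[sketch_num_tasks] = state;
    sketch_num_tasks++;
    return 0;
}

/* One progress pass. Returns the number of async callbacks invoked. */
int sketch_progress(void)
{
    int invoked = 0;

    /* (Network progress would happen here in a real engine.) */

    /* Re-entered from inside a callback: skip the async part entirely. */
    if (sketch_in_async_poll)
        return 0;

    sketch_in_async_poll = 1;
    for (int i = 0; i < sketch_num_tasks; i++) {
        sketch_tasks[i](sketch_states[i]);
        invoked++;
    }
    sketch_in_async_poll = 0;
    return invoked;
}

/* Example callback that re-enters progress, as a blocking MPI call would.
 * It stores the nested pass's callback count in *state. */
int sketch_nested_cb(void *state)
{
    *(int *) state = sketch_progress();
    return 0;
}
```

Registering `sketch_nested_cb` and calling `sketch_progress()` once invokes the callback a single time. The nested call returns without polling any callbacks, so the recursion terminates.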

Re-implement the nonblocking contextid allocation algorithm using MPIX Async.
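For readers unfamiliar with the shape of the algorithm, here is a rough sketch of one mask-based allocation attempt written as a poll-style step. It assumes a simplified model: every process holds a bit mask of free ids, a process that does not currently own its mask contributes zeros, and the bitwise AND across processes (a plain loop here, standing in for a nonblocking allreduce) yields the ids free everywhere. The names, the mask size, and the `has_mask` flag are assumptions for illustration and do not reflect the code in this PR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CTX_MASK_WORDS 4        /* hypothetical size: 4 x 32 = 128 ids */

enum ctx_poll_result { CTX_POLL_PENDING, CTX_POLL_DONE };

struct ctx_alloc_state {
    int iter;                   /* number of attempts so far, useful when debugging */
    int context_id;             /* stays -1 until an id is agreed on */
};

/* Return the index of the lowest set bit, or -1 if the mask is empty. */
int ctx_find_lowest(const uint32_t mask[CTX_MASK_WORDS])
{
    for (int w = 0; w < CTX_MASK_WORDS; w++)
        for (int b = 0; b < 32; b++)
            if (mask[w] & (UINT32_C(1) << b))
                return w * 32 + b;
    return -1;
}

/* One allocation attempt, simulated for all processes at once.
 * Returns CTX_POLL_PENDING when no common free id was found (retry later),
 * or CTX_POLL_DONE after claiming the lowest common id on every process. */
enum ctx_poll_result ctx_alloc_poll(struct ctx_alloc_state *st,
                                    uint32_t masks[][CTX_MASK_WORDS],
                                    const int has_mask[], int nprocs)
{
    uint32_t combined[CTX_MASK_WORDS];

    assert(nprocs > 0);
    memset(combined, 0xFF, sizeof(combined));
    st->iter++;

    /* Stand-in for a bitwise-AND allreduce over the contributed masks. */
    for (int p = 0; p < nprocs; p++)
        for (int w = 0; w < CTX_MASK_WORDS; w++)
            combined[w] &= has_mask[p] ? masks[p][w] : 0u;

    int id = ctx_find_lowest(combined);
    if (id < 0)
        return CTX_POLL_PENDING;

    /* Mark the id as used in every process's mask. */
    for (int p = 0; p < nprocs; p++)
        masks[p][id / 32] &= ~(UINT32_C(1) << (id % 32));
    st->context_id = id;
    return CTX_POLL_DONE;
}
```

In this model a caller would invoke `ctx_alloc_poll` repeatedly from its poll callback until it returns `CTX_POLL_DONE`. The `iter` field records how many attempts were needed.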

Add internal nonblocking collective interfaces that accept an explicit tag. This allows asynchronous algorithms to call nonblocking collectives internally without being tied to a specific schedule framework. Specifically, it allows nonblocking algorithms that use the MPIX Async interface.

Use the nonblocking collective interface with an explicit tag in the nonblocking context_id allocation algorithm.

The basic generalized request relies on an external progress mechanism to complete the request rather than on the extension with wait_fn. We can create a generalized request using the MPIX Async mechanism, and MPID_Progress_wait will complete the request.

MPIR_SCHED_KIND_GENERALIZED is no longer needed.

It's easier to debug when we can track the iteration number between retries.

Refactor the blocking and nonblocking algorithms to avoid duplication and inconsistencies. Fix a potential thread-safety gap in the nonblocking code.

Ch3 needs to be informed whether it can enter a blocking receive during progress or needs to poll the progress continuously.
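A small sketch of the decision this implies, assuming the rationale is that a blocking receive would stall any registered async callbacks. The counter and function names below are hypothetical stand-ins (the query added in this PR is MPII_async_things_pending, whose actual implementation is not shown here).

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; NOT the MPICH implementation. */
enum sketch_wait_mode { SKETCH_WAIT_BLOCK, SKETCH_WAIT_POLL };

static int sketch_pending_count = 0;    /* async tasks not yet completed */

void sketch_async_started(void)
{
    sketch_pending_count++;
}

void sketch_async_finished(void)
{
    assert(sketch_pending_count > 0);
    sketch_pending_count--;
}

/* Nonzero when at least one async task still needs to be polled. */
int sketch_async_things_pending(void)
{
    return sketch_pending_count > 0;
}

/* A blocking receive is only safe when nothing else needs polling;
 * otherwise the progress loop has to keep spinning. */
enum sketch_wait_mode sketch_choose_wait_mode(void)
{
    return sketch_async_things_pending() ? SKETCH_WAIT_POLL : SKETCH_WAIT_BLOCK;
}
```

With no outstanding async tasks the progress loop may block. As soon as one is started, it has to fall back to polling until that task finishes.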

Re-organize code for better readability. Re-do the comments to remove stale parts and reflect the current code.

The dynamic_sendrecv is used in MPI_Intercomm_create. Mismatches between threads are prevented by the user-provided tag, thus it is okay to yield during the blocking progress. Without the yield, MPI_Intercomm_create may block another thread's progress when the remote processes are not present (blocked by other communications). In the dynamic process accept/connect path, we force peer_comm's context id to 0. This is okay because the leader exchange is established with a specific pair of addresses and there are no other communications yet during leader_exchange.

hzhou marked this pull request as draft October 29, 2025 02:32

hzhou force-pushed the 2510_idup_nb branch 10 times, most recently from a73f4ba to 8e59515 Compare November 3, 2025 16:35

hzhou requested review from raffenet and yfguo November 7, 2025 21:28

hzhou marked this pull request as ready for review November 7, 2025 21:28

hzhou added 12 commits November 7, 2025 15:28

comm: re-implement nonblocking contextid allocation

c4d081c

Re-implement the nonblocking contextid allocation algorithm using MPIX Async.

comm/contextid: use tag nonblocking collectives

ad870b5

Use the nonblocking collective interface with an explicit tag in the nonblocking context_id allocation algorithm.

test: remove idup xfails

6c989ed

sched: remove MPIR_SCHED_KIND_GENERALIZED

650cb8f

MPIR_SCHED_KIND_GENERALIZED is no longer needed.

comm/contextid: replace first_iter with iter

0d70f66

It's easier to debug when we can track the iteration number between retries.

comm/contextid: refactor context_id allocation code

c0f51d2

Refactor the blocking and nonblocking algorithms to avoid duplication and inconsistencies. Fix a potential thread-safety gap in the nonblocking code.

async_things: add MPII_async_things_pending

81f24db

Ch3 needs to be informed whether it can enter a blocking receive during progress or needs to poll the progress continuously.

comm/context_id: organize code and better comments

495e529

Re-organize code for better readability. Re-do the comments to remove stale parts and reflect the current code.

hzhou force-pushed the 2510_idup_nb branch from 8e59515 to e9507f5 Compare November 7, 2025 21:29
