@hzhou (Contributor) commented Oct 28, 2025

Pull Request Description

The nonblocking context id allocation algorithm is currently implemented using Sched. It requires a few hacks and is very difficult to debug. Re-implement it using the MPIX Async API instead.
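Background for readers new to the algorithm: in MPICH's context id allocation, each process keeps a bitmask of free context ids. The processes combine their masks with a bitwise-AND reduction (an allreduce with MPI_BAND) and then take a bit that is free everywhere. The sketch below simulates only that agreement step locally, without MPI. `MASK_WORDS`, `intersect_masks`, and `lowest_common_id` are hypothetical names. The real code also handles mask locking and retries across threads, and it maps bits to actual context id values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size: number of 32-bit words in each process's free-id mask. */
#define MASK_WORDS 4

/* Simulates the bitwise-AND reduction over all processes' masks
 * (what an allreduce with MPI_BAND would compute). A set bit means "free". */
static void intersect_masks(uint32_t masks[][MASK_WORDS], size_t nprocs,
                            uint32_t out[MASK_WORDS])
{
    for (size_t w = 0; w < MASK_WORDS; w++) {
        out[w] = 0xFFFFFFFFu;
        for (size_t p = 0; p < nprocs; p++)
            out[w] &= masks[p][w];
    }
}

/* Returns the lowest bit index that is free on every process, or -1. */
static int lowest_common_id(const uint32_t mask[MASK_WORDS])
{
    for (int w = 0; w < MASK_WORDS; w++) {
        for (int b = 0; b < 32; b++) {
            if (mask[w] & (1u << b))
                return w * 32 + b;
        }
    }
    return -1;
}
```

A result of -1 corresponds to the case where no id is free on all processes.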

NOTE: Hopefully, this will resolve the outstanding test xfails. Now that I understand the algorithm better, if we still encounter lock contention issues, we can try inserting a heavy yield when we know we are not getting the masks.
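One way to picture the "heavy yield" idea: when a thread fails to obtain the shared mask, it gives up the CPU before retrying instead of spinning. The sketch models the mask lock as a C11 `atomic_flag` and yields with POSIX `sched_yield()`. `try_get_mask`, `release_mask`, and `get_mask_with_yield` are hypothetical helpers, not MPICH functions. An actual change in MPICH might yield differently, for example by releasing the global critical section.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "mask in use" lock shared by threads. */
static bool try_get_mask(atomic_flag *mask_in_use)
{
    /* test_and_set returns the previous value: false means we acquired it. */
    return !atomic_flag_test_and_set(mask_in_use);
}

static void release_mask(atomic_flag *mask_in_use)
{
    atomic_flag_clear(mask_in_use);
}

/* Retries up to max_tries times, yielding the CPU after each failed
 * attempt. Returns the number of attempts used, or -1 on giving up. */
static int get_mask_with_yield(atomic_flag *mask_in_use, int max_tries)
{
    for (int i = 1; i <= max_tries; i++) {
        if (try_get_mask(mask_in_use))
            return i;
        sched_yield(); /* let the current mask holder make progress */
    }
    return -1;
}
```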

[skip warnings]

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

@hzhou hzhou marked this pull request as draft October 29, 2025 02:32
@hzhou hzhou force-pushed the 2510_idup_nb branch 10 times, most recently from a73f4ba to 8e59515 November 3, 2025 16:35
@hzhou (Contributor Author) commented Nov 3, 2025

test:mpich/ch3/most
test:mpich/ch4/most

✔️

@hzhou hzhou requested review from raffenet and yfguo November 7, 2025 21:28
@hzhou hzhou marked this pull request as ready for review November 7, 2025 21:28
hzhou added 12 commits November 7, 2025 15:28
Since we need to enter the thread CS before calling async poll functions,
we may hit a recursive situation when a callback makes blocking MPI calls
and invokes MPI progress from within the poll function. To allow that, we
need to skip async progress when re-entering the progress engine.
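The re-entrancy rule can be pictured as a guard flag around the async polling step. If a poll callback makes a blocking call that re-enters progress, the inner progress call skips async polling instead of recursing into it. This is a single-threaded sketch using a thread-local flag. `progress_poke`, `poll_async_things`, and `blocking_callback` are hypothetical names, and MPICH's real progress engine and locking are more involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Set while this thread is inside the async poll step. */
static _Thread_local bool in_async_poll = false;

/* Hypothetical hook: the currently registered async poll callback. */
static void (*async_callback)(void) = NULL;
static int async_poll_count = 0;

static void poll_async_things(void)
{
    async_poll_count++;
    if (async_callback)
        async_callback();
}

/* One pass of the (simulated) progress engine. */
static void progress_poke(void)
{
    /* ... network/shm polling would happen here ... */
    if (!in_async_poll) {
        in_async_poll = true;
        poll_async_things();   /* may re-enter progress_poke() */
        in_async_poll = false;
    }
    /* else: re-entered from a callback, so async progress is skipped */
}

/* Example callback making a "blocking" call that drives progress. */
static void blocking_callback(void)
{
    for (int i = 0; i < 3; i++)
        progress_poke();
}
```

Without the guard, `blocking_callback` would recurse into `poll_async_things` indefinitely.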

Re-implement the nonblocking context id allocation algorithm using MPIX
Async.
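With MPIX Async, an algorithm is written as a poll callback. The callback advances an explicit state and reports whether the operation is still pending or done, instead of building a schedule up front. The sketch below mimics that shape without linking to MPI. `ASYNC_PENDING`/`ASYNC_DONE`, `struct idup_state`, and the "reduction completes after two polls" behavior are hypothetical simulations. The retry path re-enters the start stage and bumps an iteration counter.

```c
#include <stdint.h>

enum poll_result { ASYNC_PENDING, ASYNC_DONE };
enum stage { STAGE_START, STAGE_WAIT_REDUCE, STAGE_DONE };

/* Hypothetical per-operation state carried between polls. */
struct idup_state {
    enum stage stage;
    int iteration;          /* retry count, useful when debugging */
    int polls_left;         /* simulated: polls until the reduction completes */
    uint32_t reduced_mask;  /* simulated result of the AND-allreduce */
    int context_id;
};

static enum poll_result idup_poll(struct idup_state *s)
{
    switch (s->stage) {
    case STAGE_START:
        s->iteration++;
        /* Real code would start a nonblocking allreduce here. */
        s->polls_left = 2;
        s->stage = STAGE_WAIT_REDUCE;
        return ASYNC_PENDING;
    case STAGE_WAIT_REDUCE:
        if (--s->polls_left > 0)
            return ASYNC_PENDING;   /* reduction not finished yet */
        if (s->reduced_mask == 0) {
            s->stage = STAGE_START; /* no common free id: retry */
            return ASYNC_PENDING;
        }
        for (int b = 0; b < 32; b++) {
            if (s->reduced_mask & (1u << b)) {
                s->context_id = b;  /* lowest common free bit */
                break;
            }
        }
        s->stage = STAGE_DONE;
        return ASYNC_DONE;
    case STAGE_DONE:
    default:
        return ASYNC_DONE;
    }
}
```

Because all state lives in the struct, a debugger or log line can show exactly which stage and iteration an operation is in.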

Add internal nonblocking collective interfaces that accept an explicit
tag. This allows asynchronous algorithms to call nonblocking collectives
internally without being tied to a specific schedule framework.
Specifically, it enables nonblocking algorithms that use the MPIX Async
interface.
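Why an explicit tag matters: when several asynchronous algorithms run internal collectives on the same communicator at once, each must match only its own messages. The sketch models message matching by (source, tag) with a tiny queue. `struct msg`, `match_msg`, and the tag values are hypothetical and only illustrate the idea, not MPICH's internal collective signatures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical unexpected-message entry. */
struct msg {
    int source;
    int tag;
    int payload;
    bool consumed;
};

/* Returns the index of the first unconsumed message from `source`
 * with `tag`, marking it consumed; returns -1 if none matches. */
static int match_msg(struct msg *q, size_t n, int source, int tag)
{
    for (size_t i = 0; i < n; i++) {
        if (!q[i].consumed && q[i].source == source && q[i].tag == tag) {
            q[i].consumed = true;
            return (int) i;
        }
    }
    return -1;
}
```

Two concurrent operations given distinct tags (for example 7 and 9) each pick up their own message, even if the messages arrive in the "wrong" order.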

Use the nonblocking collective interface with an explicit tag in the
nonblocking context_id allocation algorithm.

The basic generalized request relies on an external progress mechanism
to complete the request, rather than on the extension with wait_fn.

We can create a generalized request using the MPIX Async mechanism, and
MPID_Progress_wait will complete the request.
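A rough model of this approach: an async poll callback, which progress keeps invoking, marks the request complete. Waiting then reduces to "drive progress until the request is complete", with no custom wait_fn. All names here (`struct greq`, `struct async_thing`, `progress_once`, `wait_request`) are hypothetical, and the outstanding work is simulated by a countdown. MPICH's actual generalized request and MPID_Progress_wait internals differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical generalized request: only a completion flag. */
struct greq {
    bool complete;
};

/* Hypothetical async thing: a poll callback plus its state. */
struct async_thing {
    bool (*poll)(void *state);  /* returns true when done */
    void *state;
    bool active;
};

struct countdown {
    int remaining;              /* simulated outstanding work */
    struct greq *req;
};

static bool countdown_poll(void *state)
{
    struct countdown *c = state;
    if (--c->remaining > 0)
        return false;
    c->req->complete = true;    /* analogous to completing the grequest */
    return true;
}

/* One progress pass: poll every active async thing. */
static void progress_once(struct async_thing *things, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (things[i].active && things[i].poll(things[i].state))
            things[i].active = false;
    }
}

/* No wait_fn needed: just drive progress until the request completes. */
static int wait_request(struct greq *req, struct async_thing *things, size_t n)
{
    int passes = 0;
    while (!req->complete) {
        progress_once(things, n);
        passes++;
    }
    return passes;
}
```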

MPIR_SCHED_KIND_GENERALIZED is no longer needed.

Debugging is easier when we can track the iteration number across
retries.

Refactor the blocking and nonblocking algorithms to share code, avoiding
duplication and inconsistencies.

Fix potentially missing thread-safety protection in the nonblocking code.
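One common way to share code between the two paths is to have the blocking version drive the same step function to completion. The sketch assumes a hypothetical `alloc_poll` step function and a stand-in `progress` hook. The value 42 is an arbitrary placeholder for the agreed context id. It illustrates the refactoring pattern, not the exact structure of the MPICH change.

```c
#include <stdbool.h>

/* Hypothetical state for one context id allocation. */
struct alloc_state {
    int steps_left;  /* simulated: steps until the algorithm finishes */
    int result;
};

/* Single step used by BOTH paths; returns true when finished. */
static bool alloc_poll(struct alloc_state *s)
{
    if (s->steps_left > 0) {
        s->steps_left--;
        return false;
    }
    s->result = 42;  /* placeholder for the agreed context id */
    return true;
}

static int progress_calls = 0;
static void progress(void) { progress_calls++; }  /* stand-in hook */

/* Nonblocking path: async progress calls alloc_poll() later. */
static void alloc_start(struct alloc_state *s, int steps)
{
    s->steps_left = steps;
    s->result = -1;
}

/* Blocking path: same state machine, driven to completion in place. */
static int alloc_blocking(int steps)
{
    struct alloc_state s;
    alloc_start(&s, steps);
    while (!alloc_poll(&s))
        progress();
    return s.result;
}
```

Because both paths run the same `alloc_poll`, a fix made in one path automatically applies to the other.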

Ch3 needs to be informed whether it can enter a blocking receive during
progress or whether it needs to poll progress continuously.
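In other words, the progress engine needs a hint. If asynchronous work is pending, it must keep polling so the async poll callbacks run. Otherwise it may block waiting for incoming messages. Below is a minimal sketch of that decision. `enum progress_mode` and `choose_progress_mode` are hypothetical names, not ch3's actual interface.

```c
enum progress_mode {
    PROGRESS_BLOCKING_RECV,  /* safe to sleep in a blocking receive */
    PROGRESS_POLL            /* must keep returning to poll async work */
};

/* Block only when no async things are waiting to be polled. */
static enum progress_mode choose_progress_mode(int pending_async_things)
{
    return pending_async_things > 0 ? PROGRESS_POLL : PROGRESS_BLOCKING_RECV;
}
```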

Reorganize code for better readability. Rewrite the comments to remove
stale parts and reflect the current code.

dynamic_sendrecv is used in MPI_Intercomm_create. Mismatching between
threads is prevented by the user-provided tag, so it is okay to yield
during the blocking progress. Without the yield, MPI_Intercomm_create may
block another thread's progress when the remote processes are not
present (e.g., because they are blocked by other communications).
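The yield amounts to briefly releasing the global lock while waiting, so that another thread can enter the progress engine. The sketch below uses POSIX threads. `global_cs`, `wait_with_yield`, and `countdown_done` are hypothetical stand-ins for MPICH's critical section and the receive completion check.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for MPICH's global critical section. */
static pthread_mutex_t global_cs = PTHREAD_MUTEX_INITIALIZER;

/* Waits (global_cs held on entry and on exit) until done(arg) is true,
 * releasing the lock and yielding between polls so other threads run. */
static int wait_with_yield(bool (*done)(void *), void *arg)
{
    int yields = 0;
    while (!done(arg)) {
        /* ... one nonblocking progress poll would go here ... */
        pthread_mutex_unlock(&global_cs);
        sched_yield();
        pthread_mutex_lock(&global_cs);
        yields++;
    }
    return yields;
}

/* Example predicate: becomes true after a fixed number of checks. */
static bool countdown_done(void *arg)
{
    int *n = arg;
    return (*n)-- <= 0;
}
```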

In the dynamic process accept/connect path, we force peer_comm's context
id to 0. This is okay because the leader exchange is established over a
specific pair of addresses and there are no other communications yet
during leader_exchange.