[WebGPU] Fix wait logic for inflight jobs#20096

Merged
reeselevine merged 10 commits into ggml-org:master from nikhilJain17:nikhilJain17/fix-test-thread-safety on Mar 4, 2026

nikhilJain17 (Contributor) commented Mar 4, 2026

Fix WebGPU wait logic that incorrectly removed futures. WaitAny returns as soon as any one future completes, but the previous implementation then erased the entire submission entry (a vector of futures). This change flattens the nested futures structure into a single vector and removes only the futures that have completed.
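A minimal sketch of the flattened bookkeeping, under stated assumptions: FakeFutureWaitInfo is a hypothetical stand-in for wgpu::FutureWaitInfo that keeps only an id and the completed flag, and fake_wait_any is a hypothetical stand-in for instance.WaitAny(). The actual backend code may differ. The point is only that submit appends to one flat vector and the wait step erases just the completed entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for wgpu::FutureWaitInfo: an id plus the "completed"
// flag that WaitAny fills in. Not the real Dawn type.
struct FakeFutureWaitInfo {
    uint64_t id;
    bool     completed;
};

// Submit side: every submission appends its futures to one flat vector,
// instead of pushing a separate per-submission vector of futures.
void submit(std::vector<FakeFutureWaitInfo> & inflight, const std::vector<uint64_t> & new_ids) {
    for (uint64_t id : new_ids) {
        inflight.push_back({id, false});
    }
}

// Hypothetical stand-in for instance.WaitAny(): marks the listed futures as
// completed, the way a real WaitAny sets `completed` on finished futures.
void fake_wait_any(std::vector<FakeFutureWaitInfo> & inflight, const std::vector<uint64_t> & finished_ids) {
    for (FakeFutureWaitInfo & f : inflight) {
        f.completed = f.completed ||
            std::find(finished_ids.begin(), finished_ids.end(), f.id) != finished_ids.end();
    }
}

// Wait side: after WaitAny returns, erase only the completed entries
// (erase-remove idiom). Futures still in flight keep their relative order.
void erase_completed(std::vector<FakeFutureWaitInfo> & inflight) {
    inflight.erase(std::remove_if(inflight.begin(), inflight.end(),
                                  [](const FakeFutureWaitInfo & f) { return f.completed; }),
                   inflight.end());
}
```

For example, submitting futures 1 and 2, then future 3, and marking only future 1 as finished leaves futures 2 and 3 in the vector. The old per-submission erase would have dropped future 2 as well.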

github-actions bot added the devops and ggml labels Mar 4, 2026
        }
    }
} else {
    // Poll once and return
nikhilJain17 (Contributor Author):

Is this the intended behavior when block = false? I think calling WaitAny with a timeout of 0 just checks once and then returns.

Contributor:

Yep, the idea is to just check once when block=false, in case the implementation isn't good at scheduling callbacks on its own.
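A minimal sketch of the two wait modes discussed in this thread, assuming only what the thread states: the blocking path passes UINT64_MAX as the WaitAny timeout (as in the removed lines), and the non-blocking path passes 0 so that it checks once and returns. wait_timeout_ns, FakeFuture, and fake_wait_any are hypothetical names; fake_wait_any only mimics these two timeout cases and is not Dawn's WaitAny.

```cpp
#include <cassert>
#include <cstdint>

// The two timeouts from the thread: UINT64_MAX waits until something
// completes; 0 checks once and returns.
constexpr uint64_t wait_timeout_ns(bool block) {
    return block ? UINT64_MAX : 0;
}

// Hypothetical future that finishes after a fixed number of checks.
struct FakeFuture {
    int  checks_until_done;
    bool completed;
};

// Hypothetical stand-in for WaitAny on a single future. Simplification: only
// the two timeouts above are modelled. UINT64_MAX keeps checking until the
// future completes; any other value (here, 0) performs exactly one check.
// Returns the number of checks performed.
int fake_wait_any(FakeFuture & f, uint64_t timeout_ns) {
    int checks = 0;
    do {
        ++checks;
        if (--f.checks_until_done <= 0) {
            f.completed = true;
            break;
        }
    } while (timeout_ns == UINT64_MAX);
    return checks;
}
```

With a future that needs three checks, the non-blocking call returns after one check with the future still pending, while the blocking call keeps going until the future completes.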

Comment on lines -474 to -475
ctx->instance.WaitAny(futures[0].futures.size(), futures[0].futures.data(), UINT64_MAX);
futures.erase(futures.begin());
nikhilJain17 (Contributor Author):

This previously waited for any future within the first vector of futures to finish and then deleted that entire first vector, not just the completed future, because futures is a vector of vectors of futures. I think this bug surfaced with the param buf diff: by expanding the parameter buffer, we can have multiple futures in flight instead of just one, so we may delete an in-flight future alongside a completed one. Anything waiting on a deleted future would then wait forever, causing test-thread-safety to time out.
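For contrast with the fix, a minimal model of the failure described in this comment, assuming a hypothetical FakeSubmission type for one entry of the old vector-of-vectors layout. old_wait_erase_front mimics the removed futures.erase(futures.begin()) step and reports which futures were discarded while still in flight; anything still waiting on one of those would never see it complete.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the old layout: one entry per submission, each holding
// every future from that submission (more than one once param bufs expanded).
struct FakeFuture {
    uint64_t id;
    bool     completed;
};

struct FakeSubmission {
    std::vector<FakeFuture> futures;
};

// Mimics the removed code: a blocking WaitAny on the first submission's futures
// returns once *any* of them completes, then the whole entry is erased.
// Returns the ids of futures that were dropped while still in flight.
std::vector<uint64_t> old_wait_erase_front(std::vector<FakeSubmission> & subs) {
    std::vector<uint64_t> lost;
    if (subs.empty()) {
        return lost;
    }
    bool any_completed = false;
    for (const FakeFuture & f : subs.front().futures) {
        any_completed = any_completed || f.completed;
        if (!f.completed) {
            lost.push_back(f.id);
        }
    }
    // Precondition of this model: a blocking WaitAny only returns after at
    // least one future in the set has completed.
    assert(any_completed);
    subs.erase(subs.begin());
    return lost;
}
```

If the first submission holds a completed future 1 and an in-flight future 2, the old erase discards both, so future 2 is lost even though it never finished.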

Contributor:

This condition isn't doing quite what we want anymore, since there is now no separation between param_bufs/set_row_error_bufs/gpu_profile bufs from different batch submissions. But I have a PR coming up soon that should simplify this further, and I think I can split out a change that makes sure we free enough param bufs for future batches. So this is fine to merge for now.

nikhilJain17 marked this pull request as ready for review March 4, 2026 05:28
reeselevine merged commit 24d2ee0 into ggml-org:master Mar 4, 2026
78 checks passed
Ethan-a2 pushed a commit to Ethan-a2/llama.cpp that referenced this pull request Mar 20, 2026
* Enable tmate debugging for investigating thread safety issue

* Refactor wait and submit to operate on vector<wgpu::FutureWaitInfo>, and fix wait to delete only the future that is completed.

* Cleanup

* Remove clear change and run clang-format

* Cleanup

Labels

devops: improvements to build systems and github actions
ggml: changes relating to the ggml tensor library for machine learning

2 participants