[WebGPU] Fix wait logic for inflight jobs by nikhilJain17 · Pull Request #20096 · ggml-org/llama.cpp

nikhilJain17 · 2026-03-04T03:11:12Z

Fix the WebGPU wait logic, which incorrectly removed futures. WaitAny returns as soon as any one future completes, but the previous implementation then erased the entire submission entry (a vector of futures). This PR flattens the nested futures structure into a single vector and removes only the futures that have completed.
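As a rough illustration of the "remove only the completed futures" step, here is a minimal sketch, not the code from this PR. FakeWaitInfo is a hypothetical stand-in modeled on wgpu::FutureWaitInfo (an entry with a completed flag that WaitAny sets), and erase_completed is an illustrative helper name that does not come from the patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for wgpu::FutureWaitInfo: an id instead of a real
// future handle, plus the flag that is set once the future has completed.
struct FakeWaitInfo {
    int  id;
    bool completed;
};

// Erase only the entries whose completed flag is set. In-flight futures stay
// in the flat vector in their original order. Returns the number removed.
std::size_t erase_completed(std::vector<FakeWaitInfo> & futures) {
    auto first_done = std::remove_if(futures.begin(), futures.end(),
                                     [](const FakeWaitInfo & f) { return f.completed; });
    const std::size_t removed = static_cast<std::size_t>(futures.end() - first_done);
    futures.erase(first_done, futures.end());
    return removed;
}
```

With a flat vector, a single pass like this can run after every wait call, so a future is dropped only once its own flag says it is done.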


nikhilJain17 · 2026-03-04T05:15:50Z

ggml/src/ggml-webgpu/ggml-webgpu.cpp

+            }
+        }
+    } else {
+        // Poll once and return


Is this the intended behavior when block = false, btw? I think calling WaitAny with a timeout of 0 just checks once and then returns.


nikhilJain17 · 2026-03-04T05:27:43Z

ggml/src/ggml-webgpu/ggml-webgpu.cpp

-        ctx->instance.WaitAny(futures[0].futures.size(), futures[0].futures.data(), UINT64_MAX);
-        futures.erase(futures.begin());


Previously, this would wait for any future within the first vector of futures to finish and then delete the entire first vector (instead of just the completed future), since futures is a vector of vectors of futures. I think this bug surfaced with the param buf diff: by expanding the parameter buffer, we can have multiple futures in flight instead of just one, so we may delete an in-flight future alongside a completed one. Anything waiting on a deleted future would then wait forever, causing test-thread-safety to time out.
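To make the failure mode concrete, here is a small sketch of the old erase pattern on a nested structure. FakeFuture, FakeSubmission, and erase_first_submission are hypothetical names used only for this illustration; the real code operates on WebGPU futures, and the counting is added here purely to show what gets lost.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: a future with a completion flag, and a submission
// entry holding several futures (the "vector of vectors" layout).
struct FakeFuture {
    int  id;
    bool completed;
};

struct FakeSubmission {
    std::vector<FakeFuture> futures;
};

// Mimics the old pattern: once any future in the first submission is done,
// the whole first entry is erased. Returns how many futures that were still
// in flight were dropped along with it.
std::size_t erase_first_submission(std::vector<FakeSubmission> & submissions) {
    if (submissions.empty()) {
        return 0;
    }
    std::size_t dropped_in_flight = 0;
    for (const FakeFuture & f : submissions.front().futures) {
        if (!f.completed) {
            ++dropped_in_flight;
        }
    }
    submissions.erase(submissions.begin());
    return dropped_in_flight;
}
```

If the first entry holds one completed and one in-flight future, the function reports one dropped in-flight future, which is the case described above where a waiter would never be woken.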


reeselevine · 2026-03-04T19:53:25Z

ggml/src/ggml-webgpu/ggml-webgpu.cpp

-        ctx->instance.WaitAny(futures[0].futures.size(), futures[0].futures.data(), UINT64_MAX);
-        futures.erase(futures.begin());


This condition isn't doing quite what we want anymore, since there is no longer any separation between the param_bufs/set_row_error_bufs/gpu_profile bufs from different batch submissions. But I have a PR coming up soon that should simplify this further, and I think I can split it out into making sure we free enough param bufs for future batches. So this is fine to merge for now.

reeselevine · 2026-03-04T19:54:05Z

ggml/src/ggml-webgpu/ggml-webgpu.cpp

+            }
+        }
+    } else {
+        // Poll once and return


Yep, the idea is to check just once when block=false, in case the implementation isn't good at scheduling callbacks on its own.
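The difference between the blocking and polling paths can be sketched with a simulated wait call. Everything here is an assumption made for illustration: PendingFuture, fake_wait_any, and wait_on_futures are hypothetical, the "ticks" stand in for GPU progress, and the choice to drain the whole vector when block is true is a guess at the intent rather than something the thread confirms.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical future: completes once ticks_left has reached zero at a check.
struct PendingFuture {
    int      id;
    unsigned ticks_left;
    bool     completed;
};

// One check over all futures: mark those that are ready, advance the rest.
static bool check_once(std::vector<PendingFuture> & futures) {
    bool any_done = false;
    for (PendingFuture & f : futures) {
        if (f.ticks_left == 0) {
            f.completed = true;
            any_done    = true;
        } else {
            --f.ticks_left;
        }
    }
    return any_done;
}

// Simulated wait: timeout 0 checks once and returns; any other timeout keeps
// checking until at least one future has completed.
bool fake_wait_any(std::vector<PendingFuture> & futures, std::uint64_t timeout) {
    if (futures.empty()) {
        return false;
    }
    if (timeout == 0) {
        return check_once(futures);
    }
    while (!check_once(futures)) {
    }
    return true;
}

static void erase_done(std::vector<PendingFuture> & futures) {
    futures.erase(std::remove_if(futures.begin(), futures.end(),
                                 [](const PendingFuture & f) { return f.completed; }),
                  futures.end());
}

void wait_on_futures(std::vector<PendingFuture> & futures, bool block) {
    if (block) {
        // Assumed behavior: keep waiting until nothing is left in flight.
        while (!futures.empty()) {
            fake_wait_any(futures, UINT64_MAX);
            erase_done(futures);
        }
    } else {
        // Poll once and return, removing whatever happens to be finished.
        fake_wait_any(futures, 0);
        erase_done(futures);
    }
}
```

In this model a non-blocking call never spins: it removes the futures that are already done and leaves the rest for a later call.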

* Enable tmate debugging for investigating thread safety issue
* Refactor wait and submit to operate on vector<wgpu::FutureWaitInfo>, and fix wait to delete only the future that is completed.
* Cleanup
* Remove clear change and run clang-format
* Cleanup

nikhilJain17 added 7 commits January 30, 2026 12:15

Merge

df60497

Merge

5ae7583

merge

70238e0

Merge with master

9f9d064

Merge

e93b40d

Enable tmate debugging for investigating thread safety issue

b3c5f0f

Refactor wait and submit to operate on vector<wgpu::FutureWaitInfo>, …

c5233a0

…and fix wait to delete only the future that is completed.

github-actions bot added the labels devops (improvements to build systems and github actions) and ggml (changes relating to the ggml tensor library for machine learning) Mar 4, 2026

nikhilJain17 added 3 commits March 3, 2026 19:14

Cleanup

a0c0e84

Remove clear change and run clang-format

8b17633

Cleanup

71fc6f8

nikhilJain17 commented Mar 4, 2026


nikhilJain17 marked this pull request as ready for review March 4, 2026 05:28

nikhilJain17 requested a review from reeselevine as a code owner March 4, 2026 05:28

reeselevine approved these changes Mar 4, 2026


reeselevine merged commit 24d2ee0 into ggml-org:master Mar 4, 2026
78 checks passed

reeselevine merged 10 commits into ggml-org:master from nikhilJain17:nikhilJain17/fix-test-thread-safety
