fix: allow SaveModel after Complete() in VowpalWabbitThreadedLearning by JohnLangford · Pull Request #4913 · VowpalWabbit/vowpal_wabbit

JohnLangford · 2026-03-17T02:37:56Z

Summary

Fixes #4911 — VowpalWabbitThreadedLearning was nearly impossible to use because SaveModel() and PerformanceStatistics had to be called before Complete(), which is the opposite of what users expect.

Root cause: Complete() immediately called syncActions.CompleteAdding(), closing the queue before the completion continuation (which actually drains it) had a chance to run. Any SaveModel() call after Complete() threw InvalidOperationException.
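The failure mode described above can be reproduced in isolation. The following is a minimal sketch, assuming a standard .NET BlockingCollection<T> stands in for the sync action queue (the actual VowpalWabbit type is ConcurrentList<T>, whose internals are not shown here). The class and method names are hypothetical. Once CompleteAdding() has run, a later Add throws InvalidOperationException.

```csharp
using System;
using System.Collections.Concurrent;

public static class ClosedQueueDemo
{
    // Returns true if adding after CompleteAdding() throws InvalidOperationException.
    public static bool AddAfterCompleteThrows()
    {
        // Stand-in for the sync action queue; not the real VowpalWabbit type.
        using (var syncActions = new BlockingCollection<Action>())
        {
            syncActions.Add(() => { });      // accepted: the queue is still open
            syncActions.CompleteAdding();    // analogous to what Complete() did immediately

            try
            {
                syncActions.Add(() => { });  // analogous to a late SaveModel() request
                return false;
            }
            catch (InvalidOperationException)
            {
                return true;
            }
        }
    }
}
```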

Changes:

- Allow SaveModel/PerformanceStatistics after completion: detect post-completion state and save directly on the root VW instance. Uses TryAdd on the sync action list as a fallback for the race window between Complete() and the continuation.
- Move CompleteAdding into the root completion continuation: uses a new atomic CompleteAndRemoveAll() on ConcurrentList<T>, so sync actions can be enqueued between calling Complete() and the continuation executing.
- Add Flush() method: forces an AllReduce synchronization and drains pending sync actions on demand, without requiring the example count to hit a multiple of ExampleCountPerRun.
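The combination of an atomic close-and-drain with a non-throwing add can be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: SyncActionList<T> is a hypothetical stand-in for ConcurrentList<T>, and only the method names TryAdd and CompleteAndRemoveAll are taken from the description above.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for ConcurrentList<T>; not the VowpalWabbit class.
public sealed class SyncActionList<T>
{
    private readonly object gate = new object();
    private readonly List<T> items = new List<T>();
    private bool completed;

    // Returns false instead of throwing once the list has been completed.
    public bool TryAdd(T item)
    {
        lock (gate)
        {
            if (completed) return false;
            items.Add(item);
            return true;
        }
    }

    // Closes the list and returns everything queued so far under one lock,
    // so no item can be added between the drain and the close.
    public T[] CompleteAndRemoveAll()
    {
        lock (gate)
        {
            completed = true;
            T[] drained = items.ToArray();
            items.Clear();
            return drained;
        }
    }
}
```

In this sketch, a caller whose TryAdd returns false knows the continuation has already drained the list and can fall back to running the action directly, which matches the fallback behaviour the description attributes to SaveModel after completion.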

All changes are backward-compatible — existing code that calls SaveModel before Complete() continues to work.

The natural pattern now works:

await model.Complete();
await model.SaveModel(path);   // no longer throws
var stats = await model.PerformanceStatistics;  // also works

Mid-training saves also work without count alignment:

var saveTask = model.SaveModel(path);
await model.Flush();   // forces sync now
await saveTask;
// continue learning...

Test plan

- TestSaveModelAfterComplete: save after Complete() produces a valid model file
- TestPerformanceStatisticsAfterComplete: stats accessible after Complete()
- TestFlush: Flush() triggers sync and save mid-training, learning continues after
- TestSaveModelBeforeComplete: original pattern (save before complete) still works
- Existing TestAllReduce continues to pass (CI)

…WabbitThreadedLearning

Previously, calling SaveModel() or accessing PerformanceStatistics after Complete() threw InvalidOperationException because Complete() immediately closed the sync action queue. This forced users into a counter-intuitive pattern of enqueuing saves before signaling completion.

Changes:
- Move CompleteAdding from Complete() into the root completion continuation using a new atomic CompleteAndRemoveAll(), so sync actions can be enqueued between the Complete() call and the continuation executing
- Make SaveModel/PerformanceStatistics detect post-completion state and operate directly on the root VW instance via TryAdd fallback
- Add Flush() method to force AllReduce sync on demand without waiting for ExampleCountPerRun threshold

Fixes #4911

Task.CompletedTask was introduced in .NET 5 and is not available in netstandard2.0 which is the target framework for vw.parallel.
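The substitution this commit message refers to can be sketched as follows. This is a minimal illustration, assuming a method that must return an already-finished Task. The class and method names are hypothetical and do not come from the PR.

```csharp
using System.Threading.Tasks;

public static class CompletedTasks
{
    // Task<bool> derives from Task, so a cached, already-completed Task<bool>
    // can be returned wherever a finished Task is expected.
    private static readonly Task Done = Task.FromResult(true);

    // Hypothetical no-op async method that completes synchronously.
    public static Task NoOpAsync()
    {
        return Done;
    }
}
```

Caching the instance avoids allocating a new task on every call, which is the main practical difference from calling Task.FromResult inline each time.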

Improve XML doc comments on VowpalWabbitThreadedLearning to explain:
- Learn() enqueues and returns immediately (async dispatch, not blocking)
- Typical usage flow with code example (learn, complete, save)
- What Complete() guarantees (all examples learned, final allreduce done)
- That SaveModel/PerformanceStatistics work synchronously after Complete

Addresses feedback from #4911 about the TPL completion model being unclear.

JohnLangford added 4 commits March 16, 2026 22:37

fix: replace Task.CompletedTask with Task.FromResult for netstandard2.0

cc750e8

Merge branch 'master' into fix/threaded-learning-save-model

f6652db

JohnLangford merged commit 45d45c5 into master Mar 19, 2026
84 checks passed

JohnLangford merged 4 commits into master from fix/threaded-learning-save-model
