fix(scheduler): treat Error state sequences as finished in PagedAttention#2111
Draft
glaziermag wants to merge 1 commit into
Draft
fix(scheduler): treat Error state sequences as finished in PagedAttention#2111glaziermag wants to merge 1 commit into
glaziermag wants to merge 1 commit into
Conversation
Code Metrics Report━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Language Files Lines Code Comments Blanks ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ C Header 5 305 210 52 43 CSS 3 281 252 5 24 CUDA 59 17661 13824 1637 2200 Dockerfile 1 38 21 8 9 HTML 2 27 27 0 0 JavaScript 3 392 387 2 3 Jinja2 7 694 656 5 33 JSON 25 9346 9343 0 3 Makefile 1 6 5 0 1 MDX 1 147 0 132 15 Metal Shading Lan| 31 11647 9007 1064 1576 PowerShell 1 300 227 30 43 Python 129 9969 8194 456 1319 Shell 2 489 331 96 62 Plain Text 3 3723 0 2413 1310 TOML 27 1309 1145 36 128 TypeScript 11 1607 1371 66 170 YAML 3 25 23 2 0 ───────────────────────────────────────────────────────────────────────────────── Jupyter Notebooks 3 122 83 23 16 |- Markdown 1 60 30 22 8 |- Python 1 122 113 1 8 (Total) 304 226 46 32 ───────────────────────────────────────────────────────────────────────────────── Markdown 119 8232 0 5591 2641 |- BASH 52 491 432 34 25 |- Dockerfile 2 5 5 0 0 |- JSON 16 582 582 0 0 |- PowerShell 3 5 5 0 0 |- Python 22 687 604 5 78 |- Rust 13 415 362 1 52 |- TOML 9 107 83 3 21 |- YAML 1 9 9 0 0 (Total) 10533 2082 5634 2817 ───────────────────────────────────────────────────────────────────────────────── Rust 571 245656 216375 6437 22844 |- Markdown 379 9235 452 7653 1130 (Total) 254891 216827 14090 23974 ───────────────────────────────────────────────────────────────────────────────── Svelte 18 1831 1696 50 85 |- CSS 1 4 4 0 0 |- JavaScript 18 876 727 24 125 (Total) 2711 2427 74 210 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Total 1025 326405 266585 25848 33972 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ |
212955c to
8041506
Compare
Contributor
Author
|
Agent 6 follow-up on existing A100 validation: this remains valid as a targeted scheduler fix. Classification: |
4abee42 to
391e829
Compare
391e829 to
bf8e9d2
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Sequences that enter
SequenceState::Errorwere not matched byis_finished_paged_attn(). In PagedAttention, that means an errored sequence can remain in scheduler state instead of being treated as terminal for cleanup.SequenceState::Erroris set on real inference error paths, includinghandle_seq_error_stateaware_ok!andhandle_pipeline_forward_error!. Scheduler cleanup callsfree_finished_sequence_groups(), which depends onis_finished_paged_attn().Fix
Add
SequenceState::Errorto the terminal states returned byis_finished_paged_attn(), alongsideFinishedAborted,FinishedIgnored, andDone.Files changed
mistralrs-core/src/sequence.rsValidation
Current branch head after Agent 5 follow-up:
593b6f298989769801cabaee4102c4069be3ce7a.A100 hardware/software used by Agent 5:
a2-highgpu-1g, 1xNVIDIA A100-SXM4-40GB, 40960 MiB580.126.09;nvidia-smiCUDA13.0; CUDA toolkit12.9.411.95.0; Cargo1.95.0Before/after targeted check:
cargo test -q -p mistralrs-core error_state_is_finished_for_paged_attention_cleanup --lib2d4ba4f16f61e5e18be085d0dd137bc95cba038aplus injected regression: failed;SequenceState::Error must be terminal for PagedAttention cleanup.0715e0a2ddb90099d353547dc9840f03897f23f2plus the same regression: passed.Committed tests on current head:
A100 result: both passed.
The committed coverage verifies that
Erroris terminal for PagedAttention cleanup and that existingDone/FinishedAborted/FinishedIgnoredbehavior remains terminal while normal running states remain nonterminal.Merge / Issue-Linking Note
The branch commit with stale auto-close wording has been rewritten. Final squash/merge wording should use
Refs #2058orRelated to #2058, without a closing keyword before#2058, so #2058 is not closed automatically. Safe wording: “Fixes the scheduler-state half:SequenceState::Erroris terminal for PagedAttention cleanup; does not prove the full Windows RTX 3080 hang is fixed.”Relationship to #2058 / #2076
Classification:
TARGETED.This is the scheduler-state half of #2076. It is related to #2058, but it is not a full runtime reproduction of #2058. Agent 5 did not recover a live A100 inference repro where a sequence enters
SequenceState::Error, blocks KV cleanup, and then prevents new scheduling. The original #2058 report used Windows 11, RTX 3080 12GB, CUDA 12.9 / driver 591.86, andgoogle/gemma-4-E2B-it.Safe merge claim: this PR makes
SequenceState::Errorterminal for PagedAttention cleanup. It should not claim to fully resolve the #2058 report without a runtime before/after on the original hang conditions.