fix: speculative decoding broken on hybrid SSM/MoE (Qwen3.5 MoE) by eauchs · Pull Request #20075 · ggml-org/llama.cpp

eauchs · 2026-03-03T14:57:00Z

Speculative decoding on hybrid SSM/MoE models is broken right now. With a draft model you either crash immediately ("the target context does not support partial sequence removal") or end up with garbage loops. Took me a while to track down why.
Two things were wrong in find_slot: empty_cell.src was pointing to orig_cell.src instead of seq_meta.tail (so the graph was reading stale state from the wrong cell), and the copy_cell call was just... missing. On top of that, llama_memory_recurrent has no rollback mechanism at all for the SSM state when draft tokens get rejected, which is what causes the state drift.
Fix adds a checkpoint/restore with a rolling buffer (depth 8 per sequence). On Metal, ggml_backend_tensor_copy is synchronous in ggml 0.9.7 so no barrier needed.
Numbers on M3 Max 128GB:

Qwen3.5-122B-A10B-UD-Q4_K_XL + Qwen3.5-0.8B draft
Baseline: ~20.4 t/s → with patch: 23.5–29.7 t/s, acceptance rate around 63–89% depending on --draft-max
No garbage loops over extended runs

The checkpoint depth (8) and memory guard are hardcoded for now — not sure if it's worth exposing them as llama_context_params, open to feedback. VRAM overhead is n_seq_max × depth × SSM_state_size_per_layer, fine on my end but probably worth discussing for smaller devices.

…odels

FatheredPuma81 · 2026-03-04T21:32:58Z

I used Claude to build the latest llama.cpp with this fix and it works but idk how you're getting 63-89% acceptance cause I'm only getting 44% and a bit less than half the t/s on both 27B UD-Q4_K_XL and 122B UD-Q3_K_XL with 0.8B UD-Q4_K_XL. I've also encountered issues with both 27B and 122B looping when using a draft model.

eauchs · 2026-03-04T22:31:05Z

I used Claude to build the latest llama.cpp with this fix and it works but idk how you're getting 63-89% acceptance cause I'm only getting 44% and a bit less than half the t/s on both 27B UD-Q4_K_XL and 122B UD-Q3_K_XL with 0.8B UD-Q4_K_XL. I've also encountered issues with both 27B and 122B looping when using a draft model.

can you share your logs ?
With my command I have this ./build/bin/llama-server
-m /Users/xx/models/ml/Qwen3.5-122B-A10B-UD-Q4_K_XL-00001-of-00003.gguf
-md /Users/x/models/ml/Qwen3.5-0.8B-UD-Q4_K_XL.gguf
--host 0.0.0.0 --port 8082
-ngl 99 -ngld 99 -fa on
-c 50096 -np 1
--draft-max 8
--temp 0.1
--min-p 0.1
-t 12 -b 4096 -ub 1024
--cache-type-k q4_0 --cache-type-v q4_0
--verbose-prompt
--chat-template-file template.jinja

prompt eval time = 207.90 ms / 13 tokens ( 15.99 ms per token, 62.53 tokens per second)
eval time = 640.61 ms / 19 tokens ( 33.72 ms per token, 29.66 tokens per second)
total time = 848.52 ms / 32 tokens
draft acceptance rate = 0.65000 ( 13 accepted / 20 generated)
statistics draft: #calls(b,g,a) = 4 532 445, #gen drafts = 532, #acc drafts = 445, #gen tokens = 2752, #acc tokens = 2563, dur(b,g,a) = 0.002, 25227.614, 0.130 ms

FatheredPuma81 · 2026-03-05T00:00:53Z

--temp 0.1
--min-p 0.1

Yep that explains it right there.

The prompt is "Write me a Flappy Bird clone entirely in a single HTML File." for the below runs.

Qwen3.5 122B

@echo off
title Llama.cpp Server - Qwen3.5-122B-A10B

set MODEL_PATH="P:\z AI Stuff\LM_Studio\models\unsloth\Qwen3.5-122B-A10B-GGUF\Qwen3.5-122B-A10B-UD-Q3_K_XL-00001-of-00003.gguf"
set DRAFT_PATH="P:\z AI Stuff\LM_Studio\models\unsloth\Qwen3.5-0.8B-GGUF\Qwen3.5-0.8B-UD-Q4_K_XL.gguf"

"C:\Users\Fathe\Desktop\Llama.cpp\Qwenllama\build\bin\llama-server.exe" ^
    -m %MODEL_PATH% ^
    --jinja ^
    --alias qwen3.5-122b ^
    --reasoning-format deepseek ^
    -ncmoe 33 ^
    -ngl 99 ^
    --no-mmap ^
    -c 65535 ^
    -np 1 ^
    -fa on ^
    -ctk q8_0 ^
    -ctv q8_0 ^
    -b 2048 ^
    -ub 512 ^
    -t 16 ^
    --context-shift ^
    --temp 1.0 ^
    --top-p 0.95 ^
    --top-k 20 ^
    --min-p 0.0 ^
    --presence-penalty 1.5 ^
    --model-draft %DRAFT_PATH% ^
    -ngld 99 ^
    --draft-min 0 ^
    --draft-max 8 ^
    --metrics ^
    --host 0.0.0.0 ^
    --port 8080

if %ERRORLEVEL% NEQ 0 pause

prompt eval time =     649.41 ms /    24 tokens (   27.06 ms per token,    36.96 tokens per second)
       eval time =  420435.06 ms /  7239 tokens (   58.08 ms per token,    17.22 tokens per second)
      total time =  421084.47 ms /  7263 tokens
draft acceptance rate = 0.29187 ( 1971 accepted /  6753 generated)
statistics draft: #calls(b,g,a) = 1 5267 1401, #gen drafts = 5267, #acc drafts = 1401, #gen tokens = 6753, #acc tokens = 1971, dur(b,g,a) = 0.000, 38248.425, 0.387 ms

Qwen3.5 27B

@echo off
title Llama.cpp Server - Qwen3.5-27B + 0.8B Draft
set MODEL_PATH="P:\z AI Stuff\LM_Studio\models\unsloth\Qwen3.5-27B-GGUF\Qwen3.5-27B-UD-Q4_K_XL.gguf"
set DRAFT_PATH="P:\z AI Stuff\LM_Studio\models\unsloth\Qwen3.5-0.8B-GGUF\Qwen3.5-0.8B-UD-Q4_K_XL.gguf"
"C:\Users\Fathe\Desktop\Llama.cpp\Qwenllama\build\bin\llama-server.exe" ^
    -m %MODEL_PATH% ^
    --jinja ^
    --alias qwen3.5-27b ^
    --reasoning-format deepseek ^
    -ngl 99 ^
    --no-mmap ^
    -c 65535 ^
    -np 1 ^
    -fa on ^
    -ctk q8_0 ^
    -ctv q8_0 ^
    -b 2048 ^
    -ub 512 ^
    -t 16 ^
    --temp 1.0 ^
    --top-p 0.95 ^
    --top-k 20 ^
    --min-p 0.00 ^
    --presence-penalty 1.5 ^
    --model-draft %DRAFT_PATH% ^
    -ngld 99 ^
    --draft-min 0 ^
    --draft-max 8 ^
    --metrics ^
    --host 0.0.0.0 ^
    --port 8080
if %ERRORLEVEL% NEQ 0 pause

prompt eval time =     142.38 ms /    24 tokens (    5.93 ms per token,   168.56 tokens per second)
       eval time =  102108.78 ms /  3458 tokens (   29.53 ms per token,    33.87 tokens per second)
      total time =  102251.16 ms /  3482 tokens
draft acceptance rate = 0.49879 ( 1644 accepted /  3296 generated)
statistics draft: #calls(b,g,a) = 1 1813 871, #gen drafts = 1813, #acc drafts = 871, #gen tokens = 3296, #acc tokens = 1644, dur(b,g,a) = 0.000, 18967.956, 0.192 ms

Qwen3.5 27B Temp set to 0.1 and Min P set to 0.01

prompt eval time =     140.57 ms /    24 tokens (    5.86 ms per token,   170.73 tokens per second)
       eval time =   42965.89 ms /  2083 tokens (   20.63 ms per token,    48.48 tokens per second)
      total time =   43106.46 ms /  2107 tokens
draft acceptance rate = 0.70029 ( 1458 accepted /  2082 generated)
statistics draft: #calls(b,g,a) = 1 624 435, #gen drafts = 624, #acc drafts = 435, #gen tokens = 2082, #acc tokens = 1458, dur(b,g,a) = 0.000, 10527.953, 0.057 ms

Qwen3.5 27B Official Precise Coding Settings (temp=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0)
Gave up after 6 attempts with Draft Model enabled. 5 of them got stuck in a loop and the 5th run even stopped generating outright. I noticed 27B freeze a few more times actually but chalked it up to some bug.
Disabled Draft model and I no matter how much I regenerate it never gets stuck in a loop. Output quality also seems to have improved considerably.

FatheredPuma81 · 2026-03-05T00:36:26Z

I asked Claude to look at the Commit and see if it sees anything since I certainly can't lol and here's what it says if it helps any.

I can see a few genuinely suspect things in this diff:

The soft rollback doesn't restore tensor state:
cells[tail_id].pos = p0 - 1;
This just moves the position metadata back but doesn't copy any actual SSM tensor data to match. The metadata says one thing, the tensor state says another. For a dense model like 27B this path could corrupt sequence state silently.

This partial removal block is broken:
if (p0 > 0 && p1 == std::numeric_limits<llama_pos>::max()) {
    cells[i].pos = p0 - 1;
    continue;
}
It skips the normal cleanup — seq_id never gets cleared, used never gets decremented. Cells accumulate with stale positions over time, which means find_slot progressively sees less available space than actually exists and can start making wrong decisions.

The checkpoint threshold is arbitrary and untested:
if (get_cell_count(seq_id) < 8 && used < size * 0.9)
If you hit 90% cell usage — which long generation absolutely can do — it silently stops keeping checkpoints and starts evicting. No warning, no fallback.

Performance cost:
copy_cell creates and destroys a ggml_context per layer per call. On 27B that's potentially dozens of tiny alloc/free cycles per speculative step. That alone could explain part of your t/s hit.

Your looping is almost certainly the stale cell accumulation bug. It's a real issue worth adding to your PR comment.

For what its worth I bumped Context to 130k and had Qwen3.5 27B without a draft model and with the Precise settings create the Diffs in plain text using the HTML for Claude since it couldn't access the Commit.

The soft rollback path (cells[tail_id].pos = p0 - 1) only updated position metadata, leaving SSM tensor state (r_l/s_l) reflecting the post-speculative position. This caused silent state corruption and looping on speculative decoding rejection for recurrent/hybrid models (e.g. Qwen3.5 MoE 27B). seq_rm now returns false when no checkpoint exists at p0-1, correctly signaling to the caller that rollback requires re-evaluation. The hybrid memory layer already propagates false correctly. Also add a LLAMA_LOG_DEBUG when the 0.9 cache threshold prevents checkpoint creation, making the behavior visible rather than silent. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

eauchs · 2026-03-05T10:02:52Z

Thanks for the detailed testing. The acceptance rate difference is expected —
speculative decoding acceptance correlates strongly with temperature; at
temp=1.0/min-p=0.0 you'll always see lower rates regardless of the patch.

The looping on 27B was a real bug: the soft rollback path (cells[tail_id].pos
= p0 - 1) only rewound position metadata, leaving the SSM tensor state
(r_l/s_l) in the post-speculative position. The model was generating from a
corrupted hidden state.

Fix in latest commit: seq_rm now returns false when no checkpoint exists at
p0-1, signaling to the caller that rollback requires re-evaluation. The hybrid
memory layer (llama_memory_hybrid::seq_rm) already propagates false
correctly.

Also added a LLAMA_LOG_DEBUG when the 0.9 threshold prevents checkpoint
creation, so the behavior is visible rather than silent.

Can you retest the 27B looping case with this commit?

…eq_rm" This reverts commit 9a04ac4.

The checkpoint mechanism in find_slot only triggered when a sequence moved to a new cell (has_cell=false), which never occurs during normal single-sequence autoregressive generation. As a result, seq_rm had no checkpoint to roll back to during speculative decoding rejection. Fix: add checkpoint creation in the has_cell=true branch. Before the current cell is overwritten with new tokens, its SSM state (r_l/s_l) is copied to a free cell and kept as a checkpoint. This makes the rollback history available for the common single-sequence case. Also replace the soft rollback in seq_rm (which only rewound position metadata, leaving tensor state corrupted) with a proper return false, signaling to the caller that re-evaluation is required when no checkpoint exists at p0-1. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

stephensrmmartin · 2026-03-09T18:44:33Z

Using this PR, I am still not able to enable spec decoding:

[54031] srv    load_model: speculative decoding not supported by this context

Full log:

main: NOTE: router mode is experimental
main:       it is not recommended to use this mode in untrusted environments
srv  ensure_model: model name=default:latest is not loaded, loading...
srv          load: spawning server instance with name=default:latest on port 54031
srv          load: spawning server instance with args:
srv          load:   /app/llama-server
srv          load:   --cache-reuse
srv          load:   256
srv          load:   --chat-template-kwargs
srv          load:   {"enable_thinking":false}
srv          load:   --context-shift
srv          load:   --draft-p-min
srv          load:   .75
srv          load:   --host
srv          load:   127.0.0.1
srv          load:   --jinja
srv          load:   --min-p
srv          load:   0.0
srv          load:   --mmap
srv          load:   --no-mmproj-auto
srv          load:   --port
srv          load:   54031
srv          load:   --presence-penalty
srv          load:   1.5
srv          load:   --repeat-penalty
srv          load:   1.0
srv          load:   --sleep-idle-seconds
srv          load:   300
srv          load:   --temperature
srv          load:   0.7
srv          load:   --top-k
srv          load:   20
srv          load:   --top-p
srv          load:   0.8
srv          load:   --alias
srv          load:   default:latest
srv          load:   --ctx-size-draft
srv          load:   32000
srv          load:   --cpu-moe
srv          load:   --cache-ram
srv          load:   8192
srv          load:   --cache-type-k
srv          load:   q8_0
srv          load:   --cache-type-v
srv          load:   q8_0
srv          load:   --flash-attn
srv          load:   on
srv          load:   --fit
srv          load:   on
srv          load:   --fit-ctx
srv          load:   32000
srv          load:   --fit-target
srv          load:   4096
srv          load:   --hf-repo
srv          load:   unsloth/Qwen3.5-35B-A3B-GGUF:Q4_K_M
srv          load:   --hf-repo-draft
srv          load:   unsloth/Qwen3.5-0.8B-GGUF:Q4_K_M
srv          load:   --kv-unified
srv          load:   --parallel
srv          load:   1
srv          load:   --threads
srv          load:   8
srv  ensure_model: waiting until model name=default:latest is fully loaded...
[54031] ggml_cuda_init: found 1 ROCm devices:
[54031]   Device 0: AMD Radeon RX 6700 XT, gfx1030 (0x1030), VMM: no, Wave Size: 32
[54031] load_backend: loaded ROCm backend from /app/libggml-hip.so
[54031] load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
[54031] common_download_file_single_online: no previous model file found /models/unsloth_Qwen3.5-35B-A3B-GGUF_preset.ini
[54031] common_download_file_single_online: HEAD failed, status: 404
[54031] no remote preset found, skipping
[54031] common_download_file_single_online: using cached file (same etag): /models/unsloth_Qwen3.5-35B-A3B-GGUF_Qwen3.5-35B-A3B-Q4_K_M.gguf
[54031] common_download_file_single_online: using cached file (same etag): /models/unsloth_Qwen3.5-0.8B-GGUF_Qwen3.5-0.8B-Q4_K_M.gguf
[54031] build: 8196 (8a6b1c860) with GNU 13.3.0 for Linux x86_64
[54031] system info: n_threads = 8, n_threads_batch = 8, total_threads = 12
[54031] 
[54031] system_info: n_threads = 8 (n_threads_batch = 8) / 12 | ROCm : NO_VMM = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 
[54031] 
[54031] Running without SSL
[54031] init: using 11 threads for HTTP server
[54031] start: binding port with default address family
[54031] main: loading model
[54031] srv    load_model: loading model '/models/unsloth_Qwen3.5-35B-A3B-GGUF_Qwen3.5-35B-A3B-Q4_K_M.gguf'
[54031] common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
[54031] llama_params_fit_impl: projected to use 5654 MiB of device memory vs. 12026 MiB of free device memory
[54031] llama_params_fit_impl: will leave 6371 >= 4096 MiB of free device memory, no changes needed
[54031] llama_params_fit: successfully fit params to free device memory
[54031] llama_params_fit: fitting params to free memory took 0.56 seconds
[54031] llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon RX 6700 XT) (0000:0b:00.0) - 12090 MiB free
[54031] llama_model_loader: loaded meta data with 52 key-value pairs and 733 tensors from /models/unsloth_Qwen3.5-35B-A3B-GGUF_Qwen3.5-35B-A3B-Q4_K_M.gguf (version GGUF V3 (latest))
[54031] llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
[54031] llama_model_loader: - kv   0:                       general.architecture str              = qwen35moe
[54031] llama_model_loader: - kv   1:                               general.type str              = model
[54031] llama_model_loader: - kv   2:                     general.sampling.top_k i32              = 20
[54031] llama_model_loader: - kv   3:                     general.sampling.top_p f32              = 0.950000
[54031] llama_model_loader: - kv   4:                      general.sampling.temp f32              = 1.000000
[54031] llama_model_loader: - kv   5:                               general.name str              = Qwen3.5-35B-A3B
[54031] llama_model_loader: - kv   6:                           general.basename str              = Qwen3.5-35B-A3B
[54031] llama_model_loader: - kv   7:                       general.quantized_by str              = Unsloth
[54031] llama_model_loader: - kv   8:                         general.size_label str              = 35B-A3B
[54031] llama_model_loader: - kv   9:                            general.license str              = apache-2.0
[54031] llama_model_loader: - kv  10:                       general.license.link str              = https://huggingface.co/Qwen/Qwen3.5-3...
[54031] llama_model_loader: - kv  11:                           general.repo_url str              = https://huggingface.co/unsloth
[54031] llama_model_loader: - kv  12:                   general.base_model.count u32              = 1
[54031] llama_model_loader: - kv  13:                  general.base_model.0.name str              = Qwen3.5 35B A3B
[54031] llama_model_loader: - kv  14:          general.base_model.0.organization str              = Qwen
[54031] llama_model_loader: - kv  15:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen3.5-3...
[54031] llama_model_loader: - kv  16:                               general.tags arr[str,2]       = ["unsloth", "image-text-to-text"]
[54031] llama_model_loader: - kv  17:                      qwen35moe.block_count u32              = 40
[54031] llama_model_loader: - kv  18:                   qwen35moe.context_length u32              = 262144
[54031] llama_model_loader: - kv  19:                 qwen35moe.embedding_length u32              = 2048
[54031] llama_model_loader: - kv  20:             qwen35moe.attention.head_count u32              = 16
[54031] llama_model_loader: - kv  21:          qwen35moe.attention.head_count_kv u32              = 2
[54031] llama_model_loader: - kv  22:          qwen35moe.rope.dimension_sections arr[i32,4]       = [11, 11, 10, 0]
[54031] llama_model_loader: - kv  23:                   qwen35moe.rope.freq_base f32              = 10000000.000000
[54031] llama_model_loader: - kv  24: qwen35moe.attention.layer_norm_rms_epsilon f32              = 0.000001
[54031] llama_model_loader: - kv  25:                     qwen35moe.expert_count u32              = 256
[54031] llama_model_loader: - kv  26:                qwen35moe.expert_used_count u32              = 8
[54031] llama_model_loader: - kv  27:             qwen35moe.attention.key_length u32              = 256
[54031] llama_model_loader: - kv  28:           qwen35moe.attention.value_length u32              = 256
[54031] llama_model_loader: - kv  29:       qwen35moe.expert_feed_forward_length u32              = 512
[54031] llama_model_loader: - kv  30: qwen35moe.expert_shared_feed_forward_length u32              = 512
[54031] llama_model_loader: - kv  31:                  qwen35moe.ssm.conv_kernel u32              = 4
[54031] llama_model_loader: - kv  32:                   qwen35moe.ssm.state_size u32              = 128
[54031] llama_model_loader: - kv  33:                  qwen35moe.ssm.group_count u32              = 16
[54031] llama_model_loader: - kv  34:               qwen35moe.ssm.time_step_rank u32              = 32
[54031] llama_model_loader: - kv  35:                   qwen35moe.ssm.inner_size u32              = 4096
[54031] llama_model_loader: - kv  36:          qwen35moe.full_attention_interval u32              = 4
[54031] llama_model_loader: - kv  37:             qwen35moe.rope.dimension_count u32              = 64
[54031] llama_model_loader: - kv  38:                       tokenizer.ggml.model str              = gpt2
[54031] llama_model_loader: - kv  39:                         tokenizer.ggml.pre str              = qwen35
[54031] llama_model_loader: - kv  40:                      tokenizer.ggml.tokens arr[str,248320]  = ["!", "\"", "#", "$", "%", "&", "'", ...
[54031] llama_model_loader: - kv  41:                  tokenizer.ggml.token_type arr[i32,248320]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
[54031] llama_model_loader: - kv  42:                      tokenizer.ggml.merges arr[str,247587]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
[54031] llama_model_loader: - kv  43:                tokenizer.ggml.eos_token_id u32              = 248046
[54031] llama_model_loader: - kv  44:            tokenizer.ggml.padding_token_id u32              = 248055
[54031] llama_model_loader: - kv  45:                    tokenizer.chat_template str              = {%- set image_count = namespace(value...
[54031] llama_model_loader: - kv  46:               general.quantization_version u32              = 2
[54031] llama_model_loader: - kv  47:                          general.file_type u32              = 15
[54031] llama_model_loader: - kv  48:                      quantize.imatrix.file str              = Qwen3.5-35B-A3B-GGUF/imatrix_unsloth....
[54031] llama_model_loader: - kv  49:                   quantize.imatrix.dataset str              = unsloth_calibration_Qwen3.5-35B-A3B.txt
[54031] llama_model_loader: - kv  50:             quantize.imatrix.entries_count u32              = 510
[54031] llama_model_loader: - kv  51:              quantize.imatrix.chunks_count u32              = 76
[54031] llama_model_loader: - type  f32:  301 tensors
[54031] llama_model_loader: - type q8_0:  311 tensors
[54031] llama_model_loader: - type q4_K:   80 tensors
[54031] llama_model_loader: - type q5_K:   40 tensors
[54031] llama_model_loader: - type q6_K:    1 tensors
[54031] print_info: file format = GGUF V3 (latest)
[54031] print_info: file type   = Q4_K - Medium
[54031] print_info: file size   = 20.49 GiB (5.08 BPW) 
[54031] load: 0 unused tokens
[54031] load: printing all EOG tokens:
[54031] load:   - 248044 ('<|endoftext|>')
[54031] load:   - 248046 ('<|im_end|>')
[54031] load:   - 248063 ('<|fim_pad|>')
[54031] load:   - 248064 ('<|repo_name|>')
[54031] load:   - 248065 ('<|file_sep|>')
[54031] load: special tokens cache size = 33
[54031] load: token to piece cache size = 1.7581 MB
[54031] print_info: arch                  = qwen35moe
[54031] print_info: vocab_only            = 0
[54031] print_info: no_alloc              = 0
[54031] print_info: n_ctx_train           = 262144
[54031] print_info: n_embd                = 2048
[54031] print_info: n_embd_inp            = 2048
[54031] print_info: n_layer               = 40
[54031] print_info: n_head                = 16
[54031] print_info: n_head_kv             = 2
[54031] print_info: n_rot                 = 64
[54031] print_info: n_swa                 = 0
[54031] print_info: is_swa_any            = 0
[54031] print_info: n_embd_head_k         = 256
[54031] print_info: n_embd_head_v         = 256
[54031] print_info: n_gqa                 = 8
[54031] print_info: n_embd_k_gqa          = 512
[54031] print_info: n_embd_v_gqa          = 512
[54031] print_info: f_norm_eps            = 0.0e+00
[54031] print_info: f_norm_rms_eps        = 1.0e-06
[54031] print_info: f_clamp_kqv           = 0.0e+00
[54031] print_info: f_max_alibi_bias      = 0.0e+00
[54031] print_info: f_logit_scale         = 0.0e+00
[54031] print_info: f_attn_scale          = 0.0e+00
[54031] print_info: n_ff                  = 0
[54031] print_info: n_expert              = 256
[54031] print_info: n_expert_used         = 8
[54031] print_info: n_expert_groups       = 0
[54031] print_info: n_group_used          = 0
[54031] print_info: causal attn           = 1
[54031] print_info: pooling type          = 0
[54031] print_info: rope type             = 40
[54031] print_info: rope scaling          = linear
[54031] print_info: freq_base_train       = 10000000.0
[54031] print_info: freq_scale_train      = 1
[54031] print_info: n_ctx_orig_yarn       = 262144
[54031] print_info: rope_yarn_log_mul     = 0.0000
[54031] print_info: rope_finetuned        = unknown
[54031] print_info: mrope sections        = [11, 11, 10, 0]
[54031] print_info: ssm_d_conv            = 4
[54031] print_info: ssm_d_inner           = 4096
[54031] print_info: ssm_d_state           = 128
[54031] print_info: ssm_dt_rank           = 32
[54031] print_info: ssm_n_group           = 16
[54031] print_info: ssm_dt_b_c_rms        = 0
[54031] print_info: model type            = ?B
[54031] print_info: model params          = 34.66 B
[54031] print_info: general.name          = Qwen3.5-35B-A3B
[54031] print_info: vocab type            = BPE
[54031] print_info: n_vocab               = 248320
[54031] print_info: n_merges              = 247587
[54031] print_info: BOS token             = 11 ','
[54031] print_info: EOS token             = 248046 '<|im_end|>'
[54031] print_info: EOT token             = 248046 '<|im_end|>'
[54031] print_info: PAD token             = 248055 '<|vision_pad|>'
[54031] print_info: LF token              = 198 'Ċ'
[54031] print_info: FIM PRE token         = 248060 '<|fim_prefix|>'
[54031] print_info: FIM SUF token         = 248062 '<|fim_suffix|>'
[54031] print_info: FIM MID token         = 248061 '<|fim_middle|>'
[54031] print_info: FIM PAD token         = 248063 '<|fim_pad|>'
[54031] print_info: FIM REP token         = 248064 '<|repo_name|>'
[54031] print_info: FIM SEP token         = 248065 '<|file_sep|>'
[54031] print_info: EOG token             = 248044 '<|endoftext|>'
[54031] print_info: EOG token             = 248046 '<|im_end|>'
[54031] print_info: EOG token             = 248063 '<|fim_pad|>'
[54031] print_info: EOG token             = 248064 '<|repo_name|>'
[54031] print_info: EOG token             = 248065 '<|file_sep|>'
[54031] print_info: max token length      = 256
[54031] load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
[54031] load_tensors: offloading output layer to GPU
[54031] load_tensors: offloading 39 repeating layers to GPU
[54031] load_tensors: offloaded 41/41 layers to GPU
[54031] load_tensors:   CPU_Mapped model buffer size = 20586.71 MiB
[54031] load_tensors:        ROCm0 model buffer size =  1910.32 MiB
[54031] ..................................................................................................
[54031] common_init_result: added <|endoftext|> logit bias = -inf
[54031] common_init_result: added <|im_end|> logit bias = -inf
[54031] common_init_result: added <|fim_pad|> logit bias = -inf
[54031] common_init_result: added <|repo_name|> logit bias = -inf
[54031] common_init_result: added <|file_sep|> logit bias = -inf
[54031] llama_context: constructing llama_context
[54031] llama_context: n_seq_max     = 1
[54031] llama_context: n_ctx         = 262144
[54031] llama_context: n_ctx_seq     = 262144
[54031] llama_context: n_batch       = 2048
[54031] llama_context: n_ubatch      = 512
[54031] llama_context: causal_attn   = 1
[54031] llama_context: flash_attn    = enabled
[54031] llama_context: kv_unified    = true
[54031] llama_context: freq_base     = 10000000.0
[54031] llama_context: freq_scale    = 1
[54031] llama_context:  ROCm_Host  output buffer size =     0.95 MiB
[54031] llama_kv_cache:      ROCm0 KV buffer size =  2720.00 MiB
[54031] llama_kv_cache: size = 2720.00 MiB (262144 cells,  10 layers,  1/1 seqs), K (q8_0): 1360.00 MiB, V (q8_0): 1360.00 MiB
[54031] llama_memory_recurrent:      ROCm0 RS buffer size =    62.81 MiB
[54031] llama_memory_recurrent: size =   62.81 MiB (     1 cells,  40 layers,  1 seqs), R (f32):    2.81 MiB, S (f32):   60.00 MiB
[54031] sched_reserve: reserving ...
[54031] sched_reserve:      ROCm0 compute buffer size =   961.00 MiB
[54031] sched_reserve:  ROCm_Host compute buffer size =   520.02 MiB
[54031] sched_reserve: graph nodes  = 9399 (with bs=512), 4389 (with bs=1)
[54031] sched_reserve: graph splits = 122 (with bs=512), 82 (with bs=1)
[54031] sched_reserve: reserve took 619.07 ms, sched copies = 1
[54031] common_init_from_params: KV cache shifting is not supported for this context, disabling KV cache shifting
[54031] common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
[54031] srv    load_model: loading draft model '/models/unsloth_Qwen3.5-0.8B-GGUF_Qwen3.5-0.8B-Q4_K_M.gguf'
[54031] llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon RX 6700 XT) (0000:0b:00.0) - 6334 MiB free
[54031] llama_model_loader: loaded meta data with 46 key-value pairs and 320 tensors from /models/unsloth_Qwen3.5-0.8B-GGUF_Qwen3.5-0.8B-Q4_K_M.gguf (version GGUF V3 (latest))
[54031] llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
[54031] llama_model_loader: - kv   0:                       general.architecture str              = qwen35
[54031] llama_model_loader: - kv   1:                               general.type str              = model
[54031] llama_model_loader: - kv   2:                               general.name str              = Qwen3.5-0.8B
[54031] llama_model_loader: - kv   3:                           general.basename str              = Qwen3.5-0.8B
[54031] llama_model_loader: - kv   4:                       general.quantized_by str              = Unsloth
[54031] llama_model_loader: - kv   5:                         general.size_label str              = 0.8B
[54031] llama_model_loader: - kv   6:                            general.license str              = apache-2.0
[54031] llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen3.5-0...
[54031] llama_model_loader: - kv   8:                           general.repo_url str              = https://huggingface.co/unsloth
[54031] llama_model_loader: - kv   9:                   general.base_model.count u32              = 1
[54031] llama_model_loader: - kv  10:                  general.base_model.0.name str              = Qwen3.5 0.8B
[54031] llama_model_loader: - kv  11:          general.base_model.0.organization str              = Qwen
[54031] llama_model_loader: - kv  12:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen3.5-0.8B
[54031] llama_model_loader: - kv  13:                               general.tags arr[str,1]       = ["image-text-to-text"]
[54031] llama_model_loader: - kv  14:                         qwen35.block_count u32              = 24
[54031] llama_model_loader: - kv  15:                      qwen35.context_length u32              = 262144
[54031] llama_model_loader: - kv  16:                    qwen35.embedding_length u32              = 1024
[54031] llama_model_loader: - kv  17:                 qwen35.feed_forward_length u32              = 3584
[54031] llama_model_loader: - kv  18:                qwen35.attention.head_count u32              = 8
[54031] llama_model_loader: - kv  19:             qwen35.attention.head_count_kv u32              = 2
[54031] llama_model_loader: - kv  20:             qwen35.rope.dimension_sections arr[i32,4]       = [11, 11, 10, 0]
[54031] llama_model_loader: - kv  21:                      qwen35.rope.freq_base f32              = 10000000.000000
[54031] llama_model_loader: - kv  22:    qwen35.attention.layer_norm_rms_epsilon f32              = 0.000001
[54031] llama_model_loader: - kv  23:                qwen35.attention.key_length u32              = 256
[54031] llama_model_loader: - kv  24:              qwen35.attention.value_length u32              = 256
[54031] llama_model_loader: - kv  25:                     qwen35.ssm.conv_kernel u32              = 4
[54031] llama_model_loader: - kv  26:                      qwen35.ssm.state_size u32              = 128
[54031] llama_model_loader: - kv  27:                     qwen35.ssm.group_count u32              = 16
[54031] llama_model_loader: - kv  28:                  qwen35.ssm.time_step_rank u32              = 16
[54031] llama_model_loader: - kv  29:                      qwen35.ssm.inner_size u32              = 2048
[54031] llama_model_loader: - kv  30:             qwen35.full_attention_interval u32              = 4
[54031] llama_model_loader: - kv  31:                qwen35.rope.dimension_count u32              = 64
[54031] llama_model_loader: - kv  32:                       tokenizer.ggml.model str              = gpt2
[54031] llama_model_loader: - kv  33:                         tokenizer.ggml.pre str              = qwen35
[54031] llama_model_loader: - kv  34:                      tokenizer.ggml.tokens arr[str,248320]  = ["!", "\"", "#", "$", "%", "&", "'", ...
[54031] llama_model_loader: - kv  35:                  tokenizer.ggml.token_type arr[i32,248320]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
[54031] llama_model_loader: - kv  36:                      tokenizer.ggml.merges arr[str,247587]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
[54031] llama_model_loader: - kv  37:                tokenizer.ggml.eos_token_id u32              = 248046
[54031] llama_model_loader: - kv  38:            tokenizer.ggml.padding_token_id u32              = 248055
[54031] llama_model_loader: - kv  39:                    tokenizer.chat_template str              = {%- set image_count = namespace(value...
[54031] llama_model_loader: - kv  40:               general.quantization_version u32              = 2
[54031] llama_model_loader: - kv  41:                          general.file_type u32              = 15
[54031] llama_model_loader: - kv  42:                      quantize.imatrix.file str              = Qwen3.5-0.8B-GGUF/imatrix_unsloth.gguf
[54031] llama_model_loader: - kv  43:                   quantize.imatrix.dataset str              = unsloth_calibration_Qwen3.5-0.8B.txt
[54031] llama_model_loader: - kv  44:             quantize.imatrix.entries_count u32              = 186
[54031] llama_model_loader: - kv  45:              quantize.imatrix.chunks_count u32              = 80
[54031] llama_model_loader: - type  f32:  133 tensors
[54031] llama_model_loader: - type q8_0:   36 tensors
[54031] llama_model_loader: - type q4_K:   98 tensors
[54031] llama_model_loader: - type q5_K:   36 tensors
[54031] llama_model_loader: - type q6_K:   17 tensors
[54031] print_info: file format = GGUF V3 (latest)
[54031] print_info: file type   = Q4_K - Medium
[54031] print_info: file size   = 497.39 MiB (5.55 BPW) 
[54031] load: 0 unused tokens
[54031] load: printing all EOG tokens:
[54031] load:   - 248044 ('<|endoftext|>')
[54031] load:   - 248046 ('<|im_end|>')
[54031] load:   - 248063 ('<|fim_pad|>')
[54031] load:   - 248064 ('<|repo_name|>')
[54031] load:   - 248065 ('<|file_sep|>')
[54031] load: special tokens cache size = 33
[54031] load: token to piece cache size = 1.7581 MB
[54031] print_info: arch                  = qwen35
[54031] print_info: vocab_only            = 0
[54031] print_info: no_alloc              = 0
[54031] print_info: n_ctx_train           = 262144
[54031] print_info: n_embd                = 1024
[54031] print_info: n_embd_inp            = 1024
[54031] print_info: n_layer               = 24
[54031] print_info: n_head                = 8
[54031] print_info: n_head_kv             = 2
[54031] print_info: n_rot                 = 64
[54031] print_info: n_swa                 = 0
[54031] print_info: is_swa_any            = 0
[54031] print_info: n_embd_head_k         = 256
[54031] print_info: n_embd_head_v         = 256
[54031] print_info: n_gqa                 = 4
[54031] print_info: n_embd_k_gqa          = 512
[54031] print_info: n_embd_v_gqa          = 512
[54031] print_info: f_norm_eps            = 0.0e+00
[54031] print_info: f_norm_rms_eps        = 1.0e-06
[54031] print_info: f_clamp_kqv           = 0.0e+00
[54031] print_info: f_max_alibi_bias      = 0.0e+00
[54031] print_info: f_logit_scale         = 0.0e+00
[54031] print_info: f_attn_scale          = 0.0e+00
[54031] print_info: n_ff                  = 3584
[54031] print_info: n_expert              = 0
[54031] print_info: n_expert_used         = 0
[54031] print_info: n_expert_groups       = 0
[54031] print_info: n_group_used          = 0
[54031] print_info: causal attn           = 1
[54031] print_info: pooling type          = 0
[54031] print_info: rope type             = 40
[54031] print_info: rope scaling          = linear
[54031] print_info: freq_base_train       = 10000000.0
[54031] print_info: freq_scale_train      = 1
[54031] print_info: n_ctx_orig_yarn       = 262144
[54031] print_info: rope_yarn_log_mul     = 0.0000
[54031] print_info: rope_finetuned        = unknown
[54031] print_info: mrope sections        = [11, 11, 10, 0]
[54031] print_info: ssm_d_conv            = 4
[54031] print_info: ssm_d_inner           = 2048
[54031] print_info: ssm_d_state           = 128
[54031] print_info: ssm_dt_rank           = 16
[54031] print_info: ssm_n_group           = 16
[54031] print_info: ssm_dt_b_c_rms        = 0
[54031] print_info: model type            = 2B
[54031] print_info: model params          = 752.39 M
[54031] print_info: general.name          = Qwen3.5-0.8B
[54031] print_info: vocab type            = BPE
[54031] print_info: n_vocab               = 248320
[54031] print_info: n_merges              = 247587
[54031] print_info: BOS token             = 11 ','
[54031] print_info: EOS token             = 248046 '<|im_end|>'
[54031] print_info: EOT token             = 248046 '<|im_end|>'
[54031] print_info: PAD token             = 248055 '<|vision_pad|>'
[54031] print_info: LF token              = 198 'Ċ'
[54031] print_info: FIM PRE token         = 248060 '<|fim_prefix|>'
[54031] print_info: FIM SUF token         = 248062 '<|fim_suffix|>'
[54031] print_info: FIM MID token         = 248061 '<|fim_middle|>'
[54031] print_info: FIM PAD token         = 248063 '<|fim_pad|>'
[54031] print_info: FIM REP token         = 248064 '<|repo_name|>'
[54031] print_info: FIM SEP token         = 248065 '<|file_sep|>'
[54031] print_info: EOG token             = 248044 '<|endoftext|>'
[54031] print_info: EOG token             = 248046 '<|im_end|>'
[54031] print_info: EOG token             = 248063 '<|fim_pad|>'
[54031] print_info: EOG token             = 248064 '<|repo_name|>'
[54031] print_info: EOG token             = 248065 '<|file_sep|>'
[54031] print_info: max token length      = 256
[54031] load_tensors: loading model tensors, this can take a while... (mmap = true, direct_io = false)
[54031] load_tensors: offloading output layer to GPU
[54031] load_tensors: offloading 23 repeating layers to GPU
[54031] load_tensors: offloaded 25/25 layers to GPU
[54031] load_tensors:   CPU_Mapped model buffer size =   198.93 MiB
[54031] load_tensors:        ROCm0 model buffer size =   497.40 MiB
[54031] .............................................
[54031] srv    load_model: cache_reuse is not supported by this context, it will be disabled
[54031] srv    load_model: initializing slots, n_slots = 1
[54031] common_speculative_is_compat: the target context does not support partial sequence removal
[54031] srv    load_model: speculative decoding not supported by this context
[54031] slot   load_model: id  0 | task -1 | new slot, n_ctx = 262144
[54031] srv    load_model: prompt cache is enabled, size limit: 8192 MiB
[54031] srv    load_model: use `--cache-ram 0` to disable the prompt cache
[54031] srv    load_model: for more info see https://github.com/ggml-org/llama.cpp/pull/16391

Config:

version = 1

parallel = 1
models-max = 2

[*]
host = 0.0.0.0
threads = 8
mlock = off
mmap = on
fit = on
fit-ctx = 32000
fit-target = 4096
jinja = true
cache-ram = 8192
cache-type-k = q8_0
cache-type-v = q8_0
cache-reuse = 256
log-disable = off
cpu-moe = true
context-shift = true
sleep-idle-seconds = 300
verbose = off
flash-attn = on
parallel = 1
models-max = 2
kv-unified = on
# swa-checkpoints=16
# ctx-checkpoints=16
# checkpoint-every-n-tokens=256

[default:latest]
hf = unsloth/Qwen3.5-35B-A3B-GGUF:Q4_K_M
no-mmproj = true

chat-template-kwargs={"enable_thinking":false}

temperature=0.7
top-p=0.8
top-k=20
min-p=0.0
presence-penalty=1.5
repeat-penalty=1.0
	
hfd = unsloth/Qwen3.5-0.8B-GGUF:Q4_K_M
# ctkd = q8_0
# ctvd = q8_0
cd = 32000
draft-p-min = .75

adhusch · 2026-03-16T14:30:48Z

Just a side-note: I think this PR shoud adress the issue for Nemotron 3 models too, could be worth including them in testing, showing that the solution is indeed general and not Qwen specific?

Add native MTP support for the dense Qwen 3.5 architecture (0.8B, 2B, 4B, 9B, 27B). What works: - MTP graph builder for dense qwen35 (build_mtp_head in qwen35.cpp) - MTP tensor loading and registration for QWEN35 arch - GGUF converter handles MTP tensors (mtp.fc, mtp.layers, mtp.norm, etc.) - Public API: llama_get_mtp_logits(), llama_model_n_mtp_layers() - Server auto-detects MTP from GGUF metadata - Speculative state machine for MTP draft token generation - PR ggml-org#20075 applied: recurrent state checkpoint/restore for hybrid models - M-RoPE position check relaxed for speculative re-evaluation - Windows os.kill fix for gateway process detection What needs work: - Speculative verify loop conflicts with tool-calling requests (400 error) - The recommended fix: bypass the speculative framework entirely and implement MTP acceptance directly in the server generation loop (no seq_rm/rollback needed since MTP drafts are produced in-graph) - MTP attention skipped (projection + FFN path only) due to inp_out_ids token count mismatch Tested on: RTX 5060 8GB, Windows 11, CUDA 13.2 Model: Qwen3.5-9B with MTP tensors (Q4_K_M quantization) Base: llama.cpp b8388

Implements recurrent state checkpointing for Qwen3.5 hybrid attention+SSM architecture, enabling speculative decoding that was previously broken due to SSM layers not supporting partial sequence removal. Upstream PR: ggml-org#20075

The recurrent memory was sized to n_seq_max (typically 1 for single sequence), leaving no room for the checkpoint cells that PR ggml-org#20075's seq_rm rollback needs. When speculative decoding is enabled, scale the buffer by 9x (1 current + 8 checkpoint slots per sequence).

Rockbob89 · 2026-03-19T16:29:16Z

ROCm (gfx1151) — crashes in copy_cell()

Tested: Qwen3.5-35B-A3B + Qwen3.5-0.8B draft, llama-server -np 1 -c 8192, b8420 + this PR cherry-picked.

Two issues found:

Compat check is a chicken-and-egg problem

common_speculative_is_compat() in speculative.cpp tests seq_rm() on a fresh context — but checkpoints only get created during normal decoding in find_slot(). So the test always fails for recurrent models. Had to bypass it manually to get further.

rs_size too small for checkpoints

llama-model.cpp sets recurrent_rs_size = max(1, n_seq_max). With -np 1 that's 1 cell. The checkpoint logic wants up to 8 per sequence — nowhere to put them.

copy_cell() gets called with next_empty_cell beyond buffer bounds:
ggml.c:1725: GGML_ASSERT(view_src == NULL || data_size == 0 || data_size + view_offs <= ggml_nbytes(view_src))
copy_cell() → ggml_view_1d → ASSERT
find_slot() → copy_cell()
prepare() → find_slot()

Tried bumping rs_size to n_seq_max + 4 (both paths in llama-model.cpp:8085 and :8104) — fixes the OOB but then r_l/s_l tensors aren't backed by a large enough backend buffer → GGML_ASSERT(buffer) in ggml-backend.cpp:194.

Same env as @stephensrmmartin who reported the partial sequence removal error above. All existing testers seem to be on Metal/CUDA — might handle buffer bounds differently?

fix: implement synchronous recurrent state checkpointing for hybrid m…

04e2fb1

…odels

eauchs requested a review from ggerganov as a code owner March 3, 2026 14:57

eauchs changed the title ~~fix: implement synchronous recurrent state checkpointing for hybrid m…~~ speculative : implement synchronous recurrent state checkpointing for hybrid m… Mar 3, 2026

eauchs changed the title ~~speculative : implement synchronous recurrent state checkpointing for hybrid m…~~ fix: Checkpoint/Restore mechanism for llama_memory_recurrent to enable speculative decoding on hybrid SSM/MoE models (Qwen 3.5 MoE) Mar 3, 2026

eauchs changed the title ~~fix: Checkpoint/Restore mechanism for llama_memory_recurrent to enable speculative decoding on hybrid SSM/MoE models (Qwen 3.5 MoE)~~ fix: speculative decoding broken on hybrid SSM/MoE (Qwen3.5 MoE) Mar 3, 2026

ArthurianX mentioned this pull request Mar 4, 2026

Feature Request: Speculative decoding on Qwen3.5 #20039

Open

4 tasks

eauchs and others added 2 commits March 5, 2026 11:11

Revert "fix: replace soft rollback with proper failure in recurrent s…

8fe0e03

…eq_rm" This reverts commit 9a04ac4.

l0nedigit mentioned this pull request Mar 6, 2026

Qwen3.5 dense models should be compatible draft models for Qwen3.5 MoE speculative decoding lmstudio-ai/lmstudio-bug-tracker#1597

Open

krystophny mentioned this pull request Mar 11, 2026

server: fix multi-turn cache reuse for hybrid/recurrent models #20428

Closed

itigges22 mentioned this pull request Mar 17, 2026

feat: MTP support for dense Qwen 3.5 with FastMTP vocabulary trimming #20700

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: speculative decoding broken on hybrid SSM/MoE (Qwen3.5 MoE)#20075

fix: speculative decoding broken on hybrid SSM/MoE (Qwen3.5 MoE)#20075
eauchs wants to merge 4 commits intoggml-org:masterfrom
eauchs:feat/qwen-moe-speculative-decoding

eauchs commented Mar 3, 2026 •

edited

Loading

Uh oh!

FatheredPuma81 commented Mar 4, 2026 •

edited

Loading

Uh oh!

eauchs commented Mar 4, 2026 •

edited

Loading

Uh oh!

FatheredPuma81 commented Mar 5, 2026 •

edited

Loading

Uh oh!

FatheredPuma81 commented Mar 5, 2026 •

edited

Loading

Uh oh!

eauchs commented Mar 5, 2026

Uh oh!

stephensrmmartin commented Mar 9, 2026

Uh oh!

adhusch commented Mar 16, 2026 •

edited

Loading

Uh oh!

Rockbob89 commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

eauchs commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FatheredPuma81 commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eauchs commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FatheredPuma81 commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FatheredPuma81 commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eauchs commented Mar 5, 2026

Uh oh!

stephensrmmartin commented Mar 9, 2026

Uh oh!

adhusch commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Rockbob89 commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

eauchs commented Mar 3, 2026 •

edited

Loading

FatheredPuma81 commented Mar 4, 2026 •

edited

Loading

eauchs commented Mar 4, 2026 •

edited

Loading

FatheredPuma81 commented Mar 5, 2026 •

edited

Loading

FatheredPuma81 commented Mar 5, 2026 •

edited

Loading

adhusch commented Mar 16, 2026 •

edited

Loading