Fix llama.cpp latent steps: request embeddings on every step

SStas · claude · SStas · commit 768dee14b59d · 2026-04-03T08:24:03.000-07:00
Setting logits[0] = 1 only on the last step prevented
llama_get_embeddings_ith from returning hidden states on
intermediate steps, so the same initial hidden state was
re-injected N-1 times instead of being iteratively refined.
Requesting output on every step now matches the behavior of
the HuggingFace connector and avp-agent.
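
A toy sketch of the failure mode, for illustration only. It does not
call llama.cpp: run_latent_steps, the request_every_step parameter and
the "hidden * 2" refinement are hypothetical stand-ins, and the sketch
assumes a step whose output was not requested leaves the previous
hidden state in place.

```python
def run_latent_steps(steps, request_every_step):
    """Return the hidden state injected at each step of a toy latent loop."""
    hidden = 1.0      # stand-in for the initial hidden-state vector
    injected = []     # value fed into the model at each step

    for step in range(steps):
        injected.append(hidden)

        # Old behavior: output requested only on the last step.
        # New behavior: output requested on every step.
        want_output = request_every_step or step == steps - 1

        refined = hidden * 2  # hypothetical "decode" that refines the state

        # The refined state is only readable when output was requested;
        # otherwise the stale value is carried into the next step.
        if want_output:
            hidden = refined

    return injected


old = run_latent_steps(4, request_every_step=False)
new = run_latent_steps(4, request_every_step=True)
print(old)  # same initial state injected on every step
print(new)  # state changes from one step to the next
```

With four steps, the old flag logic injects the initial state once and
then re-injects it three more times, while the new logic feeds each
step the output of the previous one.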

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/src/avp/connectors/llamacpp.py b/src/avp/connectors/llamacpp.py
@@ -241,7 +241,7 @@ def think(
                 emb_batch.pos[0] = n_past
                 emb_batch.seq_id[0][0] = 0
                 emb_batch.n_seq_id[0] = 1
-                emb_batch.logits[0] = 1 if step == steps - 1 else 0
+                emb_batch.logits[0] = 1
 
                 rc = lc.llama_decode(think_ctx, emb_batch)
                 if rc != 0: