UPSTREAM PR #17808: server: improve speed of speculative decoding#463
Conversation
Version Insights Performance Analysis Summary (PR #463): This PR refactors speculative decoding to batch draft tokens with the main model inference, eliminating a separate …
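The batching idea in the summary can be sketched roughly as follows. This is an illustrative sketch only, not the llama.cpp server code: the function names, the list-based "model" interfaces, and the greedy acceptance rule are all assumptions made for the example. The point it shows is that the drafted tokens are verified in one target-model batch rather than with a separate decode per token.

```python
# Hypothetical sketch of a batched speculative-decoding round.
# `target_logits_fn(tokens)` returns one "predicted next token" per position,
# as if the target model scored the whole batch in a single pass;
# `draft_next_fn(tokens)` returns the draft model's next-token guess.
# Neither corresponds to a real llama.cpp API.

def speculative_step(target_logits_fn, draft_next_fn, context, n_draft):
    # 1) Draft n_draft tokens autoregressively with the small model.
    drafted = []
    work = list(context)
    for _ in range(n_draft):
        t = draft_next_fn(work)
        drafted.append(t)
        work.append(t)

    # 2) Verify context + drafted tokens in ONE target-model batch
    #    (instead of a separate decode call per draft token).
    preds = target_logits_fn(list(context) + drafted)

    # 3) Accept the longest prefix of drafted tokens the target agrees with;
    #    the first disagreement is replaced by the target's own token.
    accepted = []
    for i, t in enumerate(drafted):
        target_tok = preds[len(context) + i - 1]
        if target_tok == t:
            accepted.append(t)
        else:
            accepted.append(target_tok)
            break
    else:
        # All drafts accepted: the final batch position yields a bonus token.
        accepted.append(preds[-1])
    return accepted
```

With a good draft model most rounds accept several tokens, so the target model advances multiple positions per decode batch.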
Force-pushed from a2add8a to 6d9272a
Force-pushed from ef96f85 to adf9533
Mirrored from ggml-org/llama.cpp#17808
Fixes ggml-org/llama.cpp#12968
I'm testing with `llama-server --fim-qwen-7b-spec`, but it seems like the quality degraded significantly. Not sure if this is expected (as we no longer sample a single token like before).

TODO: leave a drawing here to explain how it works.