feat: improve golden token injection #540

maxdebayser · 2025-10-21T01:37:07Z (edited by tjohnson31415)

A rebase of @wallashss's PR #536 without the LogitsProcessor refactors.

Improvements to the GoldenTokenInjector (GTI):

GTI args can now be parsed from a JSON string, e.g. one passed through the vLLM OpenAI vllm_xargs extension.
The golden token can always be injected, regardless of expected logprobs / divergence checks.
GTI is now part of the vllm_spyre package and can be enabled with: --logits-processors vllm_spyre.v1.sample.golden_token_injector:GoldenTokenInjector
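
For readers unfamiliar with token forcing, here is a minimal sketch of what "always injecting" a token can look like at the logits level. It is not the GoldenTokenInjector code: the function names `force_token` and `greedy_pick` are hypothetical, plain Python lists stand in for tensors, and masking every other logit to negative infinity is an assumed technique rather than something this PR confirms.

```python
import math


def force_token(logits: list[float], golden_token_id: int) -> list[float]:
    """Return a copy of `logits` in which only `golden_token_id` can be sampled.

    Hypothetical helper: every other entry is masked to -inf so that any
    sampler (greedy or stochastic) can only pick the golden token.
    """
    if not 0 <= golden_token_id < len(logits):
        raise ValueError("golden token id is outside the vocabulary")
    forced = [-math.inf] * len(logits)
    forced[golden_token_id] = 0.0
    return forced


def greedy_pick(logits: list[float]) -> int:
    """Index of the largest logit, i.e. what temperature=0 sampling selects."""
    return max(range(len(logits)), key=logits.__getitem__)


# Toy vocabulary of four tokens; the model prefers token 0.
logits = [2.5, 0.1, -1.0, 0.7]
model_choice = greedy_pick(logits)

# Force token 2 regardless of what the model preferred.
forced_logits = force_token(logits, golden_token_id=2)
forced_choice = greedy_pick(forced_logits)
```

Returning a new list keeps the original logits available, which matters if a caller also wants to compare the model's own preference against the golden token.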

Example usage:

VLLM_SPYRE_USE_CB=1 vllm serve ibm-ai-platform/micro-g3.3-8b-instruct-1b --max-num-seqs 16 --max-model-len 1024 --logits-processors vllm_spyre.v1.sample.golden_token_injector:GoldenTokenInjector

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
  "model": "ibm-ai-platform/micro-g3.3-8b-instruct-1b",
  "prompt": "a squared plus b squared equals ",
  "vllm_xargs": {
     "golden_token_injector": "{\"expected_token_ids\": [85, 39560]}"
  },
  "logprobs": 0,
  "return_tokens_as_token_ids": true,
  "max_tokens": 2,
  "temperature": 0
}'

This request generates "c squared" instead of "16".
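
The curl body above nests a JSON string inside a JSON document, which is easy to get wrong when escaping by hand. The sketch below builds the same payload programmatically and shows one way a receiver could accept either a dict or a JSON string. Both `build_completion_request` and `parse_gti_args` are hypothetical names used for illustration; they are not taken from vllm_spyre, and the real parsing logic may differ.

```python
import json


def build_completion_request(prompt: str, expected_token_ids: list[int]) -> dict:
    """Build a /v1/completions body whose GTI args are a JSON-encoded string."""
    gti_args = {"expected_token_ids": expected_token_ids}
    return {
        "model": "ibm-ai-platform/micro-g3.3-8b-instruct-1b",
        "prompt": prompt,
        # The value is a string containing JSON, matching the curl example.
        "vllm_xargs": {"golden_token_injector": json.dumps(gti_args)},
        "logprobs": 0,
        "return_tokens_as_token_ids": True,
        "max_tokens": len(expected_token_ids),
        "temperature": 0,
    }


def parse_gti_args(raw: "str | dict") -> dict:
    """Hypothetical receiver side: accept a dict or a JSON string."""
    if isinstance(raw, str):
        raw = json.loads(raw)
    if not isinstance(raw, dict):
        raise TypeError("GTI args must decode to a JSON object")
    return raw


request = build_completion_request("a squared plus b squared equals ", [85, 39560])
body = json.dumps(request, indent=2)  # what would be sent as the HTTP body
decoded = parse_gti_args(request["vllm_xargs"]["golden_token_injector"])
print(body)
```

Letting `json.dumps` produce the inner string avoids the manual `\"` escaping seen in the shell command.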

Signed-off-by: Max de Bayser <[email protected]>

github-actions · 2025-10-21T01:40:37Z

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

tjohnson31415 · 2025-10-21T23:41:27Z

bot:test

tjohnson31415 · 2025-10-22T20:20:55Z (edited)

tests/e2e/test_spyre_max_new_tokens.py

-    vllm_sampling_params = [vllm_sampling_params_normal] * 3
+    vllm_sampling_params = [
+        vllm_sampling_params_normal.clone() for _ in range(3)
+    ]


This change is needed now that all tests use GTI by default (before this PR, only tests using get_engine used GTI). Repeating a reference to the same object instead of calling .clone() meant that all sequences shared the same GTI config, even when their prompts differed.
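
The aliasing problem described here can be reproduced without vLLM. In the sketch below, `Params` is a hypothetical stand-in for a sampling-params object that carries per-request extras; only the list-multiplication versus `.clone()` behaviour is the point.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Params:
    """Hypothetical stand-in for a sampling-params object."""
    max_tokens: int = 20
    extra_args: dict = field(default_factory=dict)

    def clone(self) -> "Params":
        # Deep copy so nested containers are not shared between clones.
        return copy.deepcopy(self)


# `[obj] * 3` repeats one reference three times: there is a single object.
shared = [Params()] * 3
shared[0].extra_args["golden_token_injector"] = {"expected_token_ids": [1, 2]}
leaked = shared[2].extra_args  # also contains the GTI config set on index 0

# Cloning per element yields three independent objects.
base = Params()
independent = [base.clone() for _ in range(3)]
independent[0].extra_args["golden_token_injector"] = {"expected_token_ids": [1, 2]}
untouched = independent[1].extra_args  # still empty
```

With the shared list, a per-sequence config written for one prompt silently applies to every other prompt, which is the failure mode the diff above avoids.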

wallashss · 2025-10-28T19:19:19Z

bot:test
MARKERS="spyre and cb and not quantized and not multi"

waleedqk · 2025-10-28T20:20:34Z

bot:test
MARKERS="spyre and cb and not quantized and not multi"

waleedqk · 2025-10-28T20:27:15Z

bot:test
MARKERS="spyre and cb and not quantized and not multi"

wallashss · 2025-10-28T21:09:42Z

@tjohnson31415 tests included!

tjohnson31415

LGTM, lots of improvements and some tests too!!

wallashss · 2025-10-28T21:58:28Z

bot:test
MARKERS="spyre and cb and not multi"

wallashss · 2025-10-29T15:16:26Z

bot:test

wallashss · 2025-10-29T17:03:07Z

bot:test
TEST_FILE=tests/v1/worker/test_spyre_input_batch.py

wallashss · 2025-10-29T17:07:32Z

bot:test
TEST_FILE=tests/v1/worker/test_spyre_input_batch.py MARKERS=cpu

rebase GTI improvements on main

230977a

Signed-off-by: Max de Bayser <[email protected]>

tjohnson31415 added 3 commits October 21, 2025 10:02

fix: advance current_token_idx when always inject

eef723d

Signed-off-by: Travis Johnson <[email protected]>

fix: add default None for optional args

9040ace

Signed-off-by: Travis Johnson <[email protected]>

fix: expand get_tokenizer params

d151007

Signed-off-by: Travis Johnson <[email protected]>

tjohnson31415 force-pushed the gti_improvements_main branch from 42ea36b to d151007 on October 21, 2025 19:20

fix tests now that GTI is always used

1338b41

Signed-off-by: Travis Johnson <[email protected]>

tjohnson31415 force-pushed the gti_improvements_main branch from 401b3dc to 1338b41 on October 21, 2025 22:17

GTI by default only in the tests

8abd342

Signed-off-by: Travis Johnson <[email protected]>

tjohnson31415 changed the title from "rebase GTI improvements on main" to "feat: improve golden token injection" Oct 22, 2025

tjohnson31415 reviewed Oct 22, 2025

wallashss added 2 commits October 28, 2025 14:54

Merge branch 'main' of github.com:vllm-project/vllm-spyre into gti_improvements_main

0f886ae

test: removed unrelated test

e244be6

Signed-off-by: Wallas Santos <[email protected]>

wallashss mentioned this pull request Oct 28, 2025

feat: improve golden token injection #536

Closed

refact: removed current_token_idx to use output token ids

d1c7a91

Signed-off-by: Wallas Santos <[email protected]>

wallashss marked this pull request as ready for review October 28, 2025 19:18

wallashss requested review from nikolaospapandreou, prashantgupta24, rafvasq, sducouedic, tdoublep and yannicks1 as code owners October 28, 2025 19:18

tjohnson31415 reviewed Oct 28, 2025

vllm_spyre/v1/sample/golden_token_injector.py

wallashss and others added 3 commits October 28, 2025 18:03

test: basic tests

d6b3273

Signed-off-by: Wallas Santos <[email protected]>

Update vllm_spyre/v1/sample/golden_token_injector.py

b40229e

Co-authored-by: Travis Johnson <[email protected]> Signed-off-by: Wallas Henrique <[email protected]>

style: fix linting

4c5402c

Signed-off-by: Wallas Santos <[email protected]>

tjohnson31415 approved these changes Oct 28, 2025

wallashss enabled auto-merge (squash) October 28, 2025 22:00

github-actions bot added the "ready" label (Runs the full CI test suite. Only add to PRs once ready to merge to limit public GHA usage) Oct 28, 2025

wallashss merged commit 81a1031 into main Oct 28, 2025
31 of 40 checks passed

wallashss deleted the gti_improvements_main branch October 28, 2025 22:34

5 participants
