@maxdebayser
Collaborator

@maxdebayser maxdebayser commented Oct 21, 2025

A rebase of @wallashss's PR #536 without the LogitsProcessor refactors.

Improvements to the GoldenTokenInjector (GTI):

  • update GTI args to support parsing from a JSON string, e.g. one passed through the vLLM OpenAI vllm_xargs extension
  • support always injecting the golden token regardless of expected logprobs / divergence checks
  • GTI is now part of the vllm_spyre package; it can be enabled with: --logits-processors vllm_spyre.v1.sample.golden_token_injector:GoldenTokenInjector

Example usage:

VLLM_SPYRE_USE_CB=1 vllm serve ibm-ai-platform/micro-g3.3-8b-instruct-1b --max-num-seqs 16 --max-model-len 1024 --logits-processors vllm_spyre.v1.sample.golden_token_injector:GoldenTokenInjector
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
  "model": "ibm-ai-platform/micro-g3.3-8b-instruct-1b",
  "prompt": "a squared plus b squared equals ",
  "vllm_xargs": {
     "golden_token_injector": "{\"expected_token_ids\": [85, 39560]}"
  },
  "logprobs": 0,
  "return_tokens_as_token_ids": true,
  "max_tokens": 2,
  "temperature": 0
}'

to generate "c squared" instead of "16".
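The value of golden_token_injector in the request above is itself a JSON string, so it has to be escaped inside the outer JSON body, which is easy to get wrong by hand. A minimal Python sketch, assuming only the field names shown in the curl example, builds the same body by serializing twice. The helper name build_gti_request is hypothetical and not part of vllm_spyre.

```python
import json


def build_gti_request(model: str, prompt: str,
                      expected_token_ids: list[int],
                      max_tokens: int = 2) -> str:
    """Return a JSON request body shaped like the curl example above.

    Hypothetical helper for illustration; only the field names are
    taken from the PR description.
    """
    # First serialization: the injector arguments become a JSON *string*.
    gti_args = json.dumps({"expected_token_ids": expected_token_ids})
    body = {
        "model": model,
        "prompt": prompt,
        "vllm_xargs": {"golden_token_injector": gti_args},
        "logprobs": 0,
        "return_tokens_as_token_ids": True,
        "max_tokens": max_tokens,
        "temperature": 0,
    }
    # Second serialization: the outer body; this adds the \" escaping.
    return json.dumps(body, indent=2)


request_body = build_gti_request(
    "ibm-ai-platform/micro-g3.3-8b-instruct-1b",
    "a squared plus b squared equals ",
    [85, 39560],
)
print(request_body)
```

The printed string can be passed to curl with -d, or sent with any HTTP client; the sketch deliberately stops short of making the request.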

Signed-off-by: Max de Bayser <[email protected]>
@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR cannot be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

Signed-off-by: Travis Johnson <[email protected]>
@tjohnson31415
Collaborator

bot:test

@tjohnson31415 tjohnson31415 changed the title from "rebase GTI improvements on main" to "feat: improve golden token injection" Oct 22, 2025
- vllm_sampling_params = [vllm_sampling_params_normal] * 3
+ vllm_sampling_params = [
+     vllm_sampling_params_normal.clone() for _ in range(3)
+ ]
Collaborator

@tjohnson31415 tjohnson31415 Oct 22, 2025

This change is needed now that all tests use GTI by default (before this PR, only tests using get_engine would use GTI). Repeating one reference with [...] * 3 instead of calling .clone() meant that all sequences shared a single sampling-params object, and therefore the same GTI config (even with different prompts).
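The aliasing described in this comment can be reproduced without vLLM. The sketch below uses a hypothetical stand-in class, FakeSamplingParams, whose clone() is assumed to behave like a deep copy. It is not the real sampling-params class, and the extra_args field is only an illustrative place to hold per-request GTI config.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class FakeSamplingParams:
    """Hypothetical stand-in for the real sampling-params class."""
    extra_args: dict = field(default_factory=dict)

    def clone(self) -> "FakeSamplingParams":
        # Assumption: clone() returns an independent deep copy.
        return copy.deepcopy(self)


# List multiplication repeats the same reference three times.
base = FakeSamplingParams()
aliased = [base] * 3
aliased[0].extra_args["golden_token_injector"] = "config for prompt 0"
# Every entry now sees the config that was meant for prompt 0 only.
shared = all(p.extra_args == aliased[0].extra_args for p in aliased)

# clone() gives every sequence its own object.
base = FakeSamplingParams()
cloned = [base.clone() for _ in range(3)]
cloned[0].extra_args["golden_token_injector"] = "config for prompt 0"
# The other two entries are unaffected.
independent = cloned[1].extra_args == {} and cloned[2].extra_args == {}

print(shared, independent)
```

With the list-multiplication form, setting a per-prompt config on one entry silently changes the others, which matches the behavior the comment reports for the old test code.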

@wallashss
Collaborator

bot:test
MARKERS="spyre and cb and not quantized and not multi"

@waleedqk
Collaborator

bot:test
MARKERS="spyre and cb and not quantized and not multi"

@waleedqk
Collaborator

bot:test
MARKERS="spyre and cb and not quantized and not multi"

wallashss and others added 3 commits October 28, 2025 18:03
Signed-off-by: Wallas Santos <[email protected]>
Co-authored-by: Travis Johnson <[email protected]>
Signed-off-by: Wallas Henrique <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
@wallashss
Collaborator

@tjohnson31415 tests included!

Collaborator

@tjohnson31415 tjohnson31415 left a comment

LGTM lots of improvements and some tests too!!

@wallashss
Collaborator

bot:test
MARKERS="spyre and cb and not multi"

@wallashss wallashss enabled auto-merge (squash) October 28, 2025 22:00
@github-actions github-actions bot added the ready label (Runs the full CI test suite. Only add to PRs once ready to merge to limit public GHA usage) Oct 28, 2025
@wallashss wallashss merged commit 81a1031 into main Oct 28, 2025
31 of 40 checks passed
@wallashss wallashss deleted the gti_improvements_main branch October 28, 2025 22:34
@wallashss
Collaborator

bot:test

@wallashss
Collaborator

bot:test
TEST_FILE=tests/v1/worker/test_spyre_input_batch.py

@wallashss
Collaborator

bot:test
TEST_FILE=tests/v1/worker/test_spyre_input_batch.py MARKERS=cpu
