Conversation

@joerunde joerunde commented Oct 28, 2025

Description

Upgrades vllm to 0.11.1, adding backwards compatibility code where necessary.

This PR:

  • Updates the default vllm install to 0.11.1
  • Retains the lower bound of 0.10.2
  • Adds a new entry in the backwards compatibility tests to maintain test coverage of 0.11.0
  • Changes the uv.lock settings to install vllm from source instead of from cuda wheels
  • Bumps fms-mo to a dev version past 0.7.0 because 0.7.0 has a bug when running on cpu
  • Cleans out the GHA runner before image builds since we were running out of disk space

There was one really fun change here where the type of sampled_token_ids changed, but was then changed back for 0.12.0.
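For illustration, a minimal sketch of how that kind of flip can be absorbed in the model runner; the helper name and the exact pair of types are assumptions for this example, not code from this PR:

import torch

def as_token_id_lists(sampled_token_ids) -> list[list[int]]:
    # Normalize whatever the installed vLLM hands back: some releases use a
    # plain list[list[int]], others a torch.Tensor of token ids.
    if isinstance(sampled_token_ids, torch.Tensor):
        return sampled_token_ids.tolist()
    return sampled_token_ids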

TODO: maybe we should get a new fms-mo release first

@github-actions

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

Signed-off-by: Joe Runde <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
@joerunde joerunde added the "ready" label (Runs the full CI test suite. Only add to PRs once ready to merge to limit public GHA usage) Oct 28, 2025
@joerunde joerunde changed the title from "✨ vllm main support for upcoming 0.111.1 release" to "✨ vllm main support for upcoming 0.11.1 release" Nov 4, 2025
@joerunde joerunde requested a review from ckadner as a code owner December 4, 2025 20:29
@joerunde joerunde removed the "ready" label (Runs the full CI test suite. Only add to PRs once ready to merge to limit public GHA usage) Dec 4, 2025
Signed-off-by: Joe Runde <[email protected]>
]

[tool.uv.sources]
vllm = { git = "https://github.com/vllm-project/vllm", rev = "v0.11.1" }
Collaborator Author

Installing vllm this way (with VLLM_TARGET_DEVICE=empty) leaves out extra cuda-only dependencies from the uv.lock, since the published vllm wheels on pypi are only built for cuda.

@joerunde joerunde changed the title from "✨ vllm main support for upcoming 0.11.1 release" to "✨ vllm support for 0.11.1 release" Dec 5, 2025
Collaborator

@tjohnson31415 tjohnson31415 left a comment

Somehow you make backwards compatibility elegant

Comment on lines 111 to 115
extra_args = {}
if "structured_output_request_ids" in dataclass_fields(SchedulerOutput):
extra_args["structured_output_request_ids"] = {}
if "grammar_bitmask" in dataclass_fields(SchedulerOutput):
extra_args["grammar_bitmask"] = None
Collaborator

It looks like we could just import and use _get_extra_args() from the spyre_worker to reduce code duplication.

Collaborator Author

private imports!!

but yeah, for a test file that's probably fine
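For reference, the pattern in both places boils down to filtering a dict of defaults against the fields the installed vLLM actually defines. A standalone sketch (get_extra_args is an illustrative stand-in, not the private helper from spyre_worker):

from dataclasses import fields as dataclass_fields
from typing import Any

def get_extra_args(cls: type, defaults: dict[str, Any]) -> dict[str, Any]:
    # Keep only the defaults whose names exist as fields on this vLLM
    # version's dataclass, so older and newer releases both construct cleanly.
    names = {f.name for f in dataclass_fields(cls)}
    return {k: v for k, v in defaults.items() if k in names}

# e.g. extra_args = get_extra_args(SchedulerOutput,
#          {"structured_output_request_ids": {}, "grammar_bitmask": None})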

) -> None:
"""Raises if this request is unsupported on this platform"""

# TODO: fix
Collaborator

Is this a TODO for this PR to fix before merging?

Collaborator Author

oh- maybe 🤔

I think I put the TODO in because the lazy import was suuuper ugly, but I do think the import has to stay lazy or we'll hit a circular import :(. The TODO here might be to just remove the TODO and replace it with a comment about why this is the way it is.
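For context, the shape of the pattern being discussed is a function-level import; the module and helper names below are placeholders, not the real ones in this repo:

def validate_request(self, prompt, params) -> None:
    """Raises if this request is unsupported on this platform"""
    # Deferred on purpose: importing this at module scope would complete a
    # circular import chain, so the import only runs when the method is called.
    from vllm_spyre import request_checks  # placeholder module name

    request_checks.assert_supported(prompt, params)  # placeholder helper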

@tjohnson31415
Collaborator

TODO: There is still a problem with running quantized models. I'm not sure what's going on there, as neither the torch version nor modeling code changed, but we're getting an error from torch

This PR bumps fms-model-optimizer to 0.7.0 in uv.lock. I confirmed the quantized model tests fail after upgrading 0.6.0 -> 0.7.0. Installing fms-mo from main resolved the torch error in my dev pod.

@joerunde
Collaborator Author

joerunde commented Dec 6, 2025

Alright @tjohnson31415, looks like we are 🟢 for now. Thanks for the fms-mo hint. I validated that fms-mo 0.7.0 still works on spyre and it's just the cpu execution that's broken. I've bumped here to the latest main commit, which also appears to work fine on spyre.

Let's talk on Monday - maybe we should get a new official fms-mo release instead of pinning a commit. I'm also not entirely sure, with our current release cadence, whether we'd want to bump the actual vllm install to 0.11.1 or flip this around: add a compatibility test for 0.11.1 and keep the uv.lock at 0.11.0. Either way, we should get the currently-good set of spyre unit tests run on this before merging.

Collaborator

@yannicks1 yannicks1 left a comment

lgtm!

Comment on lines -621 to +630
-cached_request_data = CachedRequestData(
-    req_ids=req_ids,
-    resumed_from_preemption=False,
-    new_token_ids=new_token_ids,
-    new_block_ids=new_block_ids,
-    num_computed_tokens=num_computed_tokens,
-)
+cached_request_data = CachedRequestData.make_empty()
+cached_request_data.req_ids = req_ids
+cached_request_data.new_block_ids = new_block_ids
+cached_request_data.new_token_ids = new_token_ids
+cached_request_data.num_computed_tokens = num_computed_tokens
Collaborator

just for my understanding: what is the motivation for this change?

Collaborator Author

More fields were added to this dataclass, so normally we would have to do the checks on dataclass_fields to inject empty kwargs into the initializer call here. But this class offers a make_empty() that initializes everything with default values, so we use that instead and then only set the values we care about. That way we don't have any backwards compatibility cleanup to worry about later.

@yannicks1
Collaborator

we have one last error to fix for CP:
TypeError: vllm.v1.core.sched.output.SchedulerOutput() got multiple values for keyword argument 'free_encoder_mm_hashes'
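That TypeError means the same keyword reached SchedulerOutput() twice, once explicitly and once through the version-dependent extras dict. A minimal sketch of one way to guard against that (merged_kwargs is a hypothetical helper, not the fix that was actually pushed):

from dataclasses import fields as dataclass_fields
from typing import Any

def merged_kwargs(cls: type, explicit: dict[str, Any],
                  optional_defaults: dict[str, Any]) -> dict[str, Any]:
    # Only pass a default if the field exists on this vLLM's dataclass and
    # isn't already being passed explicitly; otherwise the constructor raises
    # "got multiple values for keyword argument ...".
    names = {f.name for f in dataclass_fields(cls)}
    extras = {k: v for k, v in optional_defaults.items()
              if k in names and k not in explicit}
    return {**explicit, **extras}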

Signed-off-by: Yannick Schnider <[email protected]>
@yannicks1
Collaborator

@joerunde I fixed the failing test. Hope you don't mind that I pushed to your branch, but I thought it saves us some GHA time and you can hit merge as soon as you wake up :)

@joerunde
Collaborator Author

joerunde commented Dec 8, 2025

Thanks @yannicks1!

@joerunde
Collaborator Author

joerunde commented Dec 8, 2025

bot:test
MARKERS="spyre and cb and not multi"

1 similar comment
@joerunde
Collaborator Author

joerunde commented Dec 8, 2025

bot:test
MARKERS="spyre and cb and not multi"

Collaborator

@tjohnson31415 tjohnson31415 left a comment

LGTM. Yeah let's chat about the vLLM pin and the fms-mo release

@joerunde
Collaborator Author

joerunde commented Dec 8, 2025

bot:test
MARKERS="spyre and chunked_prefill and not multi"

Collaborator

@maxdebayser maxdebayser left a comment

LGTM. We can merge once the tests pass.

@joerunde
Collaborator Author

joerunde commented Dec 9, 2025

bot:test
MARKERS="spyre and cb and not multi"

@joerunde
Collaborator Author

joerunde commented Dec 9, 2025

bot:test
MARKERS="spyre and cb and not multi"

1 similar comment
@joerunde
Collaborator Author

joerunde commented Dec 9, 2025

bot:test
MARKERS="spyre and cb and not multi"

@joerunde
Collaborator Author

joerunde commented Dec 9, 2025

The continuous batching tests passed on our bot test, and I was able to get the chunked prefill tests working on a dev pod. (The graph comparison tests still fail on chunked prefill with an old version of aftu on bot:test runs 😢 )

Test results
$ pytest tests -m "spyre and not quantized and not multi and chunked_prefill" -x -v --forked -k "not aftu"
===================================================================== test session starts =====================================================================
platform linux -- Python 3.12.9, pytest-8.3.4, pluggy-1.6.0 -- /home/senuser/repo/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/senuser/repo
configfile: pyproject.toml
plugins: forked-1.6.0, timeout-2.3.1, mock-3.15.1, asyncio-1.2.0, anyio-4.11.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=function, asyncio_default_test_loop_scope=function
collected 811 items / 784 deselected / 27 selected                                                                                                            

tests/e2e/test_logits_processors.py::test_custom_logits_processor[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-warmup_shapes([(64, 20, 4)])-max_model_len(512)-max_num_seqs(4)-cp] PASSED [  3%]
tests/e2e/test_spyre_async_llm.py::test_abort[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-warmup_shapes([(64, 20, 4)])-max_model_len(512)-max_num_seqs(4)-max_num_batched_tokens(128)-cp-RequestOutputKind.DELTA] PASSED [  7%]
tests/e2e/test_spyre_async_llm.py::test_abort[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-warmup_shapes([(64, 20, 4)])-max_model_len(512)-max_num_seqs(4)-max_num_batched_tokens(128)-cp-RequestOutputKind.FINAL_ONLY] PASSED [ 11%]
tests/e2e/test_spyre_basic.py::test_max_model_len_override[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-warmup_shapes([(64, 20, 4)])-cp] PASSED     [ 14%]
tests/e2e/test_spyre_cp_scheduler_steps.py::test_prefill_tkv_too_big[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-None-128-128-2] PASSED            [ 18%]
tests/e2e/test_spyre_cp_scheduler_steps.py::test_requests_exceed_batch_tkv_limit[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-None-128-128-2] PASSED [ 22%]
tests/e2e/test_spyre_cp_scheduler_steps.py::test_prefill_use_more_than_available_blocks[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-2-128-128-2] PASSED [ 25%]
tests/e2e/test_spyre_cp_scheduler_steps.py::test_single_cp_prefill[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-None-128-514-2] SKIPPED (sendnn...) [ 29%]
tests/e2e/test_spyre_cp_scheduler_steps.py::test_single_cp_prefill[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-None-128-1024-2] PASSED             [ 33%]
tests/e2e/test_spyre_cp_scheduler_steps.py::test_cp_prefill_interleave1[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-None-128-2048-2] PASSED        [ 37%]
tests/e2e/test_spyre_cp_scheduler_steps.py::test_cp_prefill_no_interleave[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-None-128-2048-2] PASSED      [ 40%]
tests/e2e/test_spyre_cp_scheduler_steps.py::test_cp_prefill_interleave2[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-None-128-2048-2] PASSED        [ 44%]
tests/e2e/test_chunked_prefill.py::test_chunked_prefill_correctness[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-max_model_len(512)-max_num_seqs(4)-case_Ia] PASSED [ 48%]
tests/e2e/test_chunked_prefill.py::test_chunked_prefill_correctness[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-max_model_len(512)-max_num_seqs(4)-case_Ib] PASSED [ 51%]
tests/e2e/test_chunked_prefill.py::test_chunked_prefill_correctness[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-max_model_len(512)-max_num_seqs(4)-case_II] PASSED [ 55%]
tests/e2e/test_chunked_prefill.py::test_chunked_prefill_correctness[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-max_model_len(512)-max_num_seqs(4)-case_III] PASSED [ 59%]
tests/e2e/test_sampling_params.py::test_spyre_batch1_logit_bias[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-warmup_shapes([(64, 20, 4)])-max_model_len(512)-max_num_seqs(4)-cp] PASSED [ 62%]
tests/e2e/test_sampling_params.py::test_spyre_batch1_min_tokens[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-warmup_shapes([(64, 20, 4)])-max_model_len(512)-max_num_seqs(4)-cp] PASSED [ 66%]
tests/e2e/test_sampling_params.py::test_spyre_batch1_min_p[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-warmup_shapes([(64, 20, 4)])-max_model_len(512)-max_num_seqs(4)-cp] PASSED [ 70%]
tests/e2e/test_spyre_basic.py::test_output[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-warmup_shapes([(64, 20, 4)])-max_model_len(512)-max_num_seqs(4)-cp-TP(1)] PASSED [ 74%]
tests/e2e/test_spyre_basic.py::test_batch_handling[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-warmup_shapes([(64, 20, 4)])-max_model_len(512)-max_num_seqs(4)-cp] PASSED [ 77%]
tests/e2e/test_spyre_max_new_tokens.py::test_output[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-warmup_shapes([(64, 20, 4)])-max_model_len(512)-max_num_seqs(4)-cp-True] PASSED [ 81%]
tests/e2e/test_spyre_max_new_tokens.py::test_output[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-warmup_shapes([(64, 20, 4)])-max_model_len(512)-max_num_seqs(4)-cp-False] PASSED [ 85%]
tests/e2e/test_spyre_seed.py::test_seed[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-warmup_shapes([(64, 20, 4)])-max_model_len(512)-max_num_seqs(4)-cp-42-0.1] XFAIL [ 88%]
tests/e2e/test_spyre_seed.py::test_seed[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-warmup_shapes([(64, 20, 4)])-max_model_len(512)-max_num_seqs(4)-cp-42-1.0] XFAIL [ 92%]
tests/e2e/test_spyre_stagger_basic.py::test_stagger_output[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-warmup_shapes([(64, 20, 4)])-max_model_len(512)-max_num_seqs(4)-cp-TP(1)] PASSED [ 96%]
tests/e2e/test_spyre_online.py::test_openai_serving[ibm-ai-platform/micro-g3.3-8b-instruct-1b-sendnn-warmup_shapes([(64, 20, 4)])-max_model_len(512)-max_num_seqs(4)-cp-TP(1)] PASSED [100%]

===================================== 24 passed, 1 skipped, 784 deselected, 2 xfailed, 60 warnings in 1214.60s (0:20:14) ======================================

@joerunde joerunde merged commit e834cc7 into main Dec 9, 2025
22 checks passed
@joerunde joerunde deleted the 0.11.1-support branch December 9, 2025 18:42
Comment on lines +366 to +370

print("\n\n\n\n\t\tNUM BLOCKS:", num_blocks)
print("\t\tBLOCK SIZE:", self.kv_cache_specs['block_size'])
print("\t\tNUM KV HEADS:", self.kv_cache_specs['num_kv_heads'])
print("\t\tHEAD DIM:", self.kv_cache_specs['head_dim'])
Collaborator

Suggested change (remove the leftover debug prints):
-print("\n\n\n\n\t\tNUM BLOCKS:", num_blocks)
-print("\t\tBLOCK SIZE:", self.kv_cache_specs['block_size'])
-print("\t\tNUM KV HEADS:", self.kv_cache_specs['num_kv_heads'])
-print("\t\tHEAD DIM:", self.kv_cache_specs['head_dim'])

Collaborator Author

rip lol
