Merged
25 commits
840187a
:alembic: start to get main working again
joerunde Oct 27, 2025
5e30b53
:bug: fixup more compatibility issues
joerunde Oct 28, 2025
86eed8f
:mute: noqa
joerunde Oct 28, 2025
091e9fd
:test_tube: add compatibility tests
joerunde Oct 28, 2025
81ab41a
:bug: fixup ignore comment
joerunde Oct 28, 2025
8d46d02
Merge branch 'main' into 0.11.1-support
joerunde Nov 4, 2025
45a96b7
:bug: fixup circular import and utils
joerunde Nov 4, 2025
4dedbfb
:bug: fixup grammar bitmask bit
joerunde Nov 4, 2025
926b6f5
Merge branch 'main' into 0.11.1-support
joerunde Dec 4, 2025
0f5adf3
:arrow_up: bump vllm to 0.11.1
joerunde Dec 4, 2025
5d1deda
Merge branch 'main' into 0.11.1-support
joerunde Dec 4, 2025
f53ed52
:bug: fixup compatibility for list of ndarrays
joerunde Dec 4, 2025
1987279
:fire: remove intel pytorch extension
joerunde Dec 4, 2025
917d24a
:bug: add return
joerunde Dec 4, 2025
85f4442
:bug: update tests for forwards compat
joerunde Dec 5, 2025
e45fb23
Merge branch 'main' into 0.11.1-support
joerunde Dec 5, 2025
ed3b79f
:bug: fixup test asserts
joerunde Dec 5, 2025
3f281de
:fire: clean up disk space for docker build
joerunde Dec 5, 2025
8c3a004
:bug: use dev release of fms-mo for cpu tests
joerunde Dec 5, 2025
3f02e01
:bug: actually add actions file
joerunde Dec 5, 2025
556a16e
:art: cleanup from code review
joerunde Dec 5, 2025
95477c0
fix CP tests: remove duplicated arg
yannicks1 Dec 8, 2025
6b5ac00
:arrow_up: bump to fms-mo 0.8
joerunde Dec 9, 2025
f3e1525
:bug: use max_model_len from model config
joerunde Dec 9, 2025
963bce5
:bug: add model config to scheduler
joerunde Dec 9, 2025
46 changes: 46 additions & 0 deletions .github/actions/free-up-disk-space/action.yml
@@ -0,0 +1,46 @@
+name: "Free up disk space"
+description: "Removes non-essential tools, libraries and cached files from GitHub action runner node"
+
+runs:
+  using: "composite"
+  steps:
+    - name: "Remove non-essential tools and libraries"
+      shell: bash
+      run: |
+        # https://github.com/actions/runner-images/issues/2840#issuecomment-790492173
+        echo "Disk usage before cleanup:"
+        df -h
+        echo "Removing non-essential tools and libraries ..."
+        sudo rm -rf /opt/ghc
+        sudo rm -rf /usr/local/.ghcup
+        sudo rm -rf /usr/share/dotnet
+        # sudo rm -rf /usr/local/share/boost
+        echo "Deleting libraries for Android (12G), CodeQL (5.3G), PowerShell (1.3G), Swift (1.7G) ..."
+        sudo rm -rf /usr/local/lib/android
+        sudo rm -rf "${AGENT_TOOLSDIRECTORY}/CodeQL"
+        sudo rm -rf /usr/local/share/powershell
+        sudo rm -rf /usr/share/swift
+        # ref: https://github.com/jlumbroso/free-disk-space/blob/main/action.yml
+        echo "Deleting some larger apt packages:"
+        sudo apt-get remove -y '^aspnetcore-.*' || echo "::warning::The command [sudo apt-get remove -y '^aspnetcore-.*'] failed to complete successfully. Proceeding..."
+        sudo apt-get remove -y '^dotnet-.*' --fix-missing || echo "::warning::The command [sudo apt-get remove -y '^dotnet-.*' --fix-missing] failed to complete successfully. Proceeding..."
+        sudo apt-get remove -y '^llvm-.*' --fix-missing || echo "::warning::The command [sudo apt-get remove -y '^llvm-.*' --fix-missing] failed to complete successfully. Proceeding..."
+        sudo apt-get remove -y 'php.*' --fix-missing || echo "::warning::The command [sudo apt-get remove -y 'php.*' --fix-missing] failed to complete successfully. Proceeding..."
+        sudo apt-get remove -y '^mongodb-.*' --fix-missing || echo "::warning::The command [sudo apt-get remove -y '^mongodb-.*' --fix-missing] failed to complete successfully. Proceeding..."
+        sudo apt-get remove -y '^mysql-.*' --fix-missing || echo "::warning::The command [sudo apt-get remove -y '^mysql-.*' --fix-missing] failed to complete successfully. Proceeding..."
+        sudo apt-get remove -y azure-cli google-chrome-stable firefox powershell mono-devel libgl1-mesa-dri --fix-missing || echo "::warning::The command [sudo apt-get remove -y azure-cli google-chrome-stable firefox powershell mono-devel libgl1-mesa-dri --fix-missing] failed to complete successfully. Proceeding..."
+        sudo apt-get remove -y google-cloud-sdk --fix-missing || echo "::debug::The command [sudo apt-get remove -y google-cloud-sdk --fix-missing] failed to complete successfully. Proceeding..."
+        sudo apt-get remove -y google-cloud-cli --fix-missing || echo "::debug::The command [sudo apt-get remove -y google-cloud-cli --fix-missing] failed to complete successfully. Proceeding..."
+        sudo apt-get autoremove -y || echo "::warning::The command [sudo apt-get autoremove -y] failed to complete successfully. Proceeding..."
+        sudo apt-get clean || echo "::warning::The command [sudo apt-get clean] failed to complete successfully. Proceeding..."
+        echo "Disk usage after cleanup:"
+        df -h
+
+    - name: "Prune docker images"
+      shell: bash
+      run: |
+        echo "Pruning docker images ..."
+        docker image prune -a -f
+        docker system df
+        echo "Disk usage after pruning docker images:"
+        df -h
3 changes: 3 additions & 0 deletions .github/workflows/build_docker.yml
@@ -43,6 +43,9 @@ jobs:
with:
fetch-depth: 0

+- name: "Free up disk space"
+uses: ./.github/actions/free-up-disk-space
+
- name: "Set up QEMU"
uses: docker/setup-qemu-action@v3

14 changes: 14 additions & 0 deletions .github/workflows/test.yml
@@ -79,6 +79,7 @@ jobs:
markers: "precompilation and not quantized"
flags: "--timeout=300"
include:
+# Lower bound support
- vllm_version:
name: "vLLM:lowest"
repo: "git+https://github.com/vllm-project/vllm --tag v0.10.2"
@@ -90,6 +91,19 @@ jobs:
hf_model_2_rev: "cf74d8acd4f198de950bf004b262e6accfed5d2c"
os: "ubuntu-latest"
python_version: "3.12"
+# Intermediate versions of vllm to check basic support for as well
+- vllm_version:
+name: "vLLM:0.11.0"
+repo: "git+https://github.com/vllm-project/vllm --tag v0.11.0"
+test_suite:
+name: "backward compat"
+markers: "compat or (cpu and basic)"
+flags: "--timeout=300"
+hf_model_2: "sentence-transformers/all-roberta-large-v1"
+hf_model_2_rev: "cf74d8acd4f198de950bf004b262e6accfed5d2c"
+os: "ubuntu-latest"
+python_version: "3.12"
+
# Exclude vLLM:main if PR does NOT have "ready" label AND auto-merge is not enabled
exclude: >-
${{
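The `markers` value in the new matrix entry looks like a pytest marker expression, presumably handed to `pytest -m` by a workflow step that is not part of this hunk. As a rough illustration of which tests `compat or (cpu and basic)` would select, here is a simplified stand-in for pytest's matcher. `marker_expr_matches` is a hypothetical helper, not part of pytest or this repository.

```python
import re

_KEYWORDS = {"and", "or", "not", "(", ")"}


def marker_expr_matches(expr: str, markers: set[str]) -> bool:
    """Evaluate a simple marker expression against one test's markers.

    Simplified model: each identifier becomes True or False depending on
    whether the test carries that marker; and/or/not and parentheses
    keep their usual boolean meaning.
    """
    tokens = re.findall(r"\(|\)|\w+", expr)
    translated = [
        tok if tok in _KEYWORDS else str(tok in markers) for tok in tokens
    ]
    # Only True/False literals, boolean operators and parentheses
    # reach eval().
    return eval(" ".join(translated), {"__builtins__": {}})


expr = "compat or (cpu and basic)"
print(marker_expr_matches(expr, {"compat"}))
print(marker_expr_matches(expr, {"cpu", "basic"}))
print(marker_expr_matches(expr, {"cpu"}))
```

Under this model a test is picked up either because it is marked `compat`, or because it carries both `cpu` and `basic`; a test marked only `cpu` is skipped.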
8 changes: 6 additions & 2 deletions pyproject.toml
@@ -11,9 +11,9 @@ description = "vLLM plugin for Spyre hardware support"
readme = "README.md"
license = {text = "Apache 2"}
dependencies = [
-"fms-model-optimizer[fp8]>=0.6.0",
+"fms-model-optimizer[fp8]>=0.8.0",
"ibm-fms>=1.5.0,<2.0",
-"vllm>=0.10.2,<=0.11.0",
+"vllm>=0.10.2,<=0.11.1",
"pytest-mock>=3.15.0",
]
requires-python = ">=3.11"
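The new `vllm>=0.10.2,<=0.11.1` bound widens the supported range by one release. A minimal sketch of what that range admits, assuming plain `X.Y.Z` release versions only (real resolvers apply the full PEP 440 rules, including pre- and post-releases). `parse_release` and `in_supported_range` are hypothetical helpers.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


LOWER = parse_release("0.10.2")  # inclusive lower bound from pyproject.toml
UPPER = parse_release("0.11.1")  # inclusive upper bound after this change


def in_supported_range(version: str) -> bool:
    """Check LOWER <= version <= UPPER using tuple ordering."""
    return LOWER <= parse_release(version) <= UPPER


for candidate in ("0.10.1", "0.10.2", "0.11.0", "0.11.1", "0.12.0"):
    print(candidate, in_supported_range(candidate))
```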
@@ -50,6 +50,7 @@ override-dependencies = [
"torchaudio; sys_platform == 'never'",
"torchvision; sys_platform == 'never'",
"triton; sys_platform == 'never'",
+"intel-extension-for-pytorch; sys_platform == 'never'",

# Skip packages on s390x and ppc64le, expected to be pre-installed
"vllm ; platform_machine not in 's390x, ppc64le'",
@@ -62,6 +63,9 @@ environments = [
"python_version > '3.9'"
]

+[tool.uv.sources]
+vllm = { git = "https://github.com/vllm-project/vllm", rev = "v0.11.1" }
Collaborator Author

Installing vllm this way (with VLLM_TARGET_DEVICE=empty) keeps the extra CUDA-only dependencies out of uv.lock, since the vllm wheels published on PyPI are built only for CUDA.


[tool.ruff]
# Allow lines to be as long as 80.
line-length = 80
8 changes: 4 additions & 4 deletions tests/e2e/test_chunked_prefill_tkv_steps.py
@@ -20,6 +20,7 @@

from vllm_spyre.platform import SpyrePlatform
from vllm_spyre.v1.worker.spyre_model_runner import SpyreModelRunner
+from vllm_spyre.v1.worker.spyre_worker import _get_extra_args


########## Assuming that we have:
@@ -107,6 +108,7 @@ def make_scheduler_output(
if finished_req_ids is None:
finished_req_ids = set()

+extra_args = _get_extra_args()
return SchedulerOutput(scheduled_new_reqs=scheduled_new_reqs,
scheduled_cached_reqs=scheduled_cached_reqs,
num_scheduled_tokens=num_scheduled_tokens,
@@ -115,10 +117,8 @@ def make_scheduler_output(
scheduled_encoder_inputs={},
num_common_prefix_blocks=[],
finished_req_ids=finished_req_ids,
-free_encoder_mm_hashes=[],
-structured_output_request_ids={},
-grammar_bitmask=None,
-kv_connector_metadata=None)
+kv_connector_metadata=None,
+**extra_args)


def make_new_request_data(req_id, prompt_len):
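The test now delegates the version-specific `SchedulerOutput` arguments to `_get_extra_args`, whose body is not part of this diff. One plausible shape for such a helper, sketched with stand-in dataclasses rather than the real vLLM types, is to pass a keyword only when the target dataclass declares it. `OldSchedulerOutput`, `NewSchedulerOutput` and `extra_args_for` are hypothetical names, and which fields each vLLM release actually declares is an assumption here.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class OldSchedulerOutput:
    """Stand-in for an older release that still declares both fields."""
    finished_req_ids: set[str]
    structured_output_request_ids: dict[str, int] = field(default_factory=dict)
    grammar_bitmask: Optional[Any] = None


@dataclass
class NewSchedulerOutput:
    """Stand-in for a newer release, assumed to have dropped them."""
    finished_req_ids: set[str]


def extra_args_for(cls: type) -> dict[str, Any]:
    """Return only the optional kwargs that `cls` actually declares."""
    candidates: dict[str, Any] = {
        "structured_output_request_ids": {},
        "grammar_bitmask": None,
    }
    declared = {f.name for f in fields(cls)}
    return {k: v for k, v in candidates.items() if k in declared}


for cls in (OldSchedulerOutput, NewSchedulerOutput):
    # Construction succeeds for both shapes because unknown kwargs
    # are filtered out before the call.
    cls(finished_req_ids=set(), **extra_args_for(cls))
    print(cls.__name__, sorted(extra_args_for(cls)))
```

Filtering by declared fields keeps the call site identical across versions, which is the effect the `**extra_args` change has in the test above.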
5 changes: 4 additions & 1 deletion tests/e2e/test_spyre_cb.py
@@ -37,7 +37,10 @@ def test_cb_max_tokens(model: ModelInfo, backend: str, max_model_len: int,
ignore_eos=True,
logprobs=0)

-with pytest.raises(ValueError, match="max model context length"):
+# The text of the error raised by vllm changed from 0.11.0 to 0.11.1
+with pytest.raises(
+        ValueError,
+        match="(max model context length|maximum model length)"):
generate_spyre_vllm_output(
model=model,
prompts=overflow_prompt,
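`pytest.raises(..., match=...)` checks the pattern with `re.search`, so an alternation lets a single assertion accept either wording. A small check of the pattern from this hunk follows. The sample messages are invented for illustration and are not the exact vLLM error strings.

```python
import re

PATTERN = "(max model context length|maximum model length)"

# Invented sample messages; only the matched fragments come from the test.
old_style = "This model's max model context length is 256 tokens."
new_style = "The prompt is longer than the maximum model length of 256."
unrelated = "Sampling parameter out of range."

for message in (old_style, new_style, unrelated):
    print(bool(re.search(PATTERN, message)), message)
```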
2 changes: 1 addition & 1 deletion tests/llm_cache.py
@@ -234,7 +234,7 @@ def get_engine(

# Set scheduler configs for max_model_len and max_num_seqs to the
# original values. They were changed for more robust compilation only.
-engine_core.scheduler.scheduler_config.max_model_len = max_model_len
+engine_core.scheduler.model_config.max_model_len = max_model_len
engine_core.scheduler.scheduler_config.max_num_seqs = max_num_seqs

if available_blocks is not None:
10 changes: 9 additions & 1 deletion tests/spyre_util.py
@@ -14,7 +14,15 @@
from transformers import AutoTokenizer
from vllm import SamplingParams
from vllm.entrypoints.openai.cli_args import make_arg_parser
-from vllm.utils import FlexibleArgumentParser, get_open_port
+
+try:
+    # old
+    from vllm.utils import FlexibleArgumentParser, get_open_port
+except ImportError:
+    # new
+    from vllm.utils.argparse_utils import FlexibleArgumentParser
+    from vllm.utils.network_utils import get_open_port
+
from vllm.v1.request import Request

EmbeddingWarmupShapes = list[tuple[int, int]]
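The `try`/`except ImportError` fallback used here is a general pattern for supporting a library across a module reorganization. A stdlib-only sketch of a reusable variant follows. `import_first` is a hypothetical helper, and the module paths in the example are ordinary standard-library ones, not vLLM ones.

```python
from importlib import import_module
from typing import Any


def import_first(name: str, *module_paths: str) -> Any:
    """Return attribute `name` from the first module that provides it."""
    for path in module_paths:
        try:
            return getattr(import_module(path), name)
        except (ImportError, AttributeError):
            continue  # try the next candidate location
    raise ImportError(f"{name!r} not found in any of {module_paths}")


# `Mapping` was importable from `collections` before Python 3.10 and is
# now only available from `collections.abc`; listing both locations
# works on either side of the move.
Mapping = import_first("Mapping", "collections", "collections.abc")
print(Mapping.__module__)
```

The explicit `try`/`except` in the diff is arguably easier for linters and type checkers to follow; a helper like this mostly pays off when many names move at once.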
39 changes: 38 additions & 1 deletion tests/utils/test_upstream_compatibility.py
@@ -1,7 +1,7 @@
import os

import pytest
-from vllm.v1.core.sched.output import NewRequestData
+from vllm.v1.core.sched.output import NewRequestData, SchedulerOutput
from vllm.v1.kv_cache_interface import FullAttentionSpec

from vllm_spyre.compat_utils import dataclass_fields
@@ -33,3 +33,40 @@ def test_use_mla():
     if VLLM_VERSION == "vLLM:lowest":
         # Can remove backwards compatibility for use_mla
         assert "use_mla" in dataclass_fields(FullAttentionSpec)
+
+
+def test_pin_memory_available():
+    if VLLM_VERSION == "vLLM:lowest":
+        try:
+            from vllm.utils import is_pin_memory_available  # # noqa
+            from vllm.utils import make_tensor_with_pad  # # noqa
+        except ImportError as e:
+            raise AssertionError("remove backwards compatibility imports for "
+                                 "is_pin_memory_available and "
+                                 "make_tensor_with_pad") from e
+
+
+def test_multi_modal_cache_stats():
+    if VLLM_VERSION == "vLLM:lowest":
+        # If this import succeeds then remove the backwards compatibility type
+        # def for MultiModalCacheStats
+        with pytest.raises(ImportError):
+            from vllm.v1.metrics.stats import MultiModalCacheStats  # # noqa
+
+
+def test_v0_worker_base():
+    if VLLM_VERSION == "vLLM:lowest":
+        try:
+            from vllm.worker.worker_base import WorkerBase  # # noqa
+        except ImportError as e:
+            raise AssertionError(
+                "remove the backwards compatibility code from "
+                "the SpyreWorker initializer") from e
+
+
+def test_structured_output_request_ids():
+    if VLLM_VERSION == "vLLM:lowest":
+        # Can remove "structured_output_request_ids" and "grammar_bitmask"
+        # from backwards compat
+        assert 'structured_output_request_ids' in dataclass_fields(
+            SchedulerOutput)
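These tests act as tripwires: each one only checks the lowest supported vLLM, and starts failing once that lower bound no longer needs a given shim, which signals that the compatibility code can be deleted. `dataclass_fields` comes from `vllm_spyre.compat_utils` and its body is not shown in this diff. The sketch below assumes it returns the declared field names, and uses a stand-in dataclass in place of `SchedulerOutput`; `check_shim_still_needed` is a hypothetical name.

```python
from dataclasses import dataclass, fields
from typing import Optional


def dataclass_fields(cls: type) -> list[str]:
    """Assumed behavior: list the names of a dataclass's fields."""
    return [f.name for f in fields(cls)]


@dataclass
class FakeSchedulerOutput:
    """Stand-in for the oldest supported release's SchedulerOutput."""
    finished_req_ids: set
    grammar_bitmask: Optional[object] = None


def check_shim_still_needed(version_label: str, cls: type,
                            field_name: str) -> None:
    """Fail once the lowest supported version stops declaring the field."""
    if version_label != "vLLM:lowest":
        return  # only the lower bound decides when a shim can go
    assert field_name in dataclass_fields(cls), (
        f"{field_name!r} is gone from the lowest supported version; "
        "remove the backwards compatibility code")


# Passes: the stand-in "lowest" version still has the field.
check_shim_still_needed("vLLM:lowest", FakeSchedulerOutput, "grammar_bitmask")
# Skipped: newer versions are never checked by the tripwire.
check_shim_still_needed("vLLM:main", FakeSchedulerOutput, "no_such_field")
print(dataclass_fields(FakeSchedulerOutput))
```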
8 changes: 7 additions & 1 deletion tests/v1/worker/test_spyre_input_batch.py
@@ -6,7 +6,13 @@
import pytest
import torch
from vllm.sampling_params import SamplingParams
-from vllm.utils import is_pin_memory_available, make_tensor_with_pad
+
+try:
+    from vllm.utils import is_pin_memory_available, make_tensor_with_pad
+except ImportError:
+    from vllm.utils.platform_utils import is_pin_memory_available
+    from vllm.utils.torch_utils import make_tensor_with_pad
+
from vllm.v1.sample.logits_processor import LogitsProcessors
from vllm.v1.sample.metadata import SamplingMetadata
