A vLLM platform plugin for IBM Spyre AI accelerators.
spyre-inference is the successor to sendnn-inference, integrating IBM's Spyre hardware accelerators with vLLM for high-performance large language model inference.
This plugin uses torch-spyre and PyTorch's native Inductor compiler backend to optimize model execution on Spyre devices through vLLM's plugin architecture.
- Python >= 3.11
- Access to IBM Spyre hardware with the Spyre Runtime stack
- PyTorch 2.10.0 (CPU backend)
```bash
# Clone the repository
git clone https://github.com/torch-spyre/spyre-inference
cd spyre-inference

# Install with uv (recommended)
uv sync --frozen
```

Note: torch-spyre compilation requires access to IBM Spyre hardware with the Spyre Runtime stack. See internal development documentation for environment setup.
The plugin automatically registers with vLLM when installed. Enable it by setting `VLLM_PLUGINS=spyre_inference`.
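For example, the variable can be exported before launching the process (a minimal sketch; the script name below is only a placeholder):

```bash
# Load only the Spyre plugin when vLLM starts
export VLLM_PLUGINS=spyre_inference

# Run your inference script (placeholder name)
python offline_inference.py
```

With the plugin loaded, the standard vLLM API can be used as usual: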
```python
from vllm import LLM

llm = LLM(
    model="ibm-ai-platform/micro-g3.3-8b-instruct-1b",
    max_model_len=128,
    max_num_seqs=2,
)
```

The test suite includes:
- Local tests (`-m spyre`) - Spyre-specific functionality validation
- Upstream tests (`-m upstream`) - vLLM compatibility verification
Upstream tests are automatically synced from the vLLM repository at the commit specified in `pyproject.toml`.
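Assuming pytest is the test runner (the `-m` marker syntax above suggests it), each group can be selected on its own; this is a sketch rather than a documented command:

```bash
# Run only the Spyre-specific local tests
pytest -m spyre

# Run only the upstream vLLM compatibility tests
pytest -m upstream
```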
See the Contributing Guide for:
- Issue reporting and feature requests
- Development setup
- Testing guidelines
- Pull request process
Licensed under Apache 2.0.

Related projects:
- torch-spyre - PyTorch backend for Spyre accelerators
- vLLM - High-throughput LLM inference engine
- sendnn-inference - Previous generation Spyre vLLM plugin