Commit aca9493
Fix speculator model integration by detecting speculators before ModelConfig creation
When using `vllm serve` with a speculator model path directly (e.g., RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3), tokenizer loading failed because ModelConfig was created with the speculator path before maybe_override_with_speculators() could swap it to the target model path.

This fix moves the maybe_override_with_speculators() call to BEFORE create_model_config(), ensuring that:

1. Speculator models are detected early
2. The target model path is extracted from the speculators config
3. ModelConfig is created with the correct target model path
4. The tokenizer loads successfully from the target model

Signed-off-by: Rahul Tuli <[email protected]>
1 parent f177da1 commit aca9493
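The reordering can be sketched as a minimal, self-contained simulation. The SPECULATORS mapping, the target model path, and the dict-based stand-ins for ModelConfig are illustrative assumptions for this sketch, not vLLM's actual API:

```python
# Hypothetical sketch of the ordering fix: run maybe_override_with_speculators()
# BEFORE building the model config, so the config (and hence the tokenizer)
# points at the target model rather than the speculator checkpoint.
# The mapping and config objects below are simulated, not vLLM internals.

# Simulated speculators config: speculator checkpoint -> target model path.
SPECULATORS = {
    "RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3":
        "meta-llama/Llama-3.1-8B-Instruct",  # assumed target for illustration
}

def maybe_override_with_speculators(model: str, tokenizer: str):
    """If `model` is a speculator checkpoint, swap in the target model path."""
    target = SPECULATORS.get(model)
    if target is None:
        return model, tokenizer, None  # not a speculator: nothing changes
    speculative_config = {"method": "eagle3", "model": model}
    return target, target, speculative_config

def create_model_config(model: str, tokenizer: str) -> dict:
    # Stand-in for ModelConfig: records the paths the tokenizer loads from.
    return {"model": model, "tokenizer": tokenizer}

def create_engine_config(model: str) -> dict:
    tokenizer = model
    # Fixed ordering: detect speculators first, then build the model config.
    model, tokenizer, speculative_config = maybe_override_with_speculators(
        model, tokenizer)
    model_config = create_model_config(model, tokenizer)
    model_config["speculative_config"] = speculative_config
    return model_config

cfg = create_engine_config("RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3")
print(cfg["tokenizer"])  # the target path, so tokenizer loading succeeds
```

With the old ordering, `create_model_config` would have run first and captured the speculator path, which has no usable tokenizer; detecting the speculator up front means every downstream consumer of the config sees the target model.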

File tree

1 file changed: +6 −4 lines changed

vllm/engine/arg_utils.py

Lines changed: 6 additions & 4 deletions
```diff
@@ -1275,10 +1275,8 @@ def create_engine_config(
 
         device_config = DeviceConfig(device=cast(Device, current_platform.device_type))
 
-        model_config = self.create_model_config()
-        self.model = model_config.model
-        self.tokenizer = model_config.tokenizer
-
+        # Check if the model is a speculator and override model/tokenizer/config
+        # BEFORE creating ModelConfig, so the config is created with the target model
         (self.model, self.tokenizer, self.speculative_config) = (
             maybe_override_with_speculators(
                 model=self.model,
@@ -1289,6 +1287,10 @@ def create_engine_config(
             )
         )
 
+        model_config = self.create_model_config()
+        self.model = model_config.model
+        self.tokenizer = model_config.tokenizer
+
         # * If VLLM_USE_V1 is unset, we enable V1 for "supported features"
         # and fall back to V0 for experimental or unsupported features.
         # * If VLLM_USE_V1=1, we enable V1 for supported + experimental
```
