I built the OpenVINO image and started a container with

docker build -f Dockerfile.openvino -t vllm-openvino-env .
docker run -it --rm vllm-openvino-env

then started the server with

python3 -m vllm.entrypoints.openai.api_server --model {model_path}

The error is shown below. Any help with this problem?
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/openai/api_server.py", line 312, in <module>
    asyncio.run(run_server(args))
  File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/openai/api_server.py", line 289, in run_server
    app = await init_app(args, llm_engine)
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/openai/api_server.py", line 229, in init_app
    if llm_engine is not None else AsyncLLMEngine.from_engine_args(
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/async_llm_engine.py", line 455, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/arg_utils.py", line 699, in create_engine_config
    model_config = ModelConfig(
  File "/usr/local/lib/python3.8/dist-packages/vllm/config.py", line 181, in __init__
    self._verify_quantization()
  File "/usr/local/lib/python3.8/dist-packages/vllm/config.py", line 218, in _verify_quantization
    quantization_override = method.override_quantization_method(
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/quantization/gptq_marlin.py", line 80, in override_quantization_method
    can_convert = cls.is_gptq_marlin_compatible(hf_quant_cfg)
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/quantization/gptq_marlin.py", line 125, in is_gptq_marlin_compatible
    return check_gptq_marlin_supported(
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/quantization/utils/marlin_utils.py", line 55, in check_gptq_marlin_supported
    cond, _ = _check_marlin_supported(num_bits,
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/quantization/utils/marlin_utils.py", line 29, in _check_marlin_supported
    major, minor = current_platform.get_device_capability()
  File "/usr/local/lib/python3.8/dist-packages/vllm/platforms/interface.py", line 28, in get_device_capability
    raise NotImplementedError
NotImplementedError
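
Reading the traceback, the failure happens during quantization auto-detection: the model's HF config declares a quantization method, so vLLM probes whether the checkpoint can be converted to GPTQ-Marlin (gptq_marlin.py), and that probe calls current_platform.get_device_capability(). The OpenVINO platform does not implement that query, so the base stub in vllm/platforms/interface.py raises NotImplementedError. A minimal sketch to confirm this inside the container (a diagnostic only, not a fix):

# Run inside the vllm-openvino-env container.
# Reproduces the failing call from marlin_utils.py / interface.py above.
from vllm.platforms import current_platform

try:
    major, minor = current_platform.get_device_capability()
    print(f"device capability: {major}.{minor}")
except NotImplementedError:
    # This is exactly the error the GPTQ-Marlin compatibility
    # check trips over on the OpenVINO build.
    print("get_device_capability() not implemented on this platform")

Since this code path only runs for checkpoints whose HF config declares a quantization method, trying an unquantized model may confirm the diagnosis.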