[Misc]: Problem running with OpenVINO #6898

@liuxingbin

Description

Anything you want to discuss about vllm.

I built the OpenVINO environment with

docker build -f Dockerfile.openvino -t vllm-openvino-env .
docker run -it --rm vllm-openvino-env

and started the server with

python3 -m vllm.entrypoints.openai.api_server --model {model_path}
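
Judging from the traceback below, the checkpoint at {model_path} declares a quantization config (the GPTQ-Marlin path is consulted), so it may help to confirm that first. A minimal sketch, assuming a standard Hugging Face checkpoint layout; the model_path value here is a hypothetical placeholder:

# Hedged sketch: check whether the checkpoint's config.json declares a
# quantization_config, since that is what triggers the code path below.
import json
import os

model_path = "/path/to/model"  # hypothetical; substitute your {model_path}
with open(os.path.join(model_path, "config.json")) as f:
    cfg = json.load(f)

# Non-None here means a quantized checkpoint (GPTQ/AWQ/...), which makes
# vLLM run its quantization-method override checks at engine start-up.
print(cfg.get("quantization_config"))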

The traceback is shown below. Any help with this problem?

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,    
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/openai/api_server.py", line 312, in <module>
    asyncio.run(run_server(args))
  File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/openai/api_server.py", line 289, in run_server
    app = await init_app(args, llm_engine)
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/openai/api_server.py", line 229, in init_app
    if llm_engine is not None else AsyncLLMEngine.from_engine_args(
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/async_llm_engine.py", line 455, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/arg_utils.py", line 699, in create_engine_config
    model_config = ModelConfig(
  File "/usr/local/lib/python3.8/dist-packages/vllm/config.py", line 181, in __init__
    self._verify_quantization()
  File "/usr/local/lib/python3.8/dist-packages/vllm/config.py", line 218, in _verify_quantization
    quantization_override = method.override_quantization_method(
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/quantization/gptq_marlin.py", line 80, in override_quantization_method
    can_convert = cls.is_gptq_marlin_compatible(hf_quant_cfg)
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/quantization/gptq_marlin.py", line 125, in is_gptq_marlin_compatible
    return check_gptq_marlin_supported(
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/quantization/utils/marlin_utils.py", line 55, in check_gptq_marlin_supported
    cond, _ = _check_marlin_supported(num_bits,
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/layers/quantization/utils/marlin_utils.py", line 29, in _check_marlin_supported
    major, minor = current_platform.get_device_capability()
  File "/usr/local/lib/python3.8/dist-packages/vllm/platforms/interface.py", line 28, in get_device_capability
    raise NotImplementedError
NotImplementedError
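
Reading the traceback: ModelConfig._verify_quantization sees the checkpoint's quantization config and asks GPTQMarlinConfig.override_quantization_method whether GPTQ-Marlin can take over; that compatibility check calls current_platform.get_device_capability(), and the base Platform stub in vllm/platforms/interface.py, which the OpenVINO build falls through to, raises NotImplementedError. A minimal sketch that reproduces just the failing call, assuming the same vLLM build as in the traceback:

# Run inside the vllm-openvino-env container to reproduce the failing
# call from the traceback in isolation.
from vllm.platforms import current_platform

try:
    # On a CUDA build this returns a capability tuple such as (8, 0);
    # here the interface stub raises instead, which is exactly what the
    # GPTQ-Marlin compatibility check trips over.
    print(current_platform.get_device_capability())
except NotImplementedError:
    print("get_device_capability() is not implemented on this platform")

So the issue seems to be that a GPTQ-quantized checkpoint hits a CUDA-oriented capability check that the OpenVINO platform does not implement; a non-quantized model would presumably not reach this path.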
