Description
System Info
Ubuntu, Python 3.11; dependency versions are those installed with v1.6.0.post1.
Running Xinference with Docker?
- docker
- pip install
- installation from source
Version info
v1.6.0.post1
The command used to start Xinference
xinference launch --model-name qwen2-vl-instruct --model-type LLM --model-engine vLLM --model-format gptq --size-in-billions 72 --quantization Int4 --n-gpu 2 --replica 1 --n-worker 1
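For reference, a minimal sketch of the same launch via the Python client, assuming the default local endpoint (http://localhost:9997 is an assumption; the parameter names mirror the CLI flags above):

from xinference.client import Client

# Hypothetical endpoint; point this at wherever the supervisor is running.
client = Client("http://localhost:9997")

# Mirrors the CLI flags: 72B GPTQ Int4 weights served by vLLM across 2 GPUs.
model_uid = client.launch_model(
    model_name="qwen2-vl-instruct",
    model_type="LLM",
    model_engine="vLLM",
    model_format="gptq",
    model_size_in_billions=72,
    quantization="Int4",
    n_gpu=2,
    replica=1,
)
print(model_uid)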
Reproduction
1. Launch the Qwen2-VL-72B model.
2. Wait for the download to complete; the model then begins loading.
3. Loading fails with the following error:
ValueError: [address=0.0.0.0:42019, pid=70209] size must contain 'shortest_edge' and 'longest_edge' keys.
2025-05-20 12:42:41,014 xinference.core.worker 69476 INFO [request dcf59802-3534-11f0-82f6-0242ac110005] Enter terminate_model, args: <xinference.core.worker.WorkerActor object at 0x7fd403fb7d70>, kwargs: model_uid=qwen2-vl-instruct-0
2025-05-20 12:42:41,018 xinference.model.llm.vllm.core 70209 INFO Stopping vLLM engine
/usr/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
2025-05-20 12:42:41,638 xinference.core.worker 69476 INFO [request dcf59802-3534-11f0-82f6-0242ac110005] Leave terminate_model, elapsed time: 0 s
2025-05-20 12:42:41,655 xinference.api.restful_api 69341 ERROR [address=0.0.0.0:42019, pid=70209] size must contain 'shortest_edge' and 'longest_edge' keys.
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/xinference/api/restful_api.py", line 1054, in launch_model
model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xoscar/backends/context.py", line 262, in send
return self._process_result_message(result)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
raise message.as_instanceof_cause()
File "/usr/local/lib/python3.11/dist-packages/xoscar/backends/pool.py", line 689, in send
result = await self._run_coro(message.message_id, coro)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
return await coro
File "/usr/local/lib/python3.11/dist-packages/xoscar/api.py", line 418, in on_receive
return await super().on_receive(message) # type: ignore
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 564, in on_receive
raise ex
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
async with self._lock:
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
result = await result
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xinference/core/supervisor.py", line 1199, in launch_builtin_model
await _launch_model()
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xinference/core/supervisor.py", line 1134, in _launch_model
subpool_address = await _launch_one_model(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xinference/core/supervisor.py", line 1106, in _launch_one_model
await worker_ref.wait_for_load(_replica_model_uid)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xoscar/backends/context.py", line 262, in send
return self._process_result_message(result)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
raise message.as_instanceof_cause()
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xoscar/backends/pool.py", line 689, in send
result = await self._run_coro(message.message_id, coro)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
return await coro
File "/usr/local/lib/python3.11/dist-packages/xoscar/api.py", line 418, in on_receive
return await super().on_receive(message) # type: ignore
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 564, in on_receive
raise ex
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
async with self._lock:
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
result = await result
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xinference/core/utils.py", line 93, in wrapped
ret = await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xinference/core/worker.py", line 1178, in wait_for_load
await model_ref.wait_for_load()
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xoscar/backends/context.py", line 262, in send
return self._process_result_message(result)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xoscar/backends/context.py", line 111, in _process_result_message
raise message.as_instanceof_cause()
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xoscar/backends/pool.py", line 689, in send
result = await self._run_coro(message.message_id, coro)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xoscar/backends/pool.py", line 389, in _run_coro
return await coro
File "/usr/local/lib/python3.11/dist-packages/xoscar/api.py", line 418, in on_receive
return await super().on_receive(message) # type: ignore
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 564, in on_receive
raise ex
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
async with self._lock:
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 527, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 532, in xoscar.core._BaseActor.on_receive
result = await result
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xinference/core/model.py", line 498, in wait_for_load
await asyncio.to_thread(self._model.wait_for_load)
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xinference/model/llm/vllm/core.py", line 503, in wait_for_load
raise err.with_traceback(tb)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xinference/model/llm/vllm/core.py", line 472, in _load
self._engine = XinferenceAsyncLLMEngine.from_engine_args(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/engine/async_llm_engine.py", line 684, in from_engine_args
return async_engine_cls.from_vllm_config(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/engine/async_llm_engine.py", line 657, in from_vllm_config
return cls(
File "/usr/local/lib/python3.11/dist-packages/vllm/engine/async_llm_engine.py", line 612, in __init__
self.engine = self._engine_class(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/engine/async_llm_engine.py", line 267, in __init__
super().__init__(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/engine/llm_engine.py", line 278, in __init__
self._initialize_kv_caches()
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/engine/llm_engine.py", line 422, in _initialize_kv_caches
self.model_executor.determine_num_available_blocks())
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/executor/executor_base.py", line 103, in determine_num_available_blocks
results = self.collective_rpc("determine_num_available_blocks")
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/executor/executor_base.py", line 331, in collective_rpc
return self._run_workers(method, *args, **(kwargs or {}))
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xinference/model/llm/vllm/distributed_executor.py", line 236, in _run_workers
self.driver_worker.execute_method(method, *args, **kwargs) # type: ignore
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/xinference/model/llm/vllm/distributed_executor.py", line 66, in execute_method
return getattr(self._worker, method)(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/worker/worker.py", line 249, in determine_num_available_blocks
self.model_runner.profile_run()
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/worker/model_runner.py", line 1237, in profile_run
self._dummy_run(max_num_batched_tokens, max_num_seqs)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/worker/model_runner.py", line 1296, in _dummy_run
max_mm_tokens = self.mm_registry.get_max_multimodal_tokens(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/multimodal/registry.py", line 170, in get_max_multimodal_tokens
return sum(self.get_max_tokens_by_modality(model_config).values())
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/multimodal/registry.py", line 160, in get_max_tokens_by_modality
self.get_max_tokens_per_item_by_modality(model_config).items()
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/multimodal/registry.py", line 115, in get_max_tokens_per_item_by_modality
return profiler.get_mm_max_tokens(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/multimodal/profiling.py", line 272, in get_mm_max_tokens
mm_inputs = self._get_dummy_mm_inputs(seq_len, mm_counts)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/multimodal/profiling.py", line 176, in _get_dummy_mm_inputs
processor_inputs = factory.get_dummy_processor_inputs(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/multimodal/profiling.py", line 101, in get_dummy_processor_inputs
dummy_text = self.get_dummy_text(mm_counts)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 971, in get_dummy_text
hf_processor = self.info.get_hf_processor()
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 763, in get_hf_processor
image_processor=self.get_image_processor(min_pixels=min_pixels,
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 809, in get_image_processor
return cached_image_processor_from_config(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/transformers_utils/processor.py", line 206, in cached_image_processor_from_config
return cached_get_image_processor(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm/transformers_utils/processor.py", line 194, in get_image_processor
raise e
File "/usr/local/lib/python3.11/dist-packages/vllm/transformers_utils/processor.py", line 176, in get_image_processor
processor = AutoImageProcessor.from_pretrained(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/transformers/models/auto/image_processing_auto.py", line 559, in from_pretrained
return image_processor_class.from_dict(config_dict, **kwargs)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/transformers/image_processing_base.py", line 422, in from_dict
image_processor = cls(**image_processor_dict)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/transformers/models/qwen2_vl/image_processing_qwen2_vl.py", line 144, in __init__
raise ValueError("size must contain 'shortest_edge' and 'longest_edge' keys.")
^^^^^^^^^^^^^^^^^
ValueError: [address=0.0.0.0:42019, pid=70209] size must contain 'shortest_edge' and 'longest_edge' keys.
The error says the model's image-processor config is missing these two keys, but the model was downloaded through Xinference's own built-in registry, not modified by hand.
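Judging from the bottom of the traceback, the installed transformers version validates that size in preprocessor_config.json contains 'shortest_edge' and 'longest_edge', while older Qwen2-VL configs only carry min_pixels/max_pixels bounds, so this looks like a version mismatch between the cached config and transformers. As a workaround I tried the sketch below, which patches the cached config by mapping the pixel bounds onto the two edge keys; the path is a placeholder for the local cache directory, and the mapping is an assumption based on the check in image_processing_qwen2_vl.py, not an official migration:

import json
from pathlib import Path

# Placeholder path; point this at the directory Xinference downloaded
# the model into (the one containing preprocessor_config.json).
cfg_path = Path("/path/to/qwen2-vl-72b-cache/preprocessor_config.json")

cfg = json.loads(cfg_path.read_text())
size = cfg.get("size", {})

if "shortest_edge" not in size or "longest_edge" not in size:
    # Fall back to the pixel bounds if the config carries them; otherwise
    # refuse to guess values.
    min_px = cfg.get("min_pixels", size.get("min_pixels"))
    max_px = cfg.get("max_pixels", size.get("max_pixels"))
    if min_px is None or max_px is None:
        raise SystemExit("config has neither edge keys nor pixel bounds")
    cfg["size"] = {"shortest_edge": min_px, "longest_edge": max_px}
    cfg_path.write_text(json.dumps(cfg, indent=2))
    print("patched", cfg_path)

Pinning transformers to an older release that still accepts the min_pixels/max_pixels layout might also avoid editing the cache by hand, but I have not confirmed which version boundary introduced the check.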
Expected behavior
How can this problem be resolved so that the model loads successfully?