Your current environment
(opensora) ubuntu@ubuntu:~/psh$ python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model /home/ubuntu/psh/UI-TARS-7B-DPO --limit-mm-per-prompt image=5 -tp 1 --trust-remote-code --port 8001
INFO 03-27 16:11:36 [__init__.py:239] Automatically detected platform cuda.
INFO 03-27 16:11:37 [api_server.py:981] vLLM API server version 0.8.2
INFO 03-27 16:11:37 [api_server.py:982] args: Namespace(host=None, port=8001, uvicorn_log_level='info', disable_uvicorn_access_log=False, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='/home/ubuntu/psh/UI-TARS-7B-DPO', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', max_model_len=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=None, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt={'image': 5}, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, use_tqdm_on_load=True, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_config=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['ui-tars'], qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', 
generation_config='auto', override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, enable_reasoning=False, reasoning_parser=None, disable_cascade_attn=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False)
INFO 03-27 16:11:44 [config.py:585] This model supports multiple tasks: {'generate', 'score', 'embed', 'classify', 'reward'}. Defaulting to 'generate'.
INFO 03-27 16:11:44 [config.py:1697] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 03-27 16:11:46 [core.py:54] Initializing a V1 LLM engine (v0.8.2) with config: model='/home/ubuntu/psh/UI-TARS-7B-DPO', speculative_config=None, tokenizer='/home/ubuntu/psh/UI-TARS-7B-DPO', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=ui-tars, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":3,"custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":512}
WARNING 03-27 16:11:47 [utils.py:2321] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7f3d33297d90>
INFO 03-27 16:11:55 [parallel_state.py:954] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 03-27 16:11:55 [cuda.py:220] Using Flash Attention backend on V1 engine.
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.50, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
ERROR 03-27 16:11:56 [core.py:343] EngineCore hit an exception: Traceback (most recent call last):
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 335, in run_engine_core
ERROR 03-27 16:11:56 [core.py:343] engine_core = EngineCoreProc(*args, **kwargs)
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 290, in __init__
ERROR 03-27 16:11:56 [core.py:343] super().__init__(vllm_config, executor_class, log_stats)
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 60, in __init__
ERROR 03-27 16:11:56 [core.py:343] self.model_executor = executor_class(vllm_config)
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 03-27 16:11:56 [core.py:343] self._init_executor()
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 46, in _init_executor
ERROR 03-27 16:11:56 [core.py:343] self.collective_rpc("init_device")
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 03-27 16:11:56 [core.py:343] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/utils.py", line 2255, in run_method
ERROR 03-27 16:11:56 [core.py:343] return func(*args, **kwargs)
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 604, in init_device
ERROR 03-27 16:11:56 [core.py:343] self.worker.init_device() # type: ignore
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 120, in init_device
ERROR 03-27 16:11:56 [core.py:343] self.model_runner: GPUModelRunner = GPUModelRunner(
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 137, in __init__
ERROR 03-27 16:11:56 [core.py:343] encoder_compute_budget, encoder_cache_size = compute_encoder_budget(
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/v1/core/encoder_cache_manager.py", line 92, in compute_encoder_budget
ERROR 03-27 16:11:56 [core.py:343] ) = _compute_encoder_budget_multimodal(model_config, scheduler_config)
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/v1/core/encoder_cache_manager.py", line 115, in _compute_encoder_budget_multimodal
ERROR 03-27 16:11:56 [core.py:343] max_tokens_by_modality_dict = MULTIMODAL_REGISTRY.get_max_tokens_per_item_by_nonzero_modality( # noqa: E501
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/multimodal/registry.py", line 291, in get_max_tokens_per_item_by_nonzero_modality
ERROR 03-27 16:11:56 [core.py:343] self.get_max_tokens_per_item_by_modality(model_config).items()
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/multimodal/registry.py", line 265, in get_max_tokens_per_item_by_modality
ERROR 03-27 16:11:56 [core.py:343] return processor.info.get_mm_max_tokens_per_item(
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_vl.py", line 827, in get_mm_max_tokens_per_item
ERROR 03-27 16:11:56 [core.py:343] "image": self.get_max_image_tokens(),
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_vl.py", line 915, in get_max_image_tokens
ERROR 03-27 16:11:56 [core.py:343] target_width, target_height = self.get_image_size_with_most_features()
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_vl.py", line 907, in get_image_size_with_most_features
ERROR 03-27 16:11:56 [core.py:343] max_image_size, _ = self._get_vision_info(
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_vl.py", line 841, in _get_vision_info
ERROR 03-27 16:11:56 [core.py:343] image_processor = self.get_image_processor()
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_vl.py", line 810, in get_image_processor
ERROR 03-27 16:11:56 [core.py:343] return cached_image_processor_from_config(
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/transformers_utils/processor.py", line 157, in cached_image_processor_from_config
ERROR 03-27 16:11:56 [core.py:343] return cached_get_image_processor(
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/transformers_utils/processor.py", line 145, in get_image_processor
ERROR 03-27 16:11:56 [core.py:343] raise e
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/vllm/transformers_utils/processor.py", line 127, in get_image_processor
ERROR 03-27 16:11:56 [core.py:343] processor = AutoImageProcessor.from_pretrained(
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 557, in from_pretrained
ERROR 03-27 16:11:56 [core.py:343] return image_processor_class.from_dict(config_dict, **kwargs)
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/transformers/image_processing_base.py", line 423, in from_dict
ERROR 03-27 16:11:56 [core.py:343] image_processor = cls(**image_processor_dict)
ERROR 03-27 16:11:56 [core.py:343] File "/home/ubuntu/anaconda3/envs/opensora/lib/python3.10/site-packages/transformers/models/qwen2_vl/image_processing_qwen2_vl.py", line 144, in __init__
ERROR 03-27 16:11:56 [core.py:343] raise ValueError("size must contain 'shortest_edge' and 'longest_edge' keys.")
ERROR 03-27 16:11:56 [core.py:343] ValueError: size must contain 'shortest_edge' and 'longest_edge' keys.
ERROR 03-27 16:11:56 [core.py:343]
CRITICAL 03-27 16:11:56 [core_client.py:269] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
Killed
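
Not part of the original report, but a possible next diagnostic step: the ValueError above is raised by transformers' Qwen2-VL image processor while validating the "size" field of the checkpoint's preprocessor config. The sketch below only inspects what the local checkpoint actually ships, using the path from the --model argument above; the suggestion that older exports may carry keys such as 'min_pixels'/'max_pixels' instead of 'shortest_edge'/'longest_edge' is an assumption to verify, not a confirmed cause.

# Diagnostic sketch (not from the original report): dump the "size" entry of the
# checkpoint's preprocessor_config.json. The transformers code in the traceback
# (image_processing_qwen2_vl.py) requires 'shortest_edge' and 'longest_edge' in
# this dict; an older-style export might instead carry other keys (assumed, e.g.
# 'min_pixels'/'max_pixels'), which would trigger the ValueError seen above.
import json

CKPT = "/home/ubuntu/psh/UI-TARS-7B-DPO"  # path taken from the --model argument above

with open(f"{CKPT}/preprocessor_config.json") as f:
    cfg = json.load(f)

print("image_processor_type:", cfg.get("image_processor_type"))
print("size:", cfg.get("size"))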
How you are installing vllm
pip install -vvv vllm
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
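
As an aside not in the original report, recording the exact installed versions behind the traceback can help narrow things down, since the size-key validation in image_processing_qwen2_vl.py may differ between transformers releases. A minimal snapshot, assuming only that both packages are importable in the environment above:

# Hypothetical version snapshot (not from the original report); both packages
# expose __version__, so this just prints what is installed in this environment.
import transformers
import vllm

print("vllm:", vllm.__version__)
print("transformers:", transformers.__version__)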