Status: Closed
Labels: usage (How to use vllm)
Description
I am facing an error when trying to use multiple GPUs from a FastAPI backend. The failure appears only when the multi-GPU code is integrated into the FastAPI API; the same code works correctly when run on its own. I am simply loading the model after importing the libraries:

```python
LLM(model=model, max_model_len=16000, tensor_parallel_size=4)
```

which fails with:

```
TypeError: cannot pickle '_thread.lock' object
```
Error traceback:

```
  self.llm = LLM(model=model, max_model_len=16000, tensor_parallel_size=4)
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/llm.py", line 109, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 391, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 126, in __init__
    self._init_workers_ray(placement_group)
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 304, in _init_workers_ray
    self._run_workers("init_model",
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 1041, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 94, in init_model
    init_distributed_environment(self.parallel_config, self.rank,
  File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 275, in init_distributed_environment
    cupy_utils.init_process_group(
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/parallel_utils/cupy_utils.py", line 90, in init_process_group
    _NCCL_BACKEND = NCCLBackendWithBFloat16(world_size, rank, host, port)
  File "/usr/local/lib/python3.8/dist-packages/cupyx/distributed/_nccl_comm.py", line 70, in __init__
    self._init_with_tcp_store(n_devices, rank, host, port)
  File "/usr/local/lib/python3.8/dist-packages/cupyx/distributed/_nccl_comm.py", line 88, in _init_with_tcp_store
    self._store.run(host, port)
  File "/usr/local/lib/python3.8/dist-packages/cupyx/distributed/_store.py", line 100, in run
    p.start()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_thread.lock' object
```
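The traceback shows where the pickling happens: cupy's TCP store starts a helper process (`p.start()`), the spawn start method serializes the process object with `ForkingPickler` before launching the child, and `threading.Lock` instances are not picklable. The same failure mode can be reproduced in a few lines without vllm (the `Store` class below is hypothetical, purely for illustration):

```python
import pickle
import threading

class Store:
    """Stand-in for an object that holds a thread lock, like cupy's TCP store."""
    def __init__(self):
        self.lock = threading.Lock()  # _thread.lock instances cannot be pickled

store = Store()
try:
    # The spawn start method does effectively this to the process object
    # (and everything it references) before launching the child.
    pickle.dumps(store)
except TypeError as exc:
    print(exc)  # cannot pickle '_thread.lock' object
```

Since a FastAPI app is typically launched by a server such as uvicorn, which may import the module in worker processes, constructing the multi-GPU engine at import time can put unpicklable state in the path of a later spawn. Whether that is what happens here depends on how the app is launched.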
How would you like to use vllm
No response