
[Usage]: Multiple-GPU usage with FastAPI and uvicorn not working #3612

@humza-sami

Description


I am facing an error when attempting to use multiple GPUs behind a FastAPI backend. The error arises when I integrate the multi-GPU code into the FastAPI backend API; interestingly, the same code runs correctly on multiple GPUs when executed on its own. I am simply loading the model after importing the libraries:

    LLM(model=model, max_model_len=16000, tensor_parallel_size=4)

which fails with:

    TypeError: cannot pickle '_thread.lock' object

Error Traceback

    self.llm = LLM(model=model, max_model_len=16000, tensor_parallel_size=4)
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/llm.py", line 109, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 391, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 126, in __init__
    self._init_workers_ray(placement_group)
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 304, in _init_workers_ray
    self._run_workers("init_model",
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 1041, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 94, in init_model
    init_distributed_environment(self.parallel_config, self.rank,
  File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 275, in init_distributed_environment
    cupy_utils.init_process_group(
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/parallel_utils/cupy_utils.py", line 90, in init_process_group
    _NCCL_BACKEND = NCCLBackendWithBFloat16(world_size, rank, host, port)
  File "/usr/local/lib/python3.8/dist-packages/cupyx/distributed/_nccl_comm.py", line 70, in __init__
    self._init_with_tcp_store(n_devices, rank, host, port)
  File "/usr/local/lib/python3.8/dist-packages/cupyx/distributed/_nccl_comm.py", line 88, in _init_with_tcp_store
    self._store.run(host, port)
  File "/usr/local/lib/python3.8/dist-packages/cupyx/distributed/_store.py", line 100, in run
    p.start()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_thread.lock' object
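The bottom of the traceback shows where the error comes from: cupy's TCP store launches a child process, and the launch goes through `popen_spawn_posix`, i.e. the `spawn` start method, so multiprocessing must pickle the `Process` object with `ForkingPickler` — and something reachable from it holds a `_thread.lock`. A minimal stdlib-only sketch of the same failure, independent of vLLM and FastAPI (the `Store` class here is hypothetical, just to carry a lock):

```python
import pickle
import threading

# Any object carrying a threading.Lock is unpicklable. multiprocessing's
# spawn start method serializes the Process object the same way (via
# ForkingPickler, as in the last frames of the traceback above).
class Store:
    def __init__(self):
        self._lock = threading.Lock()  # the unpicklable attribute

try:
    pickle.dumps(Store())
except TypeError as exc:
    print(exc)  # cannot pickle '_thread.lock' object
```

This is why the same code can work standalone but break inside a server process: whether the child is forked or spawned (and therefore whether anything gets pickled) depends on how and where the parent process was started.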

How would you like to use vllm

No response

Metadata

Assignees: no one assigned
Labels: usage (How to use vllm)
