
[Usage]: Multiple-GPU usage with FastAPI and uvicorn not working #3612

@humza-sami

Description


I am facing an error when attempting to use multiple GPUs behind a FastAPI backend. The error arises when I integrate the multi-GPU code into the FastAPI backend API; interestingly, the same code runs correctly on multiple GPUs when executed on its own. I am simply loading the model after importing the libraries:

    LLM(model=model, max_model_len=16000, tensor_parallel_size=4)

which fails with:

    TypeError: cannot pickle '_thread.lock' object

Error Traceback

    self.llm = LLM(model=model, max_model_len=16000, tensor_parallel_size=4)
  File "/usr/local/lib/python3.8/dist-packages/vllm/entrypoints/llm.py", line 109, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 391, in from_engine_args
    engine = cls(*engine_configs,
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 126, in __init__
    self._init_workers_ray(placement_group)
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 304, in _init_workers_ray
    self._run_workers("init_model",
  File "/usr/local/lib/python3.8/dist-packages/vllm/engine/llm_engine.py", line 1041, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 94, in init_model
    init_distributed_environment(self.parallel_config, self.rank,
  File "/usr/local/lib/python3.8/dist-packages/vllm/worker/worker.py", line 275, in init_distributed_environment
    cupy_utils.init_process_group(
  File "/usr/local/lib/python3.8/dist-packages/vllm/model_executor/parallel_utils/cupy_utils.py", line 90, in init_process_group
    _NCCL_BACKEND = NCCLBackendWithBFloat16(world_size, rank, host, port)
  File "/usr/local/lib/python3.8/dist-packages/cupyx/distributed/_nccl_comm.py", line 70, in __init__
    self._init_with_tcp_store(n_devices, rank, host, port)
  File "/usr/local/lib/python3.8/dist-packages/cupyx/distributed/_nccl_comm.py", line 88, in _init_with_tcp_store
    self._store.run(host, port)
  File "/usr/local/lib/python3.8/dist-packages/cupyx/distributed/_store.py", line 100, in run
    p.start()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_thread.lock' object
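The bottom of the traceback shows where the error comes from: cupy's TCP store launches a child process, and the launch goes through `popen_spawn_posix`, i.e. the `spawn` start method, so multiprocessing must pickle the `Process` object with `ForkingPickler` — and something reachable from it holds a `_thread.lock`. A minimal stdlib-only sketch of the same failure, independent of vLLM and FastAPI (the `Store` class here is hypothetical, just to carry a lock):

```python
import pickle
import threading

# Any object carrying a threading.Lock is unpicklable. multiprocessing's
# spawn start method serializes the Process object the same way (via
# ForkingPickler, as in the last frames of the traceback above).
class Store:
    def __init__(self):
        self._lock = threading.Lock()  # the unpicklable attribute

try:
    pickle.dumps(Store())
except TypeError as exc:
    print(exc)  # cannot pickle '_thread.lock' object
```

This is why the same code can work standalone but break inside a server process: whether the child is forked or spawned (and therefore whether anything gets pickled) depends on how and where the parent process was started.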

How would you like to use vllm

No response

Metadata

Assignees: no one assigned
Labels: usage (How to use vllm)
