Skip to content

[Bug]: ValueError: bytes must be in range(0, 256) #3617

@cosmic-heart

Description

@cosmic-heart

Your current environment

root@3b4826375ab0:/workspace# python collect_env.py
Collecting environment information...
PyTorch version: 2.1.2
Is debug build: False
CUDA used to build PyTorch: 12.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (ppc64le)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.28.4
Libc version: glibc-2.35

Python version: 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 16:04:32) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-100-generic-ppc64le-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.2.91
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB

Nvidia driver version: 535.161.07
cuDNN version: Probably one of the following:
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn.so.8
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-12.2/targets/ppc64le-linux/lib/libcudnn_ops_train.so.8
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: False

CPU:
Architecture:                       ppc64le
Byte Order:                         Little Endian
CPU(s):                             128
On-line CPU(s) list:                0-127
Model name:                         POWER9, altivec supported
Model:                              2.2 (pvr 004e 1202)
Thread(s) per core:                 4
Core(s) per socket:                 16
Socket(s):                          2
Frequency boost:                    enabled
CPU max MHz:                        3800.0000
CPU min MHz:                        2300.0000
L1d cache:                          1 MiB (32 instances)
L1i cache:                          1 MiB (32 instances)
L2 cache:                           8 MiB (16 instances)
L3 cache:                           160 MiB (16 instances)
NUMA node(s):                       6
NUMA node0 CPU(s):                  0-63
NUMA node8 CPU(s):                  64-127
NUMA node252 CPU(s):                
NUMA node253 CPU(s):                
NUMA node254 CPU(s):                
NUMA node255 CPU(s):                
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Mitigation; RFI Flush, L1D private per thread
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Mitigation; RFI Flush, L1D private per thread
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Kernel entry/exit barrier (eieio)
Vulnerability Spectre v1:           Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
Vulnerability Spectre v2:           Mitigation; Indirect branch serialisation (kernel only)
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.1.2
[pip3] triton==2.1.0
[conda] cudatoolkit               11.8.0              hedcfb66_13    conda-forge
[conda] libmagma                  2.7.2                he288b6c_2    conda-forge
[conda] libmagma_sparse           2.7.2                h5b5c57a_3    conda-forge
[conda] magma                     2.7.2                h097a1ca_3    conda-forge
[conda] numpy                     1.24.3          py310h87cc683_0  
[conda] numpy-base                1.24.3          py310hac71eb6_0  
[conda] torch                     2.1.2                    pypi_0    pypiROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.3.3
vLLM Build Flags:
CUDA Archs: 7.0; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV3     SYS     SYS     0-63    0               N/A
GPU1    NV3      X      SYS     SYS     0-63    0               N/A
GPU2    SYS     SYS      X      NV3     64-127  8               N/A
GPU3    SYS     SYS     NV3      X      64-127  8               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

from vllm import LLM, SamplingParams
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(
    model="./models", 
    dtype="float16", 
    tensor_parallel_size=4, 
    enforce_eager=False, 
    trust_remote_code=True, 
    load_format='safetensors'
)
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

if enforce_eager=False then i am getting the following error, but enforce_eager=True i am not getting any error as of this stage.

root@3b4826375ab0:/workspace# python3 example.py 
WARNING 03-25 16:50:03 config.py:686] Casting torch.bfloat16 to torch.float16.
2024-03-25 16:50:06,721 INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
INFO 03-25 16:50:09 llm_engine.py:68] Initializing an LLM engine (v0.3.3) with config: model='./models', tokenizer='./models', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=safetensors, tensor_parallel_size=4, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
Traceback (most recent call last):
  File "/workspace/example.py", line 10, in <module>
    llm = LLM(
  File "/root/vllm/vllm/entrypoints/llm.py", line 109, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/root/vllm/vllm/engine/llm_engine.py", line 146, in from_engine_args
    engine = cls(*engine_configs,
  File "/root/vllm/vllm/engine/llm_engine.py", line 103, in __init__
    self.model_executor = executor_class(model_config, cache_config,
  File "/root/vllm/vllm/executor/ray_gpu_executor.py", line 60, in __init__
    self._init_workers_ray(placement_group)
  File "/root/vllm/vllm/executor/ray_gpu_executor.py", line 190, in _init_workers_ray
    self._run_workers("init_device",
  File "/root/vllm/vllm/executor/ray_gpu_executor.py", line 318, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/root/vllm/vllm/worker/worker.py", line 92, in init_device
    init_distributed_environment(self.parallel_config, self.rank,
  File "/root/vllm/vllm/worker/worker.py", line 276, in init_distributed_environment
    cupy_utils.init_process_group(
  File "/root/vllm/vllm/model_executor/parallel_utils/cupy_utils.py", line 90, in init_process_group
    _NCCL_BACKEND = NCCLBackendWithBFloat16(world_size, rank, host, port)
  File "/root/miniconda3/lib/python3.10/site-packages/cupyx/distributed/_nccl_comm.py", line 70, in __init__
    self._init_with_tcp_store(n_devices, rank, host, port)
  File "/root/miniconda3/lib/python3.10/site-packages/cupyx/distributed/_nccl_comm.py", line 93, in _init_with_tcp_store
    shifted_nccl_id = bytes([b + 128 for b in nccl_id])
ValueError: bytes must be in range(0, 256)

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions