Description
Version information:
verl: main
vllm: 0.9.1
vllm-ascend: 0.9.1rc1 / 0.9.1-dev
Error information as follows:
Traceback (most recent call last):
File "/home/mnj/code/verl/verl/trainer/main_ppo.py", line 39, in main
run_ppo(config)
File "/home/mnj/code/verl/verl/trainer/main_ppo.py", line 73, in run_ppo
ray.get(runner.run.remote(config))
File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
return fn(*args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/worker.py", line 2822, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/worker.py", line 930, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TaskRunner.run() (pid=1175552, ip=172.17.0.22, actor_id=c32790b2ec94b968a3afaabc01000000, repr=<main_ppo.TaskRunner object at 0xfffbfc847310>)
File "/home/mnj/code/verl/verl/trainer/main_ppo.py", line 143, in run
from verl.utils.vllm_utils import is_version_ge
File "/home/mnj/code/verl/verl/utils/vllm_utils.py", line 35, in <module>
from vllm.model_executor.models.deepseek_v2 import DeepseekV2ForCausalLM, DeepseekV3ForCausalLM
File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/models/deepseek_v2.py", line 38, in <module>
from vllm.model_executor.layers.fused_moe import FusedMoE
File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/layers/fused_moe/__init__.py", line 7, in <module>
from vllm.model_executor.layers.fused_moe.layer import (
File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/layers/fused_moe/layer.py", line 57, in <module>
from vllm.model_executor.layers.fused_moe.fused_moe import grouped_topk
File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/layers/fused_moe/fused_moe.py", line 15, in <module>
from vllm.model_executor.layers.fused_moe.deep_gemm_moe import (
File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/layers/fused_moe/deep_gemm_moe.py", line 11, in <module>
from vllm.model_executor.layers.fused_moe.moe_permute_unpermute import (
File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/layers/fused_moe/moe_permute_unpermute.py", line 10, in <module>
from vllm.model_executor.layers.fused_moe.utils import _fp8_perm
File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/layers/fused_moe/utils.py", line 9, in <module>
from vllm.model_executor.layers.quantization.utils.fp8_utils import (
File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/layers/quantization/utils/fp8_utils.py", line 170, in <module>
direct_register_custom_op(
File "/home/mnj/code/torch2_7/vllm/vllm/utils.py", line 2221, in direct_register_custom_op
schema_str = torch.library.infer_schema(op_func,
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_library/infer_schema.py", line 106, in infer_schema
error_fn(
File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_library/infer_schema.py", line 58, in error_fn
raise ValueError(
ValueError: infer_schema(func): Parameter block_size has unsupported type list[int]. The valid types are: dict_keys([<class 'torch.Tensor'>, typing.Optional[torch.Tensor], typing.Sequence[torch.Tensor], typing.List[torch.Tensor], typing.Sequence[typing.Optional[torch.Tensor]], typing.List[typing.Optional[torch.Tensor]], <class 'int'>, typing.Optional[int], typing.Sequence[int], typing.List[int], typing.Optional[typing.Sequence[int]], typing.Optional[typing.List[int]], <class 'float'>, typing.Optional[float], typing.Sequence[float], typing.List[float], typing.Optional[typing.Sequence[float]], typing.Optional[typing.List[float]], <class 'bool'>, typing.Optional[bool], typing.Sequence[bool], typing.List[bool], typing.Optional[typing.Sequence[bool]], typing.Optional[typing.List[bool]], <class 'str'>, typing.Optional[str], typing.Union[int, float, bool], typing.Union[int, float, bool, NoneType], typing.Sequence[typing.Union[int, float, bool]], typing.List[typing.Union[int, float, bool]], <class 'torch.dtype'>, typing.Optional[torch.dtype], <class 'torch.device'>, typing.Optional[torch.device]]). Got func with signature (input: torch.Tensor, weight: torch.Tensor, block_size: list[int], weight_scale: torch.Tensor, input_scale: Optional[torch.Tensor] = None, bias: Optional[torch.Tensor] = None, cutlass_block_fp8_supported: bool = False, use_aiter_and_is_supported: bool = False) -> torch.Tensor)
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Root cause:
vllm 0.9.1 targets PyTorch 2.7.0, whose torch.library.infer_schema accepts Python builtin generic annotations (like list[int]).
The vllm-ascend plugin 0.9.1 uses PyTorch 2.5.1, whose infer_schema only accepts typing-module annotations (like List[int]).
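A minimal standalone sketch of the mismatch (hypothetical script, not verl/vllm code; both functions are stand-ins for the op defined in fp8_utils.py):

```python
# Sketch of the annotation mismatch, assuming PyTorch 2.5.1 is installed.
from typing import List

import torch


def op_typing(x: torch.Tensor, block_size: List[int]) -> torch.Tensor:
    return x


def op_builtin(x: torch.Tensor, block_size: list[int]) -> torch.Tensor:
    return x


# typing.List[int] is accepted on both PyTorch 2.5.1 and 2.7.0:
print(torch.library.infer_schema(op_typing, mutates_args=()))

# On PyTorch 2.5.1 the builtin list[int] form raises the same
# ValueError as the traceback above ("Parameter block_size has
# unsupported type list[int]"); on 2.7.0 it passes:
print(torch.library.infer_schema(op_builtin, mutates_args=()))
```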
This incompatibility is handled correctly inside the vllm-ascend plugin, but verl imports vllm's model code before the vllm-ascend plugin takes effect, so the error occurs. The import site in detail:
File "/home/mnj/code/verl/verl/trainer/main_ppo.py", line 143, in run
from verl.utils.vllm_utils import is_version_ge
File "/home/mnj/code/verl/verl/utils/vllm_utils.py", line 35, in <module>
from vllm.model_executor.models.deepseek_v2 import DeepseekV2ForCausalLM, DeepseekV3ForCausalLM
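One possible stopgap until the import ordering is fixed is to guard this eager import so verl.utils.vllm_utils still loads when the fused-MoE import chain raises on the Ascend PyTorch 2.5.1 stack. A hypothetical patch sketch (the SUPPORTED_MOE_MODELS name is assumed for illustration, not taken from the source):

```python
# Hypothetical defensive patch for verl/utils/vllm_utils.py (sketch only).
# Tolerate the ValueError raised inside vllm's fused-MoE import chain on
# PyTorch 2.5.1; SUPPORTED_MOE_MODELS is an assumed illustrative name.
try:
    from vllm.model_executor.models.deepseek_v2 import (
        DeepseekV2ForCausalLM,
        DeepseekV3ForCausalLM,
    )

    SUPPORTED_MOE_MODELS = [DeepseekV2ForCausalLM, DeepseekV3ForCausalLM]
except (ImportError, ValueError):
    # On vllm-ascend's torch 2.5.1 stack the import fails with the
    # infer_schema ValueError shown above; fall back to an empty list so
    # that importing verl.utils.vllm_utils does not crash the trainer.
    SUPPORTED_MOE_MODELS = []
```

The proper fix would be to make sure the vllm-ascend plugin's patches are applied before any vllm model module is imported.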