
[Ascend]GRPO:vllm+vllm-ascend 0.9.1+verl: "ValueError: infer_schema(func): Parameter block_size has unsupported type list[int]" #2564

@leo-pony

Description

Version information:
verl:main
vllm:0.9.1
vllm-ascend: 0.9.1rc1/0.9.1-dev

The error output is as follows:

Traceback (most recent call last):
  File "/home/mnj/code/verl/verl/trainer/main_ppo.py", line 39, in main
    run_ppo(config)
  File "/home/mnj/code/verl/verl/trainer/main_ppo.py", line 73, in run_ppo
    ray.get(runner.run.remote(config))
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/worker.py", line 2822, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/ray/_private/worker.py", line 930, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TaskRunner.run() (pid=1175552, ip=172.17.0.22, actor_id=c32790b2ec94b968a3afaabc01000000, repr=<main_ppo.TaskRunner object at 0xfffbfc847310>)
  File "/home/mnj/code/verl/verl/trainer/main_ppo.py", line 143, in run
    from verl.utils.vllm_utils import is_version_ge
  File "/home/mnj/code/verl/verl/utils/vllm_utils.py", line 35, in <module>
    from vllm.model_executor.models.deepseek_v2 import DeepseekV2ForCausalLM, DeepseekV3ForCausalLM
  File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/models/deepseek_v2.py", line 38, in <module>
    from vllm.model_executor.layers.fused_moe import FusedMoE
  File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/layers/fused_moe/__init__.py", line 7, in <module>
    from vllm.model_executor.layers.fused_moe.layer import (
  File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/layers/fused_moe/layer.py", line 57, in <module>
    from vllm.model_executor.layers.fused_moe.fused_moe import grouped_topk
  File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/layers/fused_moe/fused_moe.py", line 15, in <module>
    from vllm.model_executor.layers.fused_moe.deep_gemm_moe import (
  File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/layers/fused_moe/deep_gemm_moe.py", line 11, in <module>
    from vllm.model_executor.layers.fused_moe.moe_permute_unpermute import (
  File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/layers/fused_moe/moe_permute_unpermute.py", line 10, in <module>
    from vllm.model_executor.layers.fused_moe.utils import _fp8_perm
  File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/layers/fused_moe/utils.py", line 9, in <module>
    from vllm.model_executor.layers.quantization.utils.fp8_utils import (
  File "/home/mnj/code/torch2_7/vllm/vllm/model_executor/layers/quantization/utils/fp8_utils.py", line 170, in <module>
    direct_register_custom_op(
  File "/home/mnj/code/torch2_7/vllm/vllm/utils.py", line 2221, in direct_register_custom_op
    schema_str = torch.library.infer_schema(op_func,
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_library/infer_schema.py", line 106, in infer_schema
    error_fn(
  File "/usr/local/python3.10.17/lib/python3.10/site-packages/torch/_library/infer_schema.py", line 58, in error_fn
    raise ValueError(
ValueError: infer_schema(func): Parameter block_size has unsupported type list[int]. The valid types are: dict_keys([<class 'torch.Tensor'>, typing.Optional[torch.Tensor], typing.Sequence[torch.Tensor], typing.List[torch.Tensor], typing.Sequence[typing.Optional[torch.Tensor]], typing.List[typing.Optional[torch.Tensor]], <class 'int'>, typing.Optional[int], typing.Sequence[int], typing.List[int], typing.Optional[typing.Sequence[int]], typing.Optional[typing.List[int]], <class 'float'>, typing.Optional[float], typing.Sequence[float], typing.List[float], typing.Optional[typing.Sequence[float]], typing.Optional[typing.List[float]], <class 'bool'>, typing.Optional[bool], typing.Sequence[bool], typing.List[bool], typing.Optional[typing.Sequence[bool]], typing.Optional[typing.List[bool]], <class 'str'>, typing.Optional[str], typing.Union[int, float, bool], typing.Union[int, float, bool, NoneType], typing.Sequence[typing.Union[int, float, bool]], typing.List[typing.Union[int, float, bool]], <class 'torch.dtype'>, typing.Optional[torch.dtype], <class 'torch.device'>, typing.Optional[torch.device]]). Got func with signature (input: torch.Tensor, weight: torch.Tensor, block_size: list[int], weight_scale: torch.Tensor, input_scale: Optional[torch.Tensor] = None, bias: Optional[torch.Tensor] = None, cutlass_block_fp8_supported: bool = False, use_aiter_and_is_supported: bool = False) -> torch.Tensor)

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Root cause:
vllm 0.9.1 targets PyTorch 2.7.0, whose torch.library.infer_schema accepts Python builtin generic annotations (like list[int]) in custom-op signatures.
The vllm-ascend plugin 0.9.1 runs on PyTorch 2.5.1, whose infer_schema only accepts typing-module annotations (like List[int]); see the sketch below.
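The difference can be demonstrated with torch.library.infer_schema directly. This is a minimal sketch; the op names and bodies are illustrative placeholders, not vllm's actual fp8 kernel:

from typing import List
import torch

def op_builtin(x: torch.Tensor, block_size: list[int]) -> torch.Tensor:
    return x  # placeholder body; only the signature matters for schema inference

def op_typing(x: torch.Tensor, block_size: List[int]) -> torch.Tensor:
    return x  # identical op, annotated with typing.List instead

# Per the analysis above: PyTorch 2.7.0 accepts both annotations,
# while PyTorch 2.5.1 only has typing.List[int] in its supported-type table.
print(torch.library.infer_schema(op_typing, mutates_args=()))   # works on both versions
print(torch.library.infer_schema(op_builtin, mutates_args=()))  # on 2.5.1, raises the ValueError shown in the traceback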
This incompatibility is handled correctly inside the vllm-ascend plugin, but verl imports vllm's model code before the plugin takes effect, so the error occurs. Detailed import position (a possible workaround sketch follows the excerpt):

  File "/home/mnj/code/verl/verl/trainer/main_ppo.py", line 143, in run
    from verl.utils.vllm_utils import is_version_ge
  File "/home/mnj/code/verl/verl/utils/vllm_utils.py", line 35, in <module>
    from vllm.model_executor.models.deepseek_v2 import DeepseekV2ForCausalLM, DeepseekV3ForCausalLM

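Because the failure happens at module import time, one conceivable workaround is to defer the offending import in verl/utils/vllm_utils.py until after the vllm-ascend plugin has been initialized. This is a hypothetical sketch, not verl's actual fix; the helper name get_deepseek_classes is invented for illustration:

# Hypothetical rearrangement of verl/utils/vllm_utils.py: replace the
# module-level import with a lazy accessor, so the vllm-ascend plugin can
# apply its compatibility patches before deepseek_v2 is ever imported.

def get_deepseek_classes():
    # Deferred import: by the time this runs, vllm's platform plugin
    # (vllm-ascend) has been loaded and its patches are in effect.
    from vllm.model_executor.models.deepseek_v2 import (
        DeepseekV2ForCausalLM,
        DeepseekV3ForCausalLM,
    )
    return DeepseekV2ForCausalLM, DeepseekV3ForCausalLM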