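Okay, this is a typical error that occurs when several vLLM or torch.compile worker processes run in a parallel environment such as Ray.
The core problem is a race condition: multiple processes try to create, write, and delete files in the same shared cache directory at the same moment, and they collide.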
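Error breakdown
Your error log shows a clear chain reaction:
Root cause (OSError and FileNotFoundError):
OSError: [Errno 16] Device or resource busy: This is the most important message. When torch.compile generates GPU kernels through Triton, the compiled kernels are cached under ~/.cache/vllm/torch_compile_cache/... Your ActorWorker processes (such as actor_infer-3 and actor_infer-7) all try to compile the same model at startup, so they write to and modify this one shared cache directory at the same time. While one process holds a lock on a file or directory, access from any other process fails with "resource busy".
FileNotFoundError: [Errno 2] No such file or directory: This is another symptom of the same race condition. Most likely one process is trying to read a temporary file that another process has already deleted after finishing its own compile/cache step.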
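The interleaving described above can be replayed in a single process. The following is a minimal sketch, not the actual vLLM or Triton cache code: the directory prefix and the file name `kernel.tmp` are hypothetical, and the two "workers" are just consecutive steps in one function so that the outcome is deterministic. It only reproduces the FileNotFoundError symptom; the "Device or resource busy" variant depends on the filesystem and is not simulated here.

```python
import os
import shutil
import tempfile


def simulate_cache_race() -> str:
    """Replay, step by step, an ordering that two workers sharing a cache can hit."""
    # Hypothetical stand-in for the shared compile cache directory.
    cache_dir = tempfile.mkdtemp(prefix="shared_compile_cache_")
    tmp_file = os.path.join(cache_dir, "kernel.tmp")

    # Worker A writes a temporary artifact into the shared cache.
    with open(tmp_file, "w") as f:
        f.write("compiled kernel placeholder")

    # Worker B finishes its own step and cleans up the same directory.
    shutil.rmtree(cache_dir)

    # Worker A comes back to read its file, which no longer exists.
    try:
        with open(tmp_file) as f:
            f.read()
    except FileNotFoundError as exc:
        return type(exc).__name__
    return "no error"


print(simulate_cache_race())
```

With real concurrent processes the ordering is not fixed, which is why the failure shows up on some workers and some runs but not others.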
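Direct consequence (BackendCompilerFailed):
Because of the OSError above, compilation of the Triton kernels fails.
PyTorch's Dynamo catches this failure and raises torch._dynamo.exc.BackendCompilerFailed, which means the backend (vllm in this case) failed to carry out the compile job.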
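Final crash (ActorDiedError):
The unhandled BackendCompilerFailed exception crashes the ActorWorker process.
Ray, the distributed framework that manages these workers, detects that its child process (the Actor) died unexpectedly and raises ray.exceptions.ActorDiedError to the main program, which stops the whole script.
The OOM killer (out of memory) mentioned in the Ray log is a generic guess. Memory problems can also kill a process, but judging from your full error log, OSError: Device or resource busy is the more direct and more specific root cause.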
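If the shared cache directory really is the cause, one way to test the diagnosis is to give every worker its own cache location before vLLM or torch is imported in that process. The sketch below is an illustration under assumptions, not a confirmed fix: the helper name `isolate_compile_caches` and the directory layout are hypothetical, and it assumes the installed versions honor the `VLLM_CACHE_ROOT`, `TORCHINDUCTOR_CACHE_DIR`, and `TRITON_CACHE_DIR` environment variables.

```python
import os
import tempfile
from typing import Dict, Optional


def isolate_compile_caches(worker_id: str, base_dir: Optional[str] = None) -> Dict[str, str]:
    """Point the compile caches of this process at a private directory.

    Hypothetical helper: call it at the very start of each worker,
    before importing vllm or torch, so the variables are seen at import time.
    """
    base = base_dir or tempfile.gettempdir()
    # The PID keeps directories distinct even if two workers share an id.
    root = os.path.join(base, f"compile_cache_{worker_id}_{os.getpid()}")
    env = {
        # Assumed to be read by vLLM as the root of its cache files.
        "VLLM_CACHE_ROOT": os.path.join(root, "vllm"),
        # Assumed to be read by TorchInductor for its on-disk cache.
        "TORCHINDUCTOR_CACHE_DIR": os.path.join(root, "inductor"),
        # Assumed to be read by Triton for compiled kernels.
        "TRITON_CACHE_DIR": os.path.join(root, "triton"),
    }
    for path in env.values():
        os.makedirs(path, exist_ok=True)
    os.environ.update(env)
    return env


if __name__ == "__main__":
    for name, path in isolate_compile_caches("actor_infer-3").items():
        print(name, "->", path)
```

The trade-off is that every worker compiles from scratch, so startup is slower, but if the errors disappear with private caches, that supports the race-condition explanation over the OOM one.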
defaults:
  - ../config/envs@_here_
  - ../config/deepspeed_zero@_here_
  - ../config/deepspeed_zero2@_here_
  - ../config/deepspeed_zero3@_here_
  - ../config/deepspeed_zero3_cpuoffload@_here_
hydra:
  run:
    dir: .
  output_subdir: null
exp_name: "agentic_pipeline"
seed: 42
logging_dir: ./output/logs
output_dir: ./output
render_save_dir: ./output/render
system_envs:
  USE_MODELSCOPE: '1'
#track_with: wandb
#tracker_kwargs:
#  api_key:
#  project: roll-agentic
#  name: ${exp_name}_sokoban
#  notes: "agentic_pipeline"
#  tags:
#    - agentic
#    - roll
#    - baseline
track_with: tensorboard
tracker_kwargs:
  log_dir: /data/home/zdhs0086/ROLL/log/qwen2.5-0.5B-Instruct/agentic_frozen_lake
checkpoint_config:
  type: file_system
  output_dir: /data/home/zdhs0086/ROLL/rl_examples/models/${exp_name}
num_gpus_per_node: 8
max_steps: 1024
save_steps: 10000
logging_steps: 1
eval_steps: 10
resume_from_checkpoint: false
rollout_batch_size: 1024
val_batch_size: 1024
sequence_length: 8192
advantage_clip: 0.2
ppo_epochs: 1
adv_estimator: "grpo"
#pg_clip: 0.1
#dual_clip_loss: True
init_kl_coef: 0.0
whiten_advantages: true
entropy_loss_coef: 0
max_grad_norm: 1.0
pretrain: /data/home/zdhs0086/zdhs0086_src_data/Qwen2.5-0.5B-Instruct
reward_pretrain: /data/home/zdhs0086/zdhs0086_src_data/Qwen2.5-0.5B-Instruct
actor_train:
  model_args:
    attn_implementation: fa2
    disable_gradient_checkpointing: false
    dtype: bf16
    model_type: ~
  training_args:
    learning_rate: 1.0e-6
    weight_decay: 0
    per_device_train_batch_size: 2
    gradient_accumulation_steps: 64
    warmup_steps: 10
    lr_scheduler_type: cosine
  data_args:
    template: qwen2_5
  strategy_args:
    # strategy_name: deepspeed_train
    # strategy_config: ${deepspeed_zero3}
    strategy_name: megatron_train
    strategy_config:
      tensor_model_parallel_size: 1
      pipeline_model_parallel_size: 1
      expert_model_parallel_size: 1
      use_distributed_optimizer: true
      recompute_granularity: full
  device_mapping: list(range(0,8))
  infer_batch_size: 2
actor_infer:
  model_args:
    disable_gradient_checkpointing: true
    dtype: bf16
  generating_args:
    max_new_tokens: 128 # single-turn response length
    top_p: 0.99
    top_k: 100
    num_beams: 1
    temperature: 0.99
    num_return_sequences: 1
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: vllm
    strategy_config:
      gpu_memory_utilization: 0.8
      block_size: 16
      load_format: auto
  device_mapping: list(range(0,8))
reference:
  model_args:
    attn_implementation: fa2
    disable_gradient_checkpointing: true
    dtype: bf16
    model_type: ~
  data_args:
    template: qwen2_5
  strategy_args:
    strategy_name: hf_infer
    strategy_config: ~
  device_mapping: list(range(0,8))
  infer_batch_size: 2
action_pattern: <answer>(.*?)</answer>
think_action_pattern: <think>(.*?)</think>\s*<answer>(.*?)</answer>
user_prompt_no_think_format: <answer> [your answer] </answer>
user_prompt_think_format: <think> [Your thoughts] </think> <answer> [your answer] </answer>
added_text_no_think: <answer>
added_text_think: <think>
max_tokens_per_step: 128
max_actions_per_traj: 10
reward_normalization:
  grouping: traj_group_id # one of tags (env_type) / traj_group_id (group) / batch (rollout_batch) ...; the key used to group the reward/advantage computation
  method: mean_std # asym_clip / identity / mean_std
train_env_manager:
  format_penalty: -0.15 # sokoban env penalty_for_step=-0.1
  max_env_num_per_worker: 16
  num_env_groups: 128
  # under the same group, the env config and env seed are ensured to be equal
  group_size: 8
  tags: [FrozenLake]
  num_groups_partition: [128] # If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation
val_env_manager:
  max_env_num_per_worker: 32
  num_env_groups: 1024
  group_size: 1 # should be set to 1 because val temperature is set to 0 and same prompt leads to same output
  tags: [SimpleSokoban, LargerSokoban, SokobanDifferentGridVocab, FrozenLake]
  num_groups_partition: [256, 256, 256, 256] # TODO: If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation
custom_envs:
  SimpleSokoban:
    env_type: sokoban
    max_tokens_per_step: ${max_tokens_per_step}
    user_prompt_format: ${user_prompt_no_think_format}
    added_text: ${added_text_no_think}
    env_manager_cls: roll.pipeline.agentic.env_manager.traj_env_manager.TrajEnvManager
    use_thread_lock: true
    env_config: # keys should be a subset of SokobanConfig
      env_instruction: "You are solving the Sokoban puzzle. You are the player and you need to push all boxes to targets. When you are right next to a box, you can push it by moving in the same direction. You cannot push a box through a wall, and you cannot pull a box. The answer must be one of action in a turn, format is <answer>Right</answer>"
      action_pattern: ${action_pattern}
      max_steps: ${max_actions_per_traj}
      dim_x: 6
      dim_y: 6
      num_boxes: 1
  LargerSokoban:
    env_type: sokoban
    max_tokens_per_step: ${max_tokens_per_step}
    user_prompt_format: ${user_prompt_no_think_format}
    added_text: ${added_text_no_think}
    env_manager_cls: roll.pipeline.agentic.env_manager.traj_env_manager.TrajEnvManager
    use_thread_lock: true
    env_config:
      env_instruction: "You are solving the Sokoban puzzle. You are the player and you need to push all boxes to targets. When you are right next to a box, you can push it by moving in the same direction. You cannot push a box through a wall, and you cannot pull a box. The answer must be one of action in a turn, format is <answer>Right</answer>"
      action_pattern: ${action_pattern}
      max_steps: ${max_actions_per_traj}
      dim_x: 8
      dim_y: 8
      num_boxes: 2
      search_depth: 10
  SokobanDifferentGridVocab:
    env_type: sokoban
    max_tokens_per_step: ${max_tokens_per_step}
    user_prompt_format: ${user_prompt_no_think_format}
    added_text: ${added_text_no_think}
    env_manager_cls: roll.pipeline.agentic.env_manager.traj_env_manager.TrajEnvManager
    use_thread_lock: true
    env_config: # keys should be a subset of SokobanConfig
      env_instruction: "You are solving the Sokoban puzzle. You are the player and you need to push all boxes to targets. When you are right next to a box, you can push it by moving in the same direction. You cannot push a box through a wall, and you cannot pull a box. The answer must be one of action in a turn, format is <answer>Right</answer>"
      action_pattern: ${action_pattern}
      max_steps: ${max_actions_per_traj}
      search_depth: 30
      dim_x: 6
      dim_y: 6
      num_boxes: 1
      grid_lookup: { 0: "W", 1: ".", 2: "G", 3: "C", 4: "B", 5: "A", 6: "@" }
      grid_vocab: { "W": "wall", ".": "empty", "G": "target", "C": "box on target", "B": "box", "A": "player", "@": "player on target" }
  FrozenLake:
    env_type: frozen_lake
    max_tokens_per_step: ${max_tokens_per_step}
    user_prompt_format: ${user_prompt_no_think_format}
    added_text: ${added_text_no_think}
    env_manager_cls: roll.pipeline.agentic.env_manager.traj_env_manager.TrajEnvManager
    use_thread_lock: true
    env_config:
      env_instruction: "You are solving the FrozenLake puzzle. Forbid the whole and go to the target. You may move to the unintended direction due to the slippery ice. The answer must be one of action in a turn, format is <answer>Right</answer>"
      action_pattern: ${action_pattern}
      max_steps: ${max_actions_per_traj}
      is_slippery: false
  FrozenLakeThink:
    env_type: frozen_lake
    max_tokens_per_step: ${max_tokens_per_step}
    user_prompt_format: ${user_prompt_think_format}
    added_text: ${added_text_think}
    env_manager_cls: roll.pipeline.agentic.env_manager.traj_env_manager.TrajEnvManager
    use_thread_lock: true
    env_config:
      env_instruction: "You are solving the FrozenLake puzzle. Forbid the whole and go to the target. You may move to the unintended direction due to the slippery ice. The answer must be one of action in a turn, format is <answer>Right</answer>"
      action_pattern: ${think_action_pattern}
      max_steps: ${max_actions_per_traj}
      is_slippery: false
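The batch-related numbers in this config can be cross-checked with a little arithmetic. The sketch below copies the values from the config and multiplies them out. It assumes the data-parallel size equals the 8 GPUs in device_mapping (tensor and pipeline parallel sizes are both 1), and it does not claim that ROLL requires these quantities to match; it only shows that they happen to line up here.

```python
# Values copied from the config above.
rollout_batch_size = 1024
val_batch_size = 1024
num_gpus = 8  # device_mapping: list(range(0,8))
per_device_train_batch_size = 2
gradient_accumulation_steps = 64

train_env = {"num_env_groups": 128, "group_size": 8, "num_groups_partition": [128]}
val_env = {"num_env_groups": 1024, "group_size": 1, "num_groups_partition": [256, 256, 256, 256]}


def total_envs(manager: dict) -> int:
    """Number of environments: one per (group, member) pair."""
    return manager["num_env_groups"] * manager["group_size"]


def partition_matches(manager: dict) -> bool:
    """True when the per-tag partition sums to the number of groups."""
    return sum(manager["num_groups_partition"]) == manager["num_env_groups"]


def summarize() -> dict:
    return {
        "train_envs": total_envs(train_env),
        "val_envs": total_envs(val_env),
        # Assumes data-parallel size == num_gpus, since TP = PP = 1.
        "samples_per_optimizer_step": per_device_train_batch_size
        * gradient_accumulation_steps
        * num_gpus,
        "train_partition_ok": partition_matches(train_env),
        "val_partition_ok": partition_matches(val_env),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

Under these assumptions the training environments (128 x 8), the validation environments (1024 x 1), and the samples consumed per optimizer step (2 x 64 x 8) all come to 1024, the same as rollout_batch_size and val_batch_size.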
Ask Gemini 2.5 Pro:
My env:
8*H100
Successfully ran:
pip install -r requirements_torch260_vllm.txt
My config: