Description
Error traceback:
  File "/home/tiger/.pyenv/versions/3.11.2/lib/python3.11/site-packages/agentlightning/verl/entrypoint.py", line 152, in run
    trainer.fit()
  File "/home/tiger/.pyenv/versions/3.11.2/lib/python3.11/site-packages/agentlightning/verl/trainer.py", line 318, in fit
    metrics = self._train_step(batch_dict)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tiger/.pyenv/versions/3.11.2/lib/python3.11/site-packages/agentlightning/verl/trainer.py", line 95, in _train_step
    batch, agent_metrics = self.agent_mode_daemon.get_train_data_batch(
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tiger/.pyenv/versions/3.11.2/lib/python3.11/site-packages/agentlightning/verl/daemon.py", line 379, in get_train_data_batch
    original_sample = self._task_id_to_original_sample[rollout_id]
                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
Training log:
(Process-11615 agentlightning.server) Requeuing task rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284 after timeout (attempt 1)
(Process-11615 agentlightning.server) Task rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284 timed out after 600.0s, requeued (attempt 1)
(Process-11615 agentlightning.server) Task rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284 re-claimed (attempt 2)
(Process-11615 agentlightning.server) Rollout received and stored: rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284
Agent log:
[Task 10133 Received] ID: rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284
[Task 10190 Received] ID: rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) [Rollout rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284] Message length details:
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) Message 0: 2633 characters
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) Message 1: 3002 characters
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) Message 2: 176 characters
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) Message 3: 3013 characters
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) Message 4: 323 characters
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) Message 5: 4113 characters
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) Total: 6 messages, 13260 characters
(Process-1116 agentlightning.runner) [Worker 3 | Rollout rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284] Completed in 25.88s. Triplet length: 4. Reward: 0.0
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) [Rollout rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284] Message length details:
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) Message 0: 2633 characters
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) Message 1: 4985 characters
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) Message 2: 265 characters
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) Message 3: 3013 characters
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) Message 4: 412 characters
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) Message 5: 4444 characters
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) Total: 6 messages, 15752 characters
(Process-1113 agentlightning.runner) [Worker 0 | Rollout rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284] Completed in 1505.21s. Triplet length: 4. Reward: 0.0
I guess the server raised the timeout because the agent took too long to finish the task (one attempt ran for 1505.21s, well over the 600.0s limit). I suggest that when a rollout times out, it should simply be ignored.
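To make the suggestion concrete, here is a minimal sketch of what "ignore that rollout" could look like. It is not agentlightning's actual code: `match_finished_rollouts` and its arguments are hypothetical names, and I am only assuming that the lookup at `daemon.py` line 379 fails because the same rollout ID is reported twice (once per attempt) while only one entry exists for it.

```python
from typing import Any, Dict, Iterable, List, Tuple


def match_finished_rollouts(
    finished_ids: Iterable[str],
    id_to_sample: Dict[str, Any],
) -> Tuple[List[Tuple[str, Any]], List[str]]:
    """Pair finished rollout IDs with their original samples.

    Hypothetical helper: IDs without a pending sample (for example a late
    result from an attempt that already timed out) are skipped instead of
    raising on the dictionary lookup.
    """
    matched: List[Tuple[str, Any]] = []
    skipped: List[str] = []
    for rollout_id in finished_ids:
        if rollout_id not in id_to_sample:
            # Unknown or already-consumed ID: record it and move on.
            skipped.append(rollout_id)
            continue
        # pop() consumes the entry, so a second result for the same ID
        # falls into the "skipped" branch above.
        matched.append((rollout_id, id_to_sample.pop(rollout_id)))
    return matched, skipped


# Example: "rollout-a" is reported twice, "rollout-b" was never registered.
pending = {"rollout-a": {"question": "q1"}}
matched, skipped = match_finished_rollouts(
    ["rollout-a", "rollout-a", "rollout-b"], pending
)
```

With this approach the first result for an ID is used for training and any later result for the same ID is dropped, which could be logged as a warning rather than crashing `trainer.fit()`.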
BTW, is there a WeChat group or a RedNote group?