Fix trainer bugs in v0.1 #24
Conversation
…ng into update-sql-agent
Pull Request Overview
This PR refactors the trainer code to fix bugs in v0.1 by reorganizing the training loop structure and adding configurable timeout support for LLM requests.
- Extracted core training logic into a separate _train_step method to improve modularity and variable lifecycle management (see the sketch below)
- Added a configurable llm_timeout_seconds parameter to AgentModeDaemon to replace hardcoded timeout values
- Fixed a comment inconsistency by changing the "testing" timer label to "validate" for clarity
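A minimal sketch of what the extracted training step could look like, assuming a Python trainer in the style of agentlightning/verl/trainer.py. Only the _train_step name comes from the PR; the surrounding loop, the validation hook, and the metric names are illustrative, not taken from the repository.

```python
# Sketch only: _train_step comes from the PR summary, everything else is illustrative.
class TrainerSketch:
    def fit(self, dataloader):
        for batch_dict in dataloader:
            # Core step logic lives in its own method, so its local variables
            # go out of scope (and can be garbage-collected) before validation.
            metrics = self._train_step(batch_dict)
            self._validate(metrics)

    def _train_step(self, batch_dict: dict) -> dict:
        # Placeholder body: the real implementation runs the forward/backward
        # pass and optimizer update, then returns the step's metrics.
        return {"train/loss": 0.0, "train/num_samples": len(batch_dict)}

    def _validate(self, metrics: dict) -> None:
        # Illustrative stand-in for the validation that follows each step.
        print(metrics)
```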
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| agentlightning/verl/trainer.py | Extracted training step logic into _train_step method and reorganized the main training loop |
| agentlightning/verl/daemon.py | Added configurable llm_timeout_seconds parameter and updated variable references |
| agentlightning/trainer.py | Added TODO comment for agent match configuration placement |
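For the daemon change, a hedged sketch of how a configurable LLM request timeout might be threaded through. Only the AgentModeDaemon class name and the llm_timeout_seconds parameter come from the PR summary; the requests-based call, the method name query_llm, and the default value are assumptions for illustration.

```python
import requests


class AgentModeDaemonSketch:
    def __init__(self, llm_timeout_seconds: float = 300.0):
        # Previously hardcoded in the request call; now supplied by the caller.
        self.llm_timeout_seconds = llm_timeout_seconds

    def query_llm(self, endpoint: str, payload: dict) -> dict:
        # Every outbound LLM request honours the configured timeout.
        response = requests.post(endpoint, json=payload, timeout=self.llm_timeout_seconds)
        response.raise_for_status()
        return response.json()
```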
    return test_metrics


def _train_step(self, batch_dict: dict) -> dict:
    # Isolate in a separate method to automatically recycle the variables before validation.
Copilot AI commented on Aug 5, 2025
The _train_step method lacks documentation explaining its purpose, parameters, and return value. Consider adding a docstring that describes the method's role in isolating training logic and automatically recycling variables.
Suggested change (replace the inline comment with a docstring):

    """
    Executes a single training step on a batch of data, isolating the training logic and
    automatically recycling variables before validation.

    This method processes a batch of data, performs forward and backward passes, updates model
    parameters, and computes relevant training metrics. By isolating the training logic in this
    method, variables are automatically recycled, reducing memory usage and potential side effects
    before validation.

    Args:
        batch_dict (dict): A dictionary representing a single batch of training data, typically
            produced by the dataloader and convertible to a DataProto object.

    Returns:
        dict: A dictionary containing computed training metrics for the batch.
    """
    # training metrics
    # train step
    metrics = self._train_step(batch_dict)
Copilot AI commented on Aug 5, 2025
The metrics variable is initialized as an empty dictionary on line 309 but is then completely rebound to the result of _train_step(batch_dict) on line 314. This discards any previously collected metrics and ignores the timing_raw dictionary that was initialized alongside it. The timing metrics from validation should be preserved and merged with the training metrics.
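A hedged sketch of the fix this comment points at: merge the step's metrics into the existing dict rather than rebinding it, so earlier validation and timing entries survive. The helper name merge_step_metrics and the "timing/" key prefix are hypothetical, not part of the repository.

```python
def merge_step_metrics(metrics: dict, train_metrics: dict, timing_raw: dict) -> dict:
    """Fold the training step's metrics into the running metrics dict without
    dropping previously collected entries such as validation results."""
    metrics.update(train_metrics)
    metrics.update({f"timing/{name}": seconds for name, seconds in timing_raw.items()})
    return metrics


# Usage in the loop, instead of `metrics = self._train_step(batch_dict)`:
#   train_metrics = self._train_step(batch_dict)
#   metrics = merge_step_metrics(metrics, train_metrics, timing_raw)
```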