Skip to content

Conversation

@ultmaster
Copy link
Contributor

No description provided.

@ultmaster ultmaster marked this pull request as ready for review August 5, 2025 16:31
Copilot AI review requested due to automatic review settings August 5, 2025 16:31
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the trainer code to fix bugs in v0.1 by reorganizing the training loop structure and adding configurable timeout support for LLM requests.

  • Extracted core training logic into a separate _train_step method to improve modularity and variable lifecycle management
  • Added configurable llm_timeout_seconds parameter to AgentModeDaemon to replace hardcoded timeout values
  • Fixed comment inconsistency by changing "testing" timer label to "validate" for clarity

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
agentlightning/verl/trainer.py Extracted training step logic into _train_step method and reorganized the main training loop
agentlightning/verl/daemon.py Added configurable llm_timeout_seconds parameter and updated variable references
agentlightning/trainer.py Added TODO comment for agent match configuration placement

return test_metrics

def _train_step(self, batch_dict: dict) -> dict:
# Isolate in a separate method to automatically recycle the variables before validation.
Copy link

Copilot AI Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _train_step method lacks documentation explaining its purpose, parameters, and return value. Consider adding a docstring that describes the method's role in isolating training logic and automatically recycling variables.

Suggested change
# Isolate in a separate method to automatically recycle the variables before validation.
"""
Executes a single training step on a batch of data, isolating the training logic and
automatically recycling variables before validation.
This method processes a batch of data, performs forward and backward passes, updates model
parameters, and computes relevant training metrics. By isolating the training logic in this
method, variables are automatically recycled, reducing memory usage and potential side effects
before validation.
Args:
batch_dict (dict): A dictionary representing a single batch of training data, typically
produced by the dataloader and convertible to a DataProto object.
Returns:
dict: A dictionary containing computed training metrics for the batch.
"""

Copilot uses AI. Check for mistakes.

# training metrics
# train step
metrics = self._train_step(batch_dict)
Copy link

Copilot AI Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metrics variable is initialized as an empty dictionary on line 309 but then completely overwritten by _train_step(batch_dict) on line 314. This overwrites any previous metrics and ignores the timing_raw dictionary that was initialized. The timing metrics from validation should be preserved and merged with training metrics.

Copilot uses AI. Check for mistakes.
@ultmaster ultmaster merged commit a71e754 into main Aug 5, 2025
8 checks passed
@ultmaster ultmaster deleted the update-sql-agent branch October 28, 2025 01:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants