Fix trainer bugs in v0.1 #24
Conversation
…ng into update-sql-agent
Pull Request Overview
This PR refactors the trainer code to fix bugs in v0.1 by reorganizing the training loop structure and adding configurable timeout support for LLM requests.
- Extracted core training logic into a separate _train_step method to improve modularity and variable lifecycle management (see the sketch below)
- Added a configurable llm_timeout_seconds parameter to AgentModeDaemon to replace hardcoded timeout values
- Fixed a comment inconsistency by changing the "testing" timer label to "validate" for clarity
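A minimal sketch of what the extracted training step could look like, assuming a Python trainer in the style of agentlightning/verl/trainer.py. Only the _train_step name comes from the PR; the surrounding loop, the validation hook, and the metric names are illustrative, not taken from the repository.

```python
# Sketch only: _train_step comes from the PR summary, everything else is illustrative.
class TrainerSketch:
    def fit(self, dataloader):
        for batch_dict in dataloader:
            # Core step logic lives in its own method, so its local variables
            # go out of scope (and can be garbage-collected) before validation.
            metrics = self._train_step(batch_dict)
            self._validate(metrics)

    def _train_step(self, batch_dict: dict) -> dict:
        # Placeholder body: the real implementation runs the forward/backward
        # pass and optimizer update, then returns the step's metrics.
        return {"train/loss": 0.0, "train/num_samples": len(batch_dict)}

    def _validate(self, metrics: dict) -> None:
        # Illustrative stand-in for the validation that follows each step.
        print(metrics)
```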
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| agentlightning/verl/trainer.py | Extracted training step logic into _train_step method and reorganized the main training loop |
| agentlightning/verl/daemon.py | Added configurable llm_timeout_seconds parameter and updated variable references |
| agentlightning/trainer.py | Added TODO comment for agent match configuration placement |
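For the daemon change, a hedged sketch of how a configurable LLM request timeout might be threaded through. Only the AgentModeDaemon class name and the llm_timeout_seconds parameter come from the PR summary; the requests-based call, the method name query_llm, and the default value are assumptions for illustration.

```python
import requests


class AgentModeDaemonSketch:
    def __init__(self, llm_timeout_seconds: float = 300.0):
        # Previously hardcoded in the request call; now supplied by the caller.
        self.llm_timeout_seconds = llm_timeout_seconds

    def query_llm(self, endpoint: str, payload: dict) -> dict:
        # Every outbound LLM request honours the configured timeout.
        response = requests.post(endpoint, json=payload, timeout=self.llm_timeout_seconds)
        response.raise_for_status()
        return response.json()
```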
    return test_metrics


def _train_step(self, batch_dict: dict) -> dict:
    # Isolate in a separate method to automatically recycle the variables before validation.
Copilot AI commented on Aug 5, 2025
The _train_step method lacks documentation explaining its purpose, parameters, and return value. Consider adding a docstring that describes the method's role in isolating training logic and automatically recycling variables.
Suggested change (replace the inline comment with a docstring):

    """
    Executes a single training step on a batch of data, isolating the training logic and
    automatically recycling variables before validation.

    This method processes a batch of data, performs forward and backward passes, updates model
    parameters, and computes relevant training metrics. By isolating the training logic in this
    method, variables are automatically recycled, reducing memory usage and potential side effects
    before validation.

    Args:
        batch_dict (dict): A dictionary representing a single batch of training data, typically
            produced by the dataloader and convertible to a DataProto object.

    Returns:
        dict: A dictionary containing computed training metrics for the batch.
    """
    # training metrics
    # train step
    metrics = self._train_step(batch_dict)
Copilot AI commented on Aug 5, 2025
The metrics variable is initialized as an empty dictionary on line 309 but is then completely rebound to the result of _train_step(batch_dict) on line 314. This discards any previously collected metrics and ignores the timing_raw dictionary that was initialized alongside it. The timing metrics from validation should be preserved and merged with the training metrics.
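A hedged sketch of the fix this comment points at: merge the step's metrics into the existing dict rather than rebinding it, so earlier validation and timing entries survive. The helper name merge_step_metrics and the "timing/" key prefix are hypothetical, not part of the repository.

```python
def merge_step_metrics(metrics: dict, train_metrics: dict, timing_raw: dict) -> dict:
    """Fold the training step's metrics into the running metrics dict without
    dropping previously collected entries such as validation results."""
    metrics.update(train_metrics)
    metrics.update({f"timing/{name}": seconds for name, seconds in timing_raw.items()})
    return metrics


# Usage in the loop, instead of `metrics = self._train_step(batch_dict)`:
#   train_metrics = self._train_step(batch_dict)
#   metrics = merge_step_metrics(metrics, train_metrics, timing_raw)
```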