[recipe] fix: make LangGraph agent example runnable out-of-the-box #3029
Conversation
Commits:

* …der; apply perf tuning from GSPO recipe
* …aults and SLURM sbatch example
Code Review
This pull request significantly improves the LangGraph agent example by making it runnable out-of-the-box. The changes, including adding CLI arguments, SLURM support, and more robust error handling, are well-executed. My review focuses on enhancing portability and maintainability. I've identified a critical portability issue in the run script due to a hardcoded network configuration and a high-severity maintainability issue in the chat model from duplicated error-handling code. Addressing these points will further solidify the example's robustness and ease of use across different environments.
```bash
export NCCL_IBEXT_DISABLE=1
export NCCL_NVLS_ENABLE=1
export NCCL_IB_HCA=mlx5
export UCX_NET_DEVICES=mlx5_0:1,mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1
```
The UCX_NET_DEVICES variable is hardcoded to a specific configuration with 8 Mellanox network interfaces. This will cause the script to fail on most systems that do not match this exact hardware setup, which undermines the goal of making the example runnable out-of-the-box. To ensure portability, this line should be removed to allow UCX to use default settings, or the device list should be generated dynamically based on the available hardware.
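A dynamic alternative along the lines the comment suggests might look like the sketch below. It assumes the conventional sysfs location `/sys/class/infiniband` for RDMA devices (verify on your cluster) and leaves `UCX_NET_DEVICES` unset, so UCX falls back to its defaults, when no Mellanox devices are found:

```bash
# Build UCX_NET_DEVICES from whatever mlx5 devices the node actually has,
# instead of hardcoding eight interfaces. Sketch only; the sysfs path and
# the ":1" port suffix are assumptions to confirm for your hardware.
build_ucx_net_devices() {
    local dev list=""
    for dev in "$@"; do
        list="${list:+$list,}${dev}:1"   # UCX expects device:port entries
    done
    printf '%s' "$list"
}

if [ -d /sys/class/infiniband ]; then
    devices=$(ls /sys/class/infiniband | grep '^mlx5' || true)
    if [ -n "$devices" ]; then
        # shellcheck disable=SC2086  # word splitting is intentional here
        export UCX_NET_DEVICES="$(build_ucx_net_devices $devices)"
    fi
fi
# If nothing matched, UCX_NET_DEVICES stays unset and UCX picks defaults.
```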
This setting is copied from the GSPO recipe (see test_gspo_3b_math.sh:L26) and has worked across multiple hardware configs in my tests. I’m not a UCX expert and open to feedback if there’s a more portable approach.
…olcengine#3029)

### What does this PR do?

Fixes the LangGraph agent recipe so it runs out-of-the-box across different environments. The original example had undefined variables and brittle error handling that caused failures. This PR makes it portable, robust, and self-contained. No breaking API changes.

### Checklist Before Starting

* [x] Search for similar PRs: https://github.com/search?q=repo%3Avolcengine%2Fverl+langgraph++&type=pullrequests&state=open
* [x] Format PR title as `[recipe] fix: make LangGraph agent example runnable out-of-the-box`
  * `{modules}`: recipe
  * `{type}`: fix
  * No breaking API changes

### Test

**✅ End-to-end validation:**

```bash
# 1. Generate dataset (parameterized)
python recipe/langgraph_agent/example/create_dataset.py --train_size 1000 --test_size 100

# 2. Run training (no modifications needed)
bash recipe/langgraph_agent/example/run_qwen2.5_3b.sh

# 3. SLURM submission (headers included)
sbatch recipe/langgraph_agent/example/run_qwen2.5_3b.sh
```

**Note on `GPUS_PER_NODE` and `NNODES`:**

- `GPUS_PER_NODE`: GPUs per node. Detection order: `SLURM_GPUS_ON_NODE` (if set) → `GPUS_PER_NODE` → `2`.
- `NNODES`: number of nodes. Detection order: `SLURM_JOB_NUM_NODES` (if set) → `NNODES` → `1`.
- Total GPUs = `GPUS_PER_NODE × NNODES` (must be ≥ 2).
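The detection order above maps naturally onto shell parameter-expansion fallbacks. This is a sketch of the logic, not necessarily the script's exact code:

```bash
# Fall back through: SLURM-provided value → user override → default.
GPUS_PER_NODE=${SLURM_GPUS_ON_NODE:-${GPUS_PER_NODE:-2}}
NNODES=${SLURM_JOB_NUM_NODES:-${NNODES:-1}}

TOTAL_GPUS=$((GPUS_PER_NODE * NNODES))
if [ "$TOTAL_GPUS" -lt 2 ]; then
    echo "ERROR: need at least 2 GPUs, got $TOTAL_GPUS" >&2
    exit 1
fi
echo "Using $NNODES node(s) x $GPUS_PER_NODE GPU(s) = $TOTAL_GPUS GPUs"
```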
Local override (no `SLURM_*` set):

```bash
GPUS_PER_NODE=4 NNODES=2 bash recipe/langgraph_agent/example/run_qwen2.5_3b.sh
```

**Results:**

* Model converged to 100% validation accuracy (`val-core/lighteval/MATH/reward/mean@4: 1.0`)
* Stable metrics: policy loss, entropy, critic scores all normal
* No crashes or hangs during the run
* Robust handling of malformed tool-call JSON (logs warnings)
* Model path fallback works when the local model is missing
* SLURM detection + fallbacks confirmed

<img width="3066" height="1288" alt="math_expression_tool – Weights & Biases" src="https://github.com/user-attachments/assets/f08d5799-f9ce-44a2-8fb2-19c7c401c248" />

### API and Usage Example

**No breaking API changes.** The dataset generator now has a CLI interface:

```bash
# Defaults: 5000 train, 500 test → data/math_expression_tool/
python recipe/langgraph_agent/example/create_dataset.py

# Custom sizes & output dir
python recipe/langgraph_agent/example/create_dataset.py \
    --train_size 10000 \
    --test_size 1000 \
    --output_dir my_custom_path

# Training
bash recipe/langgraph_agent/example/run_qwen2.5_3b.sh

# SLURM
sbatch recipe/langgraph_agent/example/run_qwen2.5_3b.sh
```

### Design & Code Changes

**Core runnability fixes:**

* `run_qwen2.5_3b.sh`:
  * Replace undefined `ARNOLD_*` vars with SLURM detection + fallbacks
  * Fix dataset paths
  * Add HF Hub model fallback
  * Apply performance tuning from the GSPO recipe
* `chat_model.py`: Harden tool-call parsing for malformed JSON
* `create_dataset.py`: Add CLI args (`--train_size`, `--test_size`, `--output_dir`) with defaults

**Docs & polish:**

* Update `README.md` with CLI params and SLURM example
* Sort imports to satisfy ruff linting

**Impact:** The example now works out-of-the-box in local and cluster environments without edits.
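The HF Hub model fallback mentioned above can be sketched as follows. The `MODEL_PATH` variable name and the local directory are illustrative assumptions, not the script's exact contents:

```bash
# Prefer a local checkpoint; fall back to the Hub identifier so the
# trainer downloads the model when no local copy exists.
# (MODEL_PATH and the local directory below are hypothetical.)
MODEL_PATH=${MODEL_PATH:-"$HOME/models/Qwen2.5-3B-Instruct"}
if [ ! -d "$MODEL_PATH" ]; then
    MODEL_PATH="Qwen/Qwen2.5-3B-Instruct"
fi
echo "model: $MODEL_PATH"
```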
### Checklist Before Submitting

* [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md)
* [x] Pre-commit checks: `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
* [x] Documentation updated (`README.md`)
* [x] Manual end-to-end test with convergence results
* [x] CI request to be sent in Slack once the PR is opened