@philippnormann (Contributor) commented Aug 12, 2025

What does this PR do?

Fixes the LangGraph agent recipe so it runs out-of-the-box across different environments. The original example had undefined variables and brittle error handling that caused failures. This PR makes it portable, robust, and self-contained. No breaking API changes.

Checklist Before Starting

  • ✅ Search for similar PRs: https://github.com/search?q=repo%3Avolcengine%2Fverl+langgraph++&type=pullrequests&state=open
  • ✅ Format PR title as [recipe] fix: make LangGraph agent example runnable out-of-the-box
    • {modules}: recipe
    • {type}: fix
    • No breaking API changes

Test

✅ End-to-end validation:

# 1. Generate dataset (parameterized)
python recipe/langgraph_agent/example/create_dataset.py --train_size 1000 --test_size 100

# 2. Run training (no modifications needed)
bash recipe/langgraph_agent/example/run_qwen2.5_3b.sh

# 3. SLURM submission (headers included)
sbatch recipe/langgraph_agent/example/run_qwen2.5_3b.sh
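
("Headers included" above refers to SBATCH directives at the top of run_qwen2.5_3b.sh so the script can be submitted directly. The lines below are only an illustration of what such headers typically look like; the actual directives and values in the script may differ.

# Illustrative SLURM directives (values are assumptions, not taken from the script)
#SBATCH --nodes=1
#SBATCH --gpus-per-node=2
#SBATCH --job-name=langgraph_agent_example
)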

Note on GPUS_PER_NODE and NNODES:

  • GPUS_PER_NODE: GPUs per node.
    Detection order: SLURM_GPUS_ON_NODE (if set) → GPUS_PER_NODE → 2.
  • NNODES: number of nodes.
    Detection order: SLURM_JOB_NUM_NODES (if set) → NNODES → 1.
  • Total GPUs = GPUS_PER_NODE × NNODES (must be ≥ 2).
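
A minimal bash sketch of the detection order described above (the exact handling in run_qwen2.5_3b.sh may differ slightly):

# Fall back from SLURM variables to user overrides to hard defaults
GPUS_PER_NODE=${SLURM_GPUS_ON_NODE:-${GPUS_PER_NODE:-2}}
NNODES=${SLURM_JOB_NUM_NODES:-${NNODES:-1}}
# Require at least 2 GPUs in total
if [ $((GPUS_PER_NODE * NNODES)) -lt 2 ]; then
  echo "Error: need at least 2 GPUs in total" >&2
  exit 1
fi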

Local override (no SLURM_* set):

GPUS_PER_NODE=4 NNODES=2 bash recipe/langgraph_agent/example/run_qwen2.5_3b.sh

Results:

  • Model converged to 100% validation accuracy (val-core/lighteval/MATH/reward/mean@4: 1.0)
  • Stable metrics: policy loss, entropy, critic scores all normal
  • No crashes or hangs during run
  • Robust handling of malformed tool-call JSON (logs warnings)
  • Model path fallback works when local model missing
  • SLURM detection + fallbacks confirmed
[Screenshot: math_expression_tool – Weights & Biases]

API and Usage Example

No breaking API changes. Dataset generator now has a CLI interface:

# Defaults: 5000 train, 500 test → data/math_expression_tool/
python recipe/langgraph_agent/example/create_dataset.py

# Custom sizes & output dir
python recipe/langgraph_agent/example/create_dataset.py \
  --train_size 10000 \
  --test_size 1000 \
  --output_dir my_custom_path

# Training
bash recipe/langgraph_agent/example/run_qwen2.5_3b.sh

# SLURM
sbatch recipe/langgraph_agent/example/run_qwen2.5_3b.sh

Design & Code Changes

Core runnability fixes:

  • run_qwen2.5_3b.sh:

    • Replace undefined ARNOLD_* vars with SLURM detection + fallbacks
    • Fix dataset paths
    • Add HF hub model fallback (see the sketch after this list)
    • Apply performance tuning from GSPO recipe
  • chat_model.py: Harden tool-call parsing for malformed JSON

  • create_dataset.py: Add CLI args (--train_size, --test_size, --output_dir) with defaults
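
A minimal sketch of the HF hub model fallback mentioned above (the variable name, local path, and Hub model id here are illustrative assumptions; see run_qwen2.5_3b.sh for the actual logic):

# Prefer a local checkpoint if present; otherwise fall back to a Hugging Face Hub model id
MODEL_PATH=${MODEL_PATH:-"$HOME/models/Qwen2.5-3B-Instruct"}  # hypothetical local path
if [ ! -d "$MODEL_PATH" ]; then
  MODEL_PATH="Qwen/Qwen2.5-3B-Instruct"  # resolved from the Hub at load time
fi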

Docs & polish:

  • Update README.md with CLI params and SLURM example
  • Sort imports to satisfy ruff linting

Impact: Example now works out-of-the-box in local and cluster environments without edits.

Checklist Before Submitting

  • ✅ Read the Contribute Guide: https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md
  • ✅ Pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
  • ✅ Documentation updated (README.md)
  • ✅ Manual end-to-end test with convergence results
  • ✅ CI request to be sent in Slack once PR is opened

CLAassistant commented Aug 12, 2025

CLA assistant check
All committers have signed the CLA.

@gemini-code-assist bot left a comment

Code Review

This pull request significantly improves the LangGraph agent example by making it runnable out-of-the-box. The changes, including adding CLI arguments, SLURM support, and more robust error handling, are well-executed. My review focuses on enhancing portability and maintainability. I've identified a critical portability issue in the run script due to a hardcoded network configuration and a high-severity maintainability issue in the chat model from duplicated error-handling code. Addressing these points will further solidify the example's robustness and ease of use across different environments.

In recipe/langgraph_agent/example/run_qwen2.5_3b.sh:

export NCCL_IBEXT_DISABLE=1
export NCCL_NVLS_ENABLE=1
export NCCL_IB_HCA=mlx5
export UCX_NET_DEVICES=mlx5_0:1,mlx5_1:1,mlx5_2:1,mlx5_3:1,mlx5_4:1,mlx5_5:1,mlx5_6:1,mlx5_7:1

Severity: critical

The UCX_NET_DEVICES variable is hardcoded to a specific configuration with 8 Mellanox network interfaces. This will cause the script to fail on most systems that do not match this exact hardware setup, which undermines the goal of making the example runnable out-of-the-box. To ensure portability, this line should be removed to allow UCX to use default settings, or the device list should be generated dynamically based on the available hardware.
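
For illustration only (not part of this PR), one way to build the device list dynamically, along the lines suggested above, would be to derive it from the HCAs actually present and otherwise leave UCX defaults untouched:

# Untested sketch: pin UCX only to Mellanox HCAs that actually exist on the node
if ls /sys/class/infiniband/mlx5_* >/dev/null 2>&1; then
  export UCX_NET_DEVICES=$(ls -d /sys/class/infiniband/mlx5_* | xargs -n1 basename | sed 's/$/:1/' | paste -sd, -)
fi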

@philippnormann (Contributor, Author) replied:

This setting is copied from the GSPO recipe (see test_gspo_3b_math.sh:L26) and has worked across multiple hardware configs in my tests. I’m not a UCX expert and open to feedback if there’s a more portable approach.

@chenhaiq requested a review from wuxibin89 August 13, 2025 02:02
@wuxibin89 merged commit 83cfc76 into volcengine:main Aug 13, 2025
8 checks passed
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Aug 15, 2025
techkang pushed a commit to techkang/verl that referenced this pull request Aug 15, 2025
ChangyiYang pushed a commit to SwordFaith/verl that referenced this pull request Aug 16, 2025
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
WncFht pushed a commit to WncFht/verl that referenced this pull request Oct 10, 2025
techkang pushed a commit to techkang/verl that referenced this pull request Oct 31, 2025