Releases: rllm-org/rllm
rLLM: v0.2.1.post1
What's Changed
- update docs & add curves by @thwu1 in #343
- Fix import for colorful_print in agent_sdk_engine.py and agent_sdk_trainer.py by @wht0703 in #345
- Unblock SDK installation by overriding dependencies by @wht0703 in #348
- [Doc] Update README and fix a few installation related issues by @listar2000 in #347
- fix: keyerror completion_ids by @kxfan2002 in #353
- Fix: Enable GPU acceleration for dense retrieval in search agent by @Gitsamshi in #349
New Contributors
- @wht0703 made their first contribution in #345
- @kxfan2002 made their first contribution in #353
- @Gitsamshi made their first contribution in #349
Full Changelog: v0.2.1...v0.2.1.post1
rLLM: v0.2.1
rLLM v0.2.1: Tinker backend, VLM training, Eval Protocol, and SDK (preview)
We are excited to release rLLM v0.2.1. This new version comes with the following exciting features:
- rLLM SDK (preview): The rLLM SDK enables you to transform agents written in frameworks such as LangGraph, SmolAgent, or Strands into trainable workflows. Check out this LangGraph RAG example, which builds a RAG agent and trains it with the rLLM SDK.
- Tinker training backend: In addition to verl, rLLM now supports Tinker as a training backend. You can use the same abstractions for building agents and easily switch between different backends for training.
- VLM training: rLLM supports Vision-Language Model training with the verl backend. See the Geo3K training example for reference.
- LoRA fine-tuning: rLLM supports LoRA training with both the verl and Tinker backends. See the GSM8K LoRA example for how to enable LoRA training with a single config change.
- Eval Protocol integration: We integrate with the Eval Protocol from Fireworks AI. Users can now train on any environment supported by the Eval Protocol. See this example that uses Eval Protocol in rLLM to train a FrozenLake agent.
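To picture the "switch backends and enable LoRA with a single config change" idea, here is a minimal, hypothetical sketch. The `backend` and `lora` keys and the `make_trainer` helper are illustrative assumptions, not rLLM's actual API:

```python
# Hypothetical sketch: selecting a training backend and enabling LoRA
# via config alone. Key names are illustrative, not rLLM's real schema.

def make_trainer(config: dict) -> str:
    """Dispatch on the configured backend and LoRA flag."""
    backend = config.get("backend", "verl")
    if backend not in {"verl", "tinker"}:
        raise ValueError(f"unknown backend: {backend}")
    mode = "lora" if config.get("lora", {}).get("enabled") else "full"
    return f"{backend}:{mode}"

# Full fine-tuning with the verl backend (default):
base = {"backend": "verl"}
# Switching backend and turning on LoRA touches only the config:
lora = {"backend": "tinker", "lora": {"enabled": True, "rank": 16}}

print(make_trainer(base))   # verl:full
print(make_trainer(lora))   # tinker:lora
```

The agent-building code is untouched in both cases; only the training configuration differs, which is the point of sharing one set of abstractions across backends.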
A big shoutout to @thwu1 @kylemontgomery1 @listar2000 @xzrderek for their outstanding work on these features.
What's Changed
- make rllm-specific configs applied correctly and robustly by @listar2000 in #256
- Ensure disable_thinking defaults to False when config is None by @Tendo33 in #258
- fix: circular import issues in WORKFLOW_CLASS_MAPPING by @listar2000 in #261
- [nightly] initialize the nightly branch by @listar2000 in #263
- Fix environment variable forwarding to ray runtime env by @listar2000 in #265
- [nightly] update recent changes on workflow engines by @listar2000 in #268
- Fix : Prevent KeyError in _pad_dataproto_to_world_size by @mananroongta in #274
- Fix retokenization by @thwu1 in #272
- fix controlling the n_parallel_agents and the concurrent env operations by @LianShuQuan in #271
- Added is_correct & reward flow through tool env by @mananroongta in #277
- Integrate Eval Protocol as RL environment by @1stprinciple in #276
- SWEEnv.from_dict() by @LianShuQuan in #278
- Fix: Resolve PyArrow nested data conversion error in distributed dataset loading by @erranlli in #281
- Per Episode Logging Feature by @qywu in #282
- [feature] Support Tinker as a backend by @thwu1 in #283
- [feat] Tinker Workflow Trainer by @thwu1 in #288
- Fix fireworks dependency by @listar2000 in #296
- Examples: fix utils import by @Flecart in #295
- [Refactor] Update Tinker Backend Example by @thwu1 in #300
- Revert "fix: Gracefully skip overlong prompts during training to prev… by @1stprinciple in #302
- Fixes #303 Optimize old_log_prob computation in PPO trainer by @BabelTower in #304
- Bug/n parallel agents by @kylemontgomery1 in #307
- [nightly] merge recent updates in main back to nightly by @listar2000 in #308
- Adding generic Eval Protocol environments to rLLM by @xzrderek in #306
- [feat] sdk by @thwu1 in #310
- Multimodal by @kylemontgomery1 in #315
- add rllm docs by @xzrderek in #312
- Fix import problem of megatron ray worker group by @listar2000 in #319
- Fix color print display issue by @listar2000 in #317
- [feat] Integrate OpenTelemetry by @thwu1 in #320
- Remove unnecessary free_cache_engine checks. by @listar2000 in #324
- add vlm docs by @kylemontgomery1 in #326
- [feat] Importance Sampling by @thwu1 in #332
- Fix repetitive application id causing vLLM issue by @listar2000 in #334
- [feat] Add Langgraph Training Example, Fix bugs, Refactor Sdk by @thwu1 in #335
- Add Sdk Doc by @thwu1 in #339
- [feature] simplified deps by @kylemontgomery1 in #327
- Add gsm8k-lora script by @listar2000 in #342
- [v0.2.1] Merge nightly into main for rLLM v0.2.1 by @jeffreysijuntan in #341
New Contributors
- @Tendo33 made their first contribution in #258
- @thwu1 made their first contribution in #272
- @LianShuQuan made their first contribution in #271
- @qywu made their first contribution in #282
- @Flecart made their first contribution in #295
- @BabelTower made their first contribution in #304
- @xzrderek made their first contribution in #306
Full Changelog: v0.2.0...v0.2.1
rLLM: v0.2.0
rLLM v0.2: RL Training over General Agentic Programs (Blog Post)
We are excited to release rLLM v0.2, a major upgrade of our RL training framework. In v0.1, rLLM provided agent and OpenAI Gym-like environment abstractions to support training ReAct-style agents. In v0.2, we additionally introduce AgentWorkflowEngine and AgentWorkflowTrainer, more general abstractions that enable arbitrary agentic programs to be trained. Agent builders and researchers can now define multi-agent systems, complex workflows (e.g., solver-judge, planner-executor, MCTS), and agentic programs with custom reward functions, and train them with reinforcement learning without rewriting their production code.
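As a concrete picture of what "an arbitrary agentic program" can look like, here is a minimal, hypothetical solver-judge sketch. The function names, signatures, and stubbed model calls are assumptions for illustration, not rLLM's API; in practice each callable would be an LLM rollout and the judge's score would feed the RL reward:

```python
# Hypothetical sketch of a solver-judge agentic program of the kind an
# AgentWorkflowEngine could train end to end. Model calls are stubbed.
from typing import Callable

def solver_judge_workflow(task: str,
                          solve: Callable[[str], str],
                          judge: Callable[[str, str], float],
                          max_rounds: int = 3) -> tuple[str, float]:
    """Run solver/judge rounds; the judge's score doubles as the reward."""
    best_answer, best_score = "", float("-inf")
    for _ in range(max_rounds):
        answer = solve(task)
        score = judge(task, answer)      # custom reward function
        if score > best_score:
            best_answer, best_score = answer, score
        if score >= 1.0:                 # early exit on a perfect score
            break
    return best_answer, best_score

# Toy stand-ins for the solver and judge models:
answers = iter(["draft", "final"])
result = solver_judge_workflow(
    "toy task",
    solve=lambda t: next(answers),
    judge=lambda t, a: 1.0 if a == "final" else 0.2,
)
print(result)  # ('final', 1.0)
```

The workflow is ordinary Python control flow, which is what lets multi-agent systems and judged loops be expressed without restructuring production code around a fixed agent/environment interface.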
Key Features in v0.2
- Support for the official verl==0.5.0 as the training backend; no custom verl fork anymore! verl==0.5.0 brings the following features, which are now supported in rLLM (@kylemontgomery1):
  - Megatron training support (@jeewoo-lee)
  - SGLang as the rollout engine, in addition to vLLM.
- Introduce AgentWorkflowEngine, which enables passing in arbitrary agentic programs for training. (@kylemontgomery1)
- Support for more agents and environments:
  - Terminus and TerminalBench (@JasonWei05)
  - Tongyi DeepResearch agent (@yayashuxue)
  - AppWorld and AppWorldReactAgent (@sunan135)
- Integration with other agentic frameworks/SDKs:
  - Strands SDK from AWS
  - SmolAgents
What's Changed
- fix <tool_calls_begin> variable by @wj-Mcat in #142
- Fix not registered license from code by @annyan09023 in #144
- fix r2egym import error; update installation README by @jeffreysijuntan in #146
- update deepscaler max_prompt_length to avoid exception during training by @jeffreysijuntan in #148
- fix(syntax): Resolve invalid escape sequence warnings by @tonyz0x0 in #154
- added Tools for SFT by @mananroongta in #160
- update docs by @jeffreysijuntan in #167
- Add dark mode to docs by @philippnormann in #168
- [FIX] Fix tool calling result parsing problem in trajectory visualizer & MCP tool name fixing by @VincentXWD in #174
- [hotfix][miniwob] Fix gymnasium.error.NameNotFound by @abrohamLee in #172
- Load full DeepCoder dataset, instead of LCB subset by @mananroongta in #178
- [feat][docker] Installation with Docker by @abrohamLee in #177
- Add macOS compatibility: exclude GPU dependencies on darwin by @yayashuxue in #180
- Torch 2.7.0 only compatible with MacOS python=3.11 by @yayashuxue in #184
- Migrate to verl v0.5.0 by @kylemontgomery1 in #193
- Terminal Bench Integration into rLLM (Simplified) by @JasonWei05 in #205
- feat: Integrate Strands SDK with RLLM for scalable tool-enabled agent training by @yayashuxue in #206
- Add VimGolf agent training example by @James4Ever0 in #209
- fix: update search engine source data path by @noiji in #216
- [feature] Adding Megatron support for v0.2 by @jeewoo-lee in #221
- Use RolloutEngine for single_turn_workflow.py by @1stprinciple in #223
- Standalone inference: remove hard verl dependency by @JasonWei05 in #228
- Update pyproject.toml to v0.2.0 by @NIL-zhuang in #229
- proper handling the case that next_observation is empty dict by @erranlli in #233
- [v0.2] Add lazy import to fix circular import and ray init config support by @listar2000 in #236
- v0.2 verl patch by @kylemontgomery1 in #237
- v0.2 masking/parsing fix by @kylemontgomery1 in #238
- v0.2 rollout upgrade by @kylemontgomery1 in #241
- Feat: deepresearch integration by @yayashuxue in #215
- workflow updates by @kylemontgomery1 in #244
- added colab example of solver judge by @jeewoo-lee in #246
- v0.2 misc changes by @kylemontgomery1 in #245
- Add FireworksEngine for disaggregated rollout by @1stprinciple in #243
- AppWorld Integration for rLLM by @sunan135 in #235
- V0.2 by @jeffreysijuntan in #247
- update solver judge workflow by @kylemontgomery1 in #248
- update install instructions, update solver judge notebook by @kylemontgomery1 in #249
New Contributors
- @wj-Mcat made their first contribution in #142
- @annyan09023 made their first contribution in #144
- @tonyz0x0 made their first contribution in #154
- @mananroongta made their first contribution in #160
- @philippnormann made their first contribution in #168
- @VincentXWD made their first contribution in #174
- @abrohamLee made their first contribution in #172
- @yayashuxue made their first contribution in #180
- @kylemontgomery1 made their first contribution in #193
- @JasonWei05 made their first contribution in #205
- @James4Ever0 made their first contribution in #209
- @noiji made their first contribution in #216
- @jeewoo-lee made their first contribution in #221
- @1stprinciple made their first contribution in #223
- @NIL-zhuang made their first contribution in #229
- @erranlli made their first contribution in #233
- @listar2000 made their first contribution in #236
- @sunan135 made their first contribution in #235
Full Changelog: https://github.com/rllm-org/rllm/commits/v0.2.0