Releases: rllm-org/rllm
rLLM: v0.2.1.post1
What's Changed
- update docs & add curves by @thwu1 in #343
- Fix import for colorful_print in agent_sdk_engine.py and agent_sdk_trainer.py by @wht0703 in #345
- Unblock SDK installation by overriding dependencies by @wht0703 in #348
- [Doc] Update README and fix a few installation related issues by @listar2000 in #347
- fix: keyerror completion_ids by @kxfan2002 in #353
- Fix: Enable GPU acceleration for dense retrieval in search agent by @Gitsamshi in #349
New Contributors
- @wht0703 made their first contribution in #345
- @kxfan2002 made their first contribution in #353
- @Gitsamshi made their first contribution in #349
Full Changelog: v0.2.1...v0.2.1.post1
rLLM: v0.2.1
rLLM v0.2.1: Tinker backend, VLM training, Eval Protocol, and SDK (preview)
We are excited to release rLLM v0.2.1. This new version comes with the following exciting features:
- rLLM SDK (preview): The rLLM SDK enables you to transform agents written in frameworks such as LangGraph, SmolAgent, or Strands into trainable workflows. Check out this LangGraph RAG example, which builds a RAG agent and trains it with the rLLM SDK.
- Tinker training backend: In addition to verl, rLLM now supports Tinker as a training backend. You can use the same abstractions for building agents and easily switch between different backends for training.
- VLM training: rLLM supports Vision-Language Model training with the verl backend. See the Geo3K training example for reference.
- LoRA fine-tuning: rLLM supports LoRA training with both the verl and Tinker backends. See the GSM8K LoRA example for how to enable LoRA training with a single config change.
- Eval Protocol integration: We integrate with the Eval Protocol from Fireworks AI. Users can now train on any environment supported by the Eval Protocol. See this example that uses Eval Protocol in rLLM to train a FrozenLake agent.
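To picture the "switch backends and enable LoRA with a single config change" idea, here is a minimal, hypothetical sketch. The `backend` and `lora` keys and the `make_trainer` helper are illustrative assumptions, not rLLM's actual API:

```python
# Hypothetical sketch: selecting a training backend and enabling LoRA
# via config alone. Key names are illustrative, not rLLM's real schema.

def make_trainer(config: dict) -> str:
    """Dispatch on the configured backend and LoRA flag."""
    backend = config.get("backend", "verl")
    if backend not in {"verl", "tinker"}:
        raise ValueError(f"unknown backend: {backend}")
    mode = "lora" if config.get("lora", {}).get("enabled") else "full"
    return f"{backend}:{mode}"

# Full fine-tuning with the verl backend (default):
base = {"backend": "verl"}
# Switching backend and turning on LoRA touches only the config:
lora = {"backend": "tinker", "lora": {"enabled": True, "rank": 16}}

print(make_trainer(base))   # verl:full
print(make_trainer(lora))   # tinker:lora
```

The agent-building code is untouched in both cases; only the training configuration differs, which is the point of sharing one set of abstractions across backends.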
A big shoutout to @thwu1 @kylemontgomery1 @listar2000 @xzrderek for their outstanding work on these features.
What's Changed
- make rllm-specific configs applied correctly and robustly by @listar2000 in #256
- Ensure disable_thinking defaults to False when config is None by @Tendo33 in #258
- fix: circular import issues in WORKFLOW_CLASS_MAPPING by @listar2000 in #261
- [nightly] initialize the nightly branch by @listar2000 in #263
- Fix environment variable forwarding to ray runtime env by @listar2000 in #265
- [nightly] update recent changes on workflow engines by @listar2000 in #268
- Fix : Prevent KeyError in _pad_dataproto_to_world_size by @mananroongta in #274
- Fix retokenization by @thwu1 in #272
- fix controlling the n_parallel_agents and the concurrent env operations by @LianShuQuan in #271
- Added is_correct & reward flow through tool env by @mananroongta in #277
- Integrate Eval Protocol as RL environment by @1stprinciple in #276
- SWEEnv.from_dict() by @LianShuQuan in #278
- Fix: Resolve PyArrow nested data conversion error in distributed dataset loading by @erranlli in #281
- Per Episode Logging Feature by @qywu in #282
- [feature] Support Tinker as a backend by @thwu1 in #283
- [feat] Tinker Workflow Trainer by @thwu1 in #288
- Fix fireworks dependency by @listar2000 in #296
- Examples: fix utils import by @Flecart in #295
- [Refactor] Update Tinker Backend Example by @thwu1 in #300
- Revert "fix: Gracefully skip overlong prompts during training to prev… by @1stprinciple in #302
- Fixes #303 Optimize old_log_prob computation in PPO trainer by @BabelTower in #304
- Bug/n parallel agents by @kylemontgomery1 in #307
- [nightly] merge recent updates in main back to nightly by @listar2000 in #308
- Adding generic Eval Protocol environments to rLLM by @xzrderek in #306
- [feat] sdk by @thwu1 in #310
- Multimodal by @kylemontgomery1 in #315
- add rllm docs by @xzrderek in #312
- Fix import problem of megatron ray worker group by @listar2000 in #319
- Fix color print display issue by @listar2000 in #317
- [feat] Integrate OpenTelemetry by @thwu1 in #320
- Remove unnecessary free_cache_engine checks. by @listar2000 in #324
- add vlm docs by @kylemontgomery1 in #326
- [feat] Importance Sampling by @thwu1 in #332
- Fix repetitive application id causing vLLM issue by @listar2000 in #334
- [feat] Add Langgraph Training Example, Fix bugs, Refactor Sdk by @thwu1 in #335
- Add Sdk Doc by @thwu1 in #339
- [feature] simplified deps by @kylemontgomery1 in #327
- Add gsm8k-lora script by @listar2000 in #342
- [v0.2.1] Merge nightly into main for rLLM v0.2.1 by @jeffreysijuntan in #341
New Contributors
- @Tendo33 made their first contribution in #258
- @thwu1 made their first contribution in #272
- @LianShuQuan made their first contribution in #271
- @qywu made their first contribution in #282
- @Flecart made their first contribution in #295
- @BabelTower made their first contribution in #304
- @xzrderek made their first contribution in #306
Full Changelog: v0.2.0...v0.2.1
rLLM: v0.2.0
rLLM v0.2: RL Training over General Agentic Programs (Blog Post)
We are excited to release rLLM v0.2, a major upgrade of our RL training framework. In v0.1, rLLM provided agent and OpenAI Gym-like environment abstractions to support training ReAct-style agents. In v0.2, we additionally introduce AgentWorkflowEngine and AgentWorkflowTrainer, more general abstractions that enable arbitrary agentic programs to be trained. Agent builders and researchers can now define multi-agent systems, complex workflows (e.g., solver-judge, planner-executor, MCTS), and agentic programs with custom reward functions, and train them with reinforcement learning without rewriting their production code.
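As a concrete picture of what "an arbitrary agentic program" can look like, here is a minimal, hypothetical solver-judge sketch. The function names, signatures, and stubbed model calls are assumptions for illustration, not rLLM's API; in practice each callable would be an LLM rollout and the judge's score would feed the RL reward:

```python
# Hypothetical sketch of a solver-judge agentic program of the kind an
# AgentWorkflowEngine could train end to end. Model calls are stubbed.
from typing import Callable

def solver_judge_workflow(task: str,
                          solve: Callable[[str], str],
                          judge: Callable[[str, str], float],
                          max_rounds: int = 3) -> tuple[str, float]:
    """Run solver/judge rounds; the judge's score doubles as the reward."""
    best_answer, best_score = "", float("-inf")
    for _ in range(max_rounds):
        answer = solve(task)
        score = judge(task, answer)      # custom reward function
        if score > best_score:
            best_answer, best_score = answer, score
        if score >= 1.0:                 # early exit on a perfect score
            break
    return best_answer, best_score

# Toy stand-ins for the solver and judge models:
answers = iter(["draft", "final"])
result = solver_judge_workflow(
    "toy task",
    solve=lambda t: next(answers),
    judge=lambda t, a: 1.0 if a == "final" else 0.2,
)
print(result)  # ('final', 1.0)
```

The workflow is ordinary Python control flow, which is what lets multi-agent systems and judged loops be expressed without restructuring production code around a fixed agent/environment interface.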
Key Features in v0.2
- Support for the official verl==0.5.0 as the training backend; no custom verl fork anymore! verl==0.5.0 brings the following features, which are now supported in rLLM (@kylemontgomery1):
  - Megatron training support (@jeewoo-lee)
  - SGLang as the rollout engine, in addition to vLLM.
- Introduce AgentWorkflowEngine, which enables passing in arbitrary agentic programs for training. (@kylemontgomery1)
- Support for more agents and environments:
  - Terminus and TerminalBench (@JasonWei05)
  - Tongyi DeepResearch agent (@yayashuxue)
  - AppWorld and AppWorldReactAgent (@sunan135)
- Integration with other agentic frameworks/SDKs:
  - Strands SDK from AWS
  - SmolAgents
What's Changed
- fix <tool_calls_begin> variable by @wj-Mcat in #142
- Fix not registered license from code by @annyan09023 in #144
- fix r2egym import error; update installation README by @jeffreysijuntan in #146
- update deepscaler max_prompt_length to avoid exception during training by @jeffreysijuntan in #148
- fix(syntax): Resolve invalid escape sequence warnings by @tonyz0x0 in #154
- added Tools for SFT by @mananroongta in #160
- update docs by @jeffreysijuntan in #167
- Add dark mode to docs by @philippnormann in #168
- [FIX] Fix tool calling result parsing problem in trajectory visualizer & MCP tool name fixing by @VincentXWD in #174
- [hotfix][miniwob] Fix gymnasium.error.NameNotFound by @abrohamLee in #172
- Load full DeepCoder dataset, instead of LCB subset by @mananroongta in #178
- [feat][docker] Installation with Docker by @abrohamLee in #177
- Add macOS compatibility: exclude GPU dependencies on darwin by @yayashuxue in #180
- Torch 2.7.0 only compatible with MacOS python=3.11 by @yayashuxue in #184
- Migrate to verl v0.5.0 by @kylemontgomery1 in #193
- Terminal Bench Integration into rLLM (Simplified) by @JasonWei05 in #205
- feat: Integrate Strands SDK with RLLM for scalable tool-enabled agent training by @yayashuxue in #206
- Add VimGolf agent training example by @James4Ever0 in #209
- fix: update search engine source data path by @noiji in #216
- [feature] Adding Megatron support for v0.2 by @jeewoo-lee in #221
- Use RolloutEngine for single_turn_workflow.py by @1stprinciple in #223
- Standalone inference: remove hard verl dependency by @JasonWei05 in #228
- Update pyproject.toml to v0.2.0 by @NIL-zhuang in #229
- proper handling the case that next_observation is empty dict by @erranlli in #233
- [v0.2] Add lazy import to fix circular import and ray init config support by @listar2000 in #236
- v0.2 verl patch by @kylemontgomery1 in #237
- v0.2 masking/parsing fix by @kylemontgomery1 in #238
- v0.2 rollout upgrade by @kylemontgomery1 in #241
- Feat: deepresearch integration by @yayashuxue in #215
- workflow updates by @kylemontgomery1 in #244
- added colab example of solver judge by @jeewoo-lee in #246
- v0.2 misc changes by @kylemontgomery1 in #245
- Add FireworksEngine for disaggregated rollout by @1stprinciple in #243
- AppWorld Integration for rLLM by @sunan135 in #235
- V0.2 by @jeffreysijuntan in #247
- update solver judge workflow by @kylemontgomery1 in #248
- update install instructions, update solver judge notebook by @kylemontgomery1 in #249
New Contributors
- @wj-Mcat made their first contribution in #142
- @annyan09023 made their first contribution in #144
- @tonyz0x0 made their first contribution in #154
- @mananroongta made their first contribution in #160
- @philippnormann made their first contribution in #168
- @VincentXWD made their first contribution in #174
- @abrohamLee made their first contribution in #172
- @yayashuxue made their first contribution in #180
- @kylemontgomery1 made their first contribution in #193
- @JasonWei05 made their first contribution in #205
- @James4Ever0 made their first contribution in #209
- @noiji made their first contribution in #216
- @jeewoo-lee made their first contribution in #221
- @1stprinciple made their first contribution in #223
- @NIL-zhuang made their first contribution in #229
- @erranlli made their first contribution in #233
- @listar2000 made their first contribution in #236
- @sunan135 made their first contribution in #235
Full Changelog: https://github.com/rllm-org/rllm/commits/v0.2.0