This repository contains the official implementation of EvolveR, a framework enabling LLM agents to self-improve through a closed-loop experience lifecycle, where they distill abstract principles from past trajectories and retrieve them to guide future actions.
2025-10-21: The paper is publicly available on arXiv.
2025-10-20: The codebase is publicly available.
We recommend using Python 3.10 and Conda for environment management.
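Once the installation steps in this section are complete, a quick sanity check of the pinned versions can catch a mismatched environment early. The following is a minimal sketch, not part of the repository: the helper name `check_pins` and the `EXPECTED` table are illustrative assumptions, and the pins are copied from the install commands for the `evolver` environment in this README.

```python
import sys
from importlib import metadata

# Pins copied from the install commands in this README (evolver environment).
# This table is an illustrative assumption; adjust it if you install other versions.
EXPECTED = {"torch": "2.4.0", "vllm": "0.6.3"}


def check_pins(expected=EXPECTED):
    """Return {package: (expected, found)} for every package whose version differs."""
    mismatches = {}
    for name, want in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Wheels from the PyTorch index carry a local suffix such as "+cu121".
        base = found.split("+")[0] if found else None
        if base != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 10):
        print(f"warning: Python {sys.version_info.major}.{sys.version_info.minor} "
              "in use, this README recommends 3.10")
    for name, (want, found) in check_pins().items():
        print(f"{name}: expected {want}, found {found or 'not installed'}")
```

Run it inside the activated `evolver` environment. No output means the Python version and both pins match.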
# 1. Clone the repository
git clone https://github.com/Edaizi/EvolveR.git
cd EvolveR
# 2. Create and activate conda environment
conda create -n evolver python=3.10 -y
conda activate evolver
# 3. Install dependencies
# install pytorch
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1
# verl
pip install -e .
# flash attention 2
pip3 install flash-attn --no-build-isolation
pip install wandb
conda create -n vllm python=3.10
pip install vllm
conda create -n retriever python=3.10
conda activate retriever
# we recommend installing torch with conda for faiss-gpu
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers datasets pyserini
## install the gpu version faiss to guarantee efficient RL rollout
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
## API function
pip install uvicorn fastapi
We will provide the processed data on Hugging Face Hub. You can download it from the following link:
Place your training and validation data in the following structure. The provided training script uses this path by default.
./data/nq_hotpotqa_train/
├── train.parquet
└── test.parquet
You can modify the DATA_DIR variable in scripts/train_grpo-3b.sh to point to your dataset location.
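To confirm the layout above before starting a run, a small check along these lines may help. It is a sketch under stated assumptions: the helper name `missing_data_files` is hypothetical and not part of the repository, and it only tests that the two files exist, not that their contents are valid.

```python
from pathlib import Path

# File names taken from the directory layout shown in this README.
REQUIRED_FILES = ("train.parquet", "test.parquet")


def missing_data_files(data_dir="./data/nq_hotpotqa_train"):
    """Return the required file names that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("missing files:", ", ".join(missing))
    else:
        print("data directory looks complete")
```

If you changed `DATA_DIR` in the training script, pass the same path to `missing_data_files`.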
conda activate vllm
bash scripts/vllm_server.sh
conda activate retriever
save_path=data/Wiki-corpus-embedd
python scripts/download.py --save_path $save_path
cat $save_path/part_* > $save_path/e5_Flat.index
gzip -d $save_path/wiki-18.jsonl.gz
conda activate retriever
bash scripts/retrieval_launch.sh
bash scripts/train_grpo-3b.sh
The script will handle all training steps, including launching the Experience Vector Database (VDB) and interacting with it.
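Before launching the retrieval server and the training script, it can be worth verifying that the index and corpus from the download step are in place. The sketch below assumes the `save_path` used above (`data/Wiki-corpus-embedd`) and the file names produced by the `cat` and `gzip -d` commands. The function name `check_retriever_artifacts` is hypothetical, and the check only inspects file presence and the first corpus line, not the index contents.

```python
import json
from pathlib import Path


def check_retriever_artifacts(save_path="data/Wiki-corpus-embedd"):
    """Return a list of problems found; an empty list means the check passed."""
    root = Path(save_path)
    problems = []

    # The index is assembled by concatenating the downloaded part_* files.
    index = root / "e5_Flat.index"
    if not index.is_file() or index.stat().st_size == 0:
        problems.append(f"{index} is missing or empty (was the `cat part_*` step run?)")

    # The corpus is expected to be decompressed JSON Lines.
    corpus = root / "wiki-18.jsonl"
    if not corpus.is_file():
        problems.append(f"{corpus} is missing (was the `gzip -d` step run?)")
    else:
        with corpus.open(encoding="utf-8") as fh:
            first_line = fh.readline()
        try:
            json.loads(first_line)
        except json.JSONDecodeError:
            problems.append(f"{corpus} does not start with a valid JSON line")

    return problems


if __name__ == "__main__":
    for problem in check_retriever_artifacts():
        print("problem:", problem)
```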
For those with limited resources or who wish to bypass the training process, we provide direct access to our open-sourced model weights on the Hugging Face Hub.
| Model | Base Architecture | Params | Hugging Face Hub Link |
|---|---|---|---|
| EvolveR-3B | Qwen2.5 | 3B | Link |
We believe the experience-driven lifecycle of EvolveR is a generalizable paradigm for agent self-improvement. We encourage and welcome the community to extend this framework to other exciting domains, such as code generation, mathematical reasoning, and beyond. We are excited to see what you build!
We would like to thank the developers of the following projects for their open-source contributions.
If you find our paper and code useful, please cite us using the BibTeX entry below.
@misc{wu2025evolverselfevolvingllmagents,
title={EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle},
author={Rong Wu and Xiaoman Wang and Jianbiao Mei and Pinlong Cai and Daocheng Fu and Cheng Yang and Licheng Wen and Xuemeng Yang and Yufan Shen and Yuxin Wang and Botian Shi},
year={2025},
eprint={2510.16079},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.16079},
}