Commit 2b49f03

[sglang] Feat: Search Tool Invocation in Multi-Turn RL Training (volcengine#1682)

1 parent 8235d5c

File tree: 21 files changed, +1998 -5 lines

.github/workflows/sgl.yml

Lines changed: 4 additions & 0 deletions
@@ -86,3 +86,7 @@ jobs:
       run: |
         cd tests/workers/rollout
         pytest -s test_sglang_async_rollout_sf_tools.py
+    - name: Test the latest SGLang Rollout async with search tool
+      run: |
+        cd tests/workers/rollout
+        pytest -s test_sglang_async_rollout_search_tools.py

docs/sglang_multiturn/multiturn.rst

Lines changed: 12 additions & 4 deletions
@@ -1,8 +1,8 @@
 Multi-turn Rollout Support
-=========================
+==========================

 Basic Configuration
-~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~

 To enable multi-turn rollout, make sure to configure the following fields in your rollout configuration:

@@ -16,7 +16,7 @@ To enable multi-turn rollout, make sure to configure the following fields in your rollout configuration:
 This configuration activates the sglang_async engine for multi-turn interaction during rollout.

 Custom Tool Configuration
-~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~

 For custom environment interaction tools, you can implement your own tools based on ``verl.tools.base_tool.BaseTool``. Then, specify your tool configurations in a YAML file:
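The YAML example itself sits outside this hunk. As a rough illustration of the Python side, a custom tool might look like the sketch below; the async method set mirrors ``gsm8k_tool.py``, so treat the exact signatures as assumptions and consult ``verl/tools/base_tool.py`` for the authoritative interface.

.. code:: python

    # Hypothetical custom tool sketch; signatures are assumptions modeled
    # on verl's gsm8k_tool.py, not the authoritative BaseTool interface.
    from typing import Optional, Tuple
    from uuid import uuid4

    from verl.tools.base_tool import BaseTool


    class EchoTool(BaseTool):
        """Toy tool that echoes the model's query back as the tool response."""

        def __init__(self, config: dict, tool_schema):
            super().__init__(config, tool_schema)
            self._instances = {}

        async def create(self, instance_id: Optional[str] = None, **kwargs) -> str:
            # Allocate per-trajectory state and return its id.
            instance_id = instance_id or str(uuid4())
            self._instances[instance_id] = {"reward": 0.0}
            return instance_id

        async def execute(self, instance_id: str, parameters: dict, **kwargs) -> Tuple[str, float, dict]:
            # One tool call: return (tool_response, step_reward, metrics).
            query = str(parameters.get("query", ""))
            return f"echo: {query}", 0.0, {}

        async def calc_reward(self, instance_id: str, **kwargs) -> float:
            return self._instances[instance_id]["reward"]

        async def release(self, instance_id: str, **kwargs) -> None:
            self._instances.pop(instance_id, None)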

@@ -41,7 +41,7 @@ Finally, set the ``tools_config_file`` in your rollout config:
 This allows integration of customized tool behaviors during actor rollout steps.

 GSM8K Multi-turn Training Performance
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 See the training performance of multi-turn rollout on the GSM8K task HERE_.

@@ -50,3 +50,11 @@ See the training performance of multi-turn rollout on the GSM8K task HERE_.
 .. _GSM8KTool_example_configuration: https://github.com/volcengine/verl/blob/main/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml

 .. _gsm8k_tool.py: https://github.com/volcengine/verl/blob/main/verl/tools/gsm8k_tool.py
+
+Search Tool Integration
+~~~~~~~~~~~~~~~~~~~~~~~
+
+.. toctree::
+   :maxdepth: 1
+
+   search_tool_example

docs/sglang_multiturn/search_tool_example.rst

Lines changed: 261 additions & 0 deletions

@@ -0,0 +1,261 @@
=======================
Search Tool Integration
=======================

Introduction
------------

We have added search tool invocation to multi-turn RL, enabling the model to initiate retrieval requests during actor rollout and use the retrieval results directly for training. **We support using a local dense retriever as the retrieval tool, as well as integrating your own local retrieval engine.**

Quick Reproduction
------------------

Create a New Docker Container
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: bash

    docker run \
        -it \
        --shm-size 32g \
        --gpus all \
        -v {Huggingface-Cache-Path}:/root/.cache \
        --ipc=host \
        --network=host \
        --privileged \
        --name sglang_{your-name} \
        lmsysorg/sglang:dev \
        /bin/zsh

If you need to restart after exiting the container:

.. code:: bash

    docker start -i sglang_{your-name}

Update Python and Configure the Virtual Environment using uv
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: bash

    apt update
    apt install -y python3.10 python3.10-venv

    # Create a virtual environment
    python3 -m venv ~/.python/verl-multiturn-rollout

    # Activate the virtual environment
    source ~/.python/verl-multiturn-rollout/bin/activate

    # Install uv
    python3 -m pip install uv

Install verl Upstream
~~~~~~~~~~~~~~~~~~~~~

.. code:: bash

    cd ~
    git clone https://github.com/volcengine/verl.git
    cd verl

    # Install verl
    python3 -m uv pip install .
    python3 -m uv pip install -r ./requirements_sglang.txt

    # Manually install flash-attn
    python3 -m uv pip install wheel
    python3 -m uv pip install packaging
    python3 -m uv pip install flash-attn --no-build-isolation --no-deps

Set Up a Local Retrieval Engine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you are using your own local retrieval service, you can skip this step. We chose the local dense retriever provided in the search-R1 example; detailed instructions are in the `searchR1 docs <https://raw.githubusercontent.com/PeterGriffinJin/Search-R1/refs/heads/main/docs/retriever.md>`__. In brief:

- The GPU version offers higher accuracy and speed; each GPU uses about 5–7 GB of memory.
- The CPU version can be used for simple testing, but its lower retrieval precision will degrade training performance. See the `retriever documentation <https://github.com/PeterGriffinJin/Search-R1/blob/main/docs/retriever.md>`__ in search-R1 for details.
- We recommend using conda to install ``faiss-gpu=1.8.0``; installing it in a venv may cause errors.

**Note**: To run both the training process and the local retrieval service, we launch two separate Python environments. Training uses uv in the verl-multiturn-rollout environment, while the retriever uses conda to install ``faiss-gpu``.

.. code:: bash

    # Download the Miniconda installer script
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh

    # Install to $HOME/miniconda3 in batch mode
    bash ~/miniconda.sh -b -p $HOME/miniconda3

    # Activate conda (only in the current shell)
    eval "$($HOME/miniconda3/bin/conda shell.bash hook)"

    # (Optional) Add conda to your default shell startup
    conda init

    # Reload shell config
    source ~/.bashrc

    # Create and activate the retriever environment with Python 3.10
    conda create -n retriever python=3.10 -y
    conda activate retriever

    # Install PyTorch (with GPU support) and related libraries
    conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y

    # Install other Python packages
    pip install transformers datasets pyserini huggingface_hub

    # Install the GPU version of faiss
    conda install faiss-gpu=1.8.0 -c pytorch -c nvidia -y

    # Install the API service framework
    pip install uvicorn fastapi
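After installation, a quick way to confirm that the GPU build of faiss is usable is to count the visible GPUs (a minimal check; run it inside the activated ``retriever`` environment):

.. code:: python

    # Sanity check for the faiss-gpu install; run inside the activated
    # `retriever` conda environment.
    import faiss

    print(faiss.get_num_gpus())  # should print the number of visible GPUs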
Download the Index and Corpus
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The local retrieval files are large, so prepare sufficient disk space: the download is about 60–70 GB, and it takes about 132 GB once uncompressed.

.. code:: bash

    conda activate retriever

    save_path=/the/path/to/save
    python examples/sglang_multiturn/search_r1_like/local_dense_retriever/download.py --save_path $save_path
    cat $save_path/part_* > $save_path/e5_Flat.index
    gzip -d $save_path/wiki-18.jsonl.gz

Start the Local flat e5 Retrieval Server
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. The first startup will download models and load the index.
2. Apart from the download, startup takes about 1–2 minutes.
3. After startup, each GPU uses about 5–7 GB of memory, leaving the rest for multi-turn RL training.

.. code:: bash

    conda activate retriever

    index_file=$save_path/e5_Flat.index
    corpus_file=$save_path/wiki-18.jsonl
    retriever_name=e5
    retriever_path=intfloat/e5-base-v2

    python examples/sglang_multiturn/search_r1_like/local_dense_retriever/retrieval_server.py \
        --index_path $index_file \
        --corpus_path $corpus_file \
        --topk 3 \
        --retriever_name $retriever_name \
        --retriever_model $retriever_path \
        --faiss_gpu

Set Up WANDB_API_KEY
~~~~~~~~~~~~~~~~~~~~

.. code:: bash

    export WANDB_API_KEY={YOUR_WANDB_API_KEY}

    # Define a timestamp function
    function now() {
        date '+%Y-%m-%d-%H-%M'
    }

Preprocess the Dataset
~~~~~~~~~~~~~~~~~~~~~~

**Note:** The following data processing and training commands must be run in the verl-multiturn-rollout environment.

.. code:: bash

    python3 examples/data_preprocess/preprocess_search_r1_dataset.py

Testing on 8 x H20
~~~~~~~~~~~~~~~~~~

.. code:: bash

    # Ensure the now() function is defined
    # Create a logs directory
    mkdir -p logs

    # Set GPUs and run with a suitable log path
    export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

    nohup bash examples/sglang_multiturn/search_r1_like/run_qwen2.5-3b_instruct_search_multiturn.sh \
        trainer.experiment_name=qwen2.5-3b-it_rm-searchR1-like-sgl-multiturn-$(now) \
        > logs/searchR1-like$(now).log 2>&1 &

Custom Search Configuration
---------------------------

To enable multi-turn reasoning, set the following fields in your config:

.. code:: yaml

    actor_rollout_ref:
      rollout:
        name: "sglang_async"
        multi_turn:
          enable: True

You must specify ``retrieval_service_url`` in ``examples/sglang_multiturn/config/tool_config/search_tool_config.yaml`` and configure the concurrency settings appropriately. For more details on concurrency, refer to the Sandbox Fusion example:

.. code:: yaml

    tools:
      - class_name: verl.tools.search_tool.SearchTool
        config:
          retrieval_service_url: http://127.0.0.1:8000/retrieve
          num_workers: 120
          rate_limit: 120
          timeout: 30

The retriever's input/output formats are as follows. If your service's parameters match, you only need to modify ``retrieval_service_url``; otherwise, you can customize the request logic in ``search_r1_like_utils.py``.

.. code:: python

    # Input format:
    {
        "queries": ["What is Python?", "Tell me about neural networks."],
        "topk": 3,
        "return_scores": true
    }

    # Output format (when return_scores=True, similarity scores are returned):
    {
        "result": [
            [  # Results for each query
                {
                    "document": doc, "score": score
                },
                # ... more documents
            ],
            # ... results for other queries
        ]
    }
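For example, a minimal client call matching this contract (assuming the local retrieval server from the steps above is listening on ``http://127.0.0.1:8000/retrieve``):

.. code:: python

    # Minimal sketch of a client call matching the documented contract;
    # assumes the local retrieval server above is serving on port 8000.
    import requests

    payload = {
        "queries": ["What is Python?", "Tell me about neural networks."],
        "topk": 3,
        "return_scores": True,
    }
    resp = requests.post("http://127.0.0.1:8000/retrieve", json=payload, timeout=30)
    resp.raise_for_status()

    for query_results in resp.json()["result"]:
        for hit in query_results:
            print(hit["score"], hit["document"])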
Notes
-----

1. The total training time is about 27 hours. The validation dataset is very large (51k samples), and each validation pass takes about 6,000 s; therefore, ``val_before_train=False`` is set by default.
