=======================
Search Tool Integration
=======================

Introduction
------------

- We have added a search tool calling function to multi-turn RL, enabling the model to initiate retrieval requests during Actor rollout and directly use retrieval results for training. **We support using a local dense retriever as the retrieval tool, as well as integrating with your own local retrieval engine.**

Quick Reproduction
------------------

Create a New Docker Container
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: bash

   docker run \
       -it \
       --shm-size 32g \
       --gpus all \
       -v {Huggingface-Cache-Path}:/root/.cache \
       --ipc=host \
       --network=host \
       --privileged \
       --name sglang_{your-name} \
       lmsysorg/sglang:dev \
       /bin/zsh
If you need to restart after exiting the container:

.. code:: bash

   docker start -i sglang_{your-name}
Update Python and Configure the Virtual Environment Using uv
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: bash

   apt update
   apt install -y python3.10 python3.10-venv

   # Create a virtual environment
   python3 -m venv ~/.python/verl-multiturn-rollout

   # Activate the virtual environment
   source ~/.python/verl-multiturn-rollout/bin/activate

   # Install uv
   python3 -m pip install uv
Install verl Upstream
~~~~~~~~~~~~~~~~~~~~~

.. code:: bash

   cd ~
   git clone https://github.com/volcengine/verl.git
   cd verl

   # Install verl
   python3 -m uv pip install .
   python3 -m uv pip install -r ./requirements_sglang.txt

   # Manually install flash-attn
   python3 -m uv pip install wheel
   python3 -m uv pip install packaging
   python3 -m uv pip install flash-attn --no-build-isolation --no-deps
Set Up a Local Retrieval Engine
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you are using your own local retrieval service, you can skip this
step. We chose the local dense retriever provided in the Search-R1
example; detailed instructions are in the `searchR1
docs <https://raw.githubusercontent.com/PeterGriffinJin/Search-R1/refs/heads/main/docs/retriever.md>`__.
In brief:

- The GPU version offers higher accuracy and speed; each GPU uses about
  5–7 GB of memory.
- The CPU version can be used for simple testing, but its lower
  retrieval precision will degrade training performance. See the
  `retriever
  documentation <https://github.com/PeterGriffinJin/Search-R1/blob/main/docs/retriever.md>`__
  in Search-R1 for details.
- We recommend using Conda to install ``faiss-gpu=1.8.0``; installing
  it in a venv may cause errors.

**Note**: To run both the training process and the local retrieval
service, we use two separate Python environments: training uses uv in
the verl-multiturn-rollout environment, while the retriever uses conda
to install ``faiss-gpu``.
.. code:: bash

   # Download the Miniconda installer script
   wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh

   # Install to $HOME/miniconda3 in batch mode
   bash ~/miniconda.sh -b -p $HOME/miniconda3

   # Activate conda (only in the current shell)
   eval "$($HOME/miniconda3/bin/conda shell.bash hook)"

   # (Optional) Add conda to your default shell startup
   conda init

   # Reload the shell config
   source ~/.bashrc

   # Create and activate the retriever environment with Python 3.10
   conda create -n retriever python=3.10 -y
   conda activate retriever

   # Install PyTorch (with GPU support) and related libraries
   conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y

   # Install other Python packages
   pip install transformers datasets pyserini huggingface_hub

   # Install the GPU version of faiss
   conda install faiss-gpu=1.8.0 -c pytorch -c nvidia -y

   # Install the API service framework
   pip install uvicorn fastapi
Download the Index and Corpus
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The local retrieval files are large, so prepare sufficient disk space:
the download is about 60–70 GB, and the uncompressed data takes about
132 GB.

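
Before starting the download, it can save time to confirm the target disk actually has room. A minimal sketch, assuming the 132 GB figure above; the path is a placeholder for the disk that will hold your ``save_path``:

```python
import shutil

REQUIRED_GB = 132  # uncompressed index + corpus size quoted above
save_path = "/"    # placeholder: replace with the disk holding your save_path

# Free space on the target filesystem, in GiB
free_gb = shutil.disk_usage(save_path).free / 1024**3
print(f"{free_gb:.0f} GB free, enough: {free_gb >= REQUIRED_GB}")
```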
.. code:: bash

   conda activate retriever

   save_path=/the/path/to/save
   python examples/sglang_multiturn/search_r1_like/local_dense_retriever/download.py --save_path $save_path
   cat $save_path/part_* > $save_path/e5_Flat.index
   gzip -d $save_path/wiki-18.jsonl.gz
Start the Local flat e5 Retrieval Server
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. The first startup will download models and load the index.
2. Apart from the download, startup takes about 1–2 minutes.
3. After startup, each GPU uses about 5–7 GB of memory, leaving the rest
   for multi-turn RL training.

.. code:: bash

   conda activate retriever

   index_file=$save_path/e5_Flat.index
   corpus_file=$save_path/wiki-18.jsonl
   retriever_name=e5
   retriever_path=intfloat/e5-base-v2

   python examples/sglang_multiturn/search_r1_like/local_dense_retriever/retrieval_server.py \
       --index_path $index_file \
       --corpus_path $corpus_file \
       --topk 3 \
       --retriever_name $retriever_name \
       --retriever_model $retriever_path \
       --faiss_gpu

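
Once the server is up, you can sanity-check the endpoint before launching training. This is a hedged sketch, assuming the server listens on ``http://127.0.0.1:8000/retrieve`` (the default URL used in the tool config below); it returns ``None`` instead of raising if the server is not reachable:

```python
import json
import urllib.error
import urllib.request

# Assumed endpoint; adjust if your retrieval server listens elsewhere.
URL = "http://127.0.0.1:8000/retrieve"

def check_retriever(url=URL, timeout=5):
    """POST a single test query; return the parsed JSON, or None if unreachable."""
    payload = json.dumps({
        "queries": ["What is Python?"],
        "topk": 3,
        "return_scores": True,
    }).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError):
        return None  # server not reachable yet

result = check_retriever()
print("retriever up" if result is not None else "retriever not reachable")
```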
Set Up WANDB_API_KEY
~~~~~~~~~~~~~~~~~~~~

.. code:: bash

   export WANDB_API_KEY={YOUR_WANDB_API_KEY}

   # Define a timestamp function
   function now() {
       date '+%Y-%m-%d-%H-%M'
   }
Preprocess the Dataset
~~~~~~~~~~~~~~~~~~~~~~

   **Note:** The following data processing and training commands must be
   run in the verl-multiturn-rollout environment.

.. code:: bash

   python3 examples/data_preprocess/preprocess_search_r1_dataset.py
Testing on 8 x H20
~~~~~~~~~~~~~~~~~~

.. code:: bash

   # Ensure the now() function is defined
   # Create a logs directory
   mkdir -p logs

   # Set GPUs and run with a suitable log path
   export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

   nohup bash examples/sglang_multiturn/search_r1_like/run_qwen2.5-3b_instruct_search_multiturn.sh \
       trainer.experiment_name=qwen2.5-3b-it_rm-searchR1-like-sgl-multiturn-$(now) \
       > logs/searchR1-like$(now).log 2>&1 &
Custom Search Configuration
---------------------------

To enable multi-turn reasoning, set the following fields in your config:

.. code:: yaml

   actor_rollout_ref:
     rollout:
       name: "sglang_async"
       multi_turn:
         enable: True

You must specify ``retrieval_service_url`` in
``examples/sglang_multiturn/config/tool_config/search_tool_config.yaml``
and properly configure concurrency. For more details on concurrency,
refer to the Sandbox Fusion example:

.. code:: yaml

   tools:
     - class_name: verl.tools.search_tool.SearchTool
       config:
         retrieval_service_url: http://127.0.0.1:8000/retrieve
         num_workers: 120
         rate_limit: 120
         timeout: 30

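
The ``num_workers`` and ``rate_limit`` fields bound how many tool calls hit the retrieval service at once. The sketch below is not verl's implementation, just an illustration of the idea with a small hypothetical limit: a bounded semaphore caps in-flight requests, so bursts of concurrent rollouts cannot overload the server:

```python
import threading
import time

RATE_LIMIT = 4  # hypothetical small value for illustration; the config above uses 120

sem = threading.BoundedSemaphore(RATE_LIMIT)
lock = threading.Lock()
active = 0  # requests currently in flight
peak = 0    # highest concurrency observed

def call_search_tool(query: str) -> None:
    """Stand-in for one search tool call; the semaphore caps concurrency."""
    global active, peak
    with sem:  # blocks while RATE_LIMIT calls are already in flight
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for the HTTP round trip to the retriever
        with lock:
            active -= 1

threads = [threading.Thread(target=call_search_tool, args=(f"q{i}",))
           for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"peak concurrency: {peak}")  # never exceeds RATE_LIMIT
```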
The retriever input/output formats are as follows. If your service uses
the same parameters, you only need to modify ``retrieval_service_url``;
otherwise you can customize the client logic in
``search_r1_like_utils.py``.

.. code:: python

   # Input format:
   {
       "queries": ["What is Python?", "Tell me about neural networks."],
       "topk": 3,
       "return_scores": true
   }

   # Output format (when return_scores=True, similarity scores are returned):
   {
       "result": [
           [  # Results for each query
               {
                   "document": doc, "score": score
               },
               # ... more documents
           ],
           # ... results for other queries
       ]
   }

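
To make the formats concrete, the sketch below builds a request body in the input format and extracts the best document per query from a made-up response in the output format. The documents and scores are invented for illustration, and we assume each query's hits arrive sorted by score:

```python
import json

# A request body matching the input format above.
request_body = json.dumps({
    "queries": ["What is Python?"],
    "topk": 3,
    "return_scores": True,
})

# A made-up response in the output format above (values are illustrative).
response = {
    "result": [
        [
            {"document": "Python is a programming language.", "score": 0.91},
            {"document": "Python is a genus of snakes.", "score": 0.55},
        ],
    ]
}

def top_documents(resp, k=1):
    """Keep the first k documents per query (assumes hits sorted by score)."""
    return [[hit["document"] for hit in hits[:k]] for hits in resp["result"]]

print(top_documents(response))  # → [['Python is a programming language.']]
```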
Notes
-----

1. Total training time is about 27 hours. The validation dataset is
   very large (51k samples), and each validation pass takes about
   6,000 s; for this reason, ``val_before_train=False`` by default.