
Commit 068cb3e

[misc] update baselines & docker image (#256)

Parent: 4072cd8

5 files changed (+37, -31 lines)


Dockerfile (7 additions, 3 deletions)

```diff
@@ -41,13 +41,17 @@ RUN pip config set global.index-url "${PIP_INDEX}" && \
 # Uninstall nv-pytorch fork
 RUN pip uninstall -y torch torchvision torchaudio \
     pytorch-quantization pytorch-triton torch-tensorrt \
-    xgboost transformer_engine flash_attn apex megatron-core grpcio
+    transformer_engine flash_attn apex megatron-core \
+    xgboost opencv grpcio
 
-# Install torch-2.6.0+cu124 + vllm-0.8.3
+# Fix cv2
+RUN rm -rf /usr/local/lib/python3.10/dist-packages/cv2
+
+# Install torch-2.6.0+cu124 + vllm-0.8.4
 # torch-2.6.0+cu124: cxx11abi=False
 # torch-2.6.0+cu126: cxx11abi=True
 # see https://github.com/flashinfer-ai/flashinfer/issues/911
-RUN pip install --no-cache-dir "vllm==0.8.3" "torch==2.6.0" "torchvision==0.21.0" "torchaudio==2.6.0" tensordict torchdata \
+RUN pip install --no-cache-dir "vllm==0.8.4" "torch==2.6.0" "torchvision==0.21.0" "torchaudio==2.6.0" tensordict torchdata \
     "transformers[hf_xet]>=4.51.0" accelerate datasets peft hf-transfer \
     "numpy<2.0.0" "pyarrow>=15.0.0" pandas \
     ray[default] codetiming hydra-core pylatexenc qwen-vl-utils wandb liger-kernel mathruler \
```
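The new `# Fix cv2` step deletes the cv2 tree from `dist-packages`, guarding against leftovers that `pip uninstall opencv` can miss in the NGC base image. A minimal smoke test for a locally built image could look like the following; the `easyr1:dev` tag is a placeholder, not part of this commit:

```bash
# Build the updated image and verify the core runtime versions.
docker build -t easyr1:dev .
docker run --rm --gpus all easyr1:dev python - <<'EOF'
import torch, vllm
print("torch:", torch.__version__)  # expect 2.6.0 (cu124 wheel, cxx11abi=False)
print("vllm:", vllm.__version__)    # expect 0.8.4 after this commit
EOF
```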

Dockerfile.legacy (10 additions, 7 deletions)

```diff
@@ -42,27 +42,30 @@ RUN pip config set global.index-url "${PIP_INDEX}" && \
 # Uninstall nv-pytorch fork
 RUN pip uninstall -y torch torchvision torchaudio \
     pytorch-quantization pytorch-triton torch-tensorrt \
-    xgboost transformer_engine flash_attn apex megatron-core
+    transformer_engine flash_attn apex megatron-core \
+    xgboost opencv grpcio
+
+# Fix cv2
+RUN rm -rf /usr/local/lib/python3.10/dist-packages/cv2
 
 # Install vllm-0.7.4-nightly
 RUN pip install --no-cache-dir vllm --pre --extra-index-url "https://wheels.vllm.ai/${VLLM_COMMIT}" && \
     git clone -b verl_v1 https://github.com/hiyouga/vllm.git && \
     cp -r vllm/vllm/ /usr/local/lib/python3.10/dist-packages/
 
 # Install torch-2.5.1
-RUN pip install --no-cache-dir torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 tensordict torchdata \
-    transformers>=4.49.0 accelerate datasets peft hf-transfer \
-    ray[default] codetiming hydra-core pandas pyarrow>=15.0.0 pylatexenc qwen-vl-utils wandb liger-kernel mathruler \
+RUN pip install --no-cache-dir "torch==2.5.1" "torchvision==0.20.1" "torchaudio==2.5.1" tensordict torchdata \
+    "transformers>=4.49.0" accelerate datasets peft hf-transfer \
+    ray[default] codetiming hydra-core pandas "pyarrow>=15.0.0" pylatexenc qwen-vl-utils wandb liger-kernel mathruler \
     pytest yapf py-spy pyext pre-commit ruff
 
 # Install flash_attn-2.7.4.post1
 RUN wget -nv https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl && \
     pip install --no-cache-dir flash_attn-2.7.4.post1+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
 
-# Fix cv2
+# Fix packages
 RUN pip uninstall -y pynvml nvidia-ml-py && \
-    pip install --no-cache-dir nvidia-ml-py>=12.560.30 opencv-python-headless==4.8.0.74 fastapi==0.115.6 && \
-    pip install --no-cache-dir --upgrade optree>=0.13.0
+    pip install --no-cache-dir --upgrade "nvidia-ml-py>=12.560.30" "fastapi[standard]>=0.115.0" "optree>=0.13.0" "pydantic>=2.9" "grpcio>=1.62.1"
 
 # Reset pip config
 RUN pip config unset global.index-url && \
```
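Note that the rewritten install lines quote every version specifier. Since `RUN` goes through the shell, an unquoted `>=` is parsed as output redirection rather than passed to pip; a quick illustration (not from the commit itself):

```bash
# Unquoted: the shell splits at ">" and redirects stdout to a file named "=15.0.0",
# so pip only sees "pip install pyarrow" with no version constraint at all.
pip install pyarrow>=15.0.0

# Quoted: pip receives the full requirement specifier and enforces the lower bound.
pip install "pyarrow>=15.0.0"
```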

README.md (2 additions, 2 deletions)

````diff
@@ -42,7 +42,7 @@ We provide a [Dockerfile](./Dockerfile) to easily build environments.
 We recommend using the [pre-built docker image](https://hub.docker.com/r/hiyouga/verl) in EasyR1.
 
 ```bash
-docker pull hiyouga/verl:ngc-th2.6.0-cu126-vllm0.8.3-flashinfer0.2.2-cxx11abi0
+docker pull hiyouga/verl:ngc-th2.6.0-cu126-vllm0.8.4-flashinfer0.2.2-cxx11abi0
 ```
 
 ### Hardware Requirements
@@ -138,7 +138,7 @@ We also reproduced the following two baselines of the [R1-V](https://github.com/
 
 ## Performance Baselines
 
-See [Baselines.md](assets/baselines.md).
+See [baselines.md](assets/baselines.md).
 
 ## Awesome Work using EasyR1
 
````
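A quick way to confirm the bumped tag delivers the expected runtime, using the image name from the README change above:

```bash
# Pull the updated image and print the vLLM version it ships; expect 0.8.4.
docker pull hiyouga/verl:ngc-th2.6.0-cu126-vllm0.8.4-flashinfer0.2.2-cxx11abi0
docker run --rm hiyouga/verl:ngc-th2.6.0-cu126-vllm0.8.4-flashinfer0.2.2-cxx11abi0 \
  python -c "import vllm; print(vllm.__version__)"
```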

assets/baselines.md (17 additions, 17 deletions)

```diff
@@ -8,31 +8,31 @@ Welcome to contribute new baselines!
 
 ## Algorithm Baselines
 
-### [Qwen2.5-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
+### [Qwen2.5-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on [Math12k](https://huggingface.co/datasets/hiyouga/math12k)
 
-| Size | Algorithm | Bits | Dataset | LR | KL | Test Score |
-| ---- | ----------- | ---- | ------- | ---- | ---- | ---------- |
-| 7B | GRPO | AMP | Math12k | 1e-6 | 1e-2 | 0.73->0.79 |
+| Size | Algorithm | Bits | LR | KL | Test Score |
+| ---- | ----------- | ---- | ---- | ---- | ---------- |
+| 7B | GRPO | AMP | 1e-6 | 1e-2 | 0.73->0.79 |
 
-### [Qwen2.5-VL-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
+### [Qwen2.5-VL-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on [Geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k)
 
-| Size | Algorithm | Bits | Dataset | LR | KL | Test Score |
-| ---- | ----------- | ---- | ------- | ---- | ---- | ---------- |
-| 7B | GRPO | AMP | Geo3k | 1e-6 | 1e-2 | 0.39->0.52 |
-| 7B | GRPO | BF16 | Geo3k | 1e-6 | 1e-2 | 0.39->0.52 |
-| 7B | GRPO | AMP | Geo3k | 1e-6 | 1e-3 | 0.39->0.52 |
-| 7B | RLOO | AMP | Geo3k | 1e-6 | 1e-2 | 0.39->0.53 |
-| 3B | GRPO | AMP | Geo3k | 1e-6 | 1e-2 | 0.27->0.44 |
-| 32B | GRPO | BF16 | Geo3k | 1e-6 | 1e-2 | 0.46->0.61 |
+| Size | Algorithm | Bits | LR | KL | Test Score |
+| ---- | ----------- | ---- | ---- | ---- | ---------- |
+| 7B | GRPO | AMP | 1e-6 | 1e-2 | 0.39->0.52 |
+| 7B | GRPO | BF16 | 1e-6 | 1e-2 | 0.39->0.52 |
+| 7B | GRPO | AMP | 1e-6 | 1e-3 | 0.39->0.52 |
+| 7B | RLOO | AMP | 1e-6 | 1e-2 | 0.39->0.53 |
+| 3B | GRPO | AMP | 1e-6 | 1e-2 | 0.27->0.44 |
+| 32B | GRPO | BF16 | 1e-6 | 1e-2 | 0.46->0.61 |
 
 > [!NOTE]
 > The hyper-parameters not listed are all the same as the default values.
 
 ## Performance Baselines
 
-### [Qwen2.5-VL-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
+### [Qwen2.5-VL-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on [Geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k)
 
-| Size | GPU Type | Bits | Batch Size | vLLM util | vLLM TP | Peak Mem | Peak VRAM | Throughput | Sec per step | Actor MFU |
+| Size | GPU Type | Bits | Batch Size | vLLM Util | vLLM TP | Peak Mem | Peak VRAM | Throughput | Sec per step | Actor MFU |
 | ---- | ------------- | ---- | ---------- | --------- | ------- | -------- | --------- | ---------- | ------------ | --------- |
 | 3B | 8 * H100 80GB | AMP | 4 / 16 | 0.6 | 2 | 120GB | 35GB | 1200 | 180s | 6.3% |
 | 7B | 8 * H100 80GB | AMP | 4 / 16 | 0.6 | 2 | 140GB | 60GB | 1200 | 180s | 13.6% |
@@ -41,8 +41,8 @@ Welcome to contribute new baselines!
 | 7B | 8 * H100 80GB | BF16 | 4 / 16 | 0.6 | 2 | 150GB | 50GB | 1280 | 190s | 13.9% |
 | 32B | 8 * H100 80GB | BF16 | 1 / 8 | 0.6 | 8 | 240GB | 68GB | 360 | 860s | 11.2% |
 
-- Batch size: micro_batch_size_per_device_for_update / micro_batch_size_per_device_for_experience
-- vLLM util: rollout.gpu_memory_utilization
+- Batch Size: micro_batch_size_per_device_for_update / micro_batch_size_per_device_for_experience
+- vLLM Util: rollout.gpu_memory_utilization
 - vLLM TP: rollout.tensor_parallel_size
 - Peak Mem: Peak CPU memory usage
 - Peak VRAM: Peak GPU memory usage
```
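The legend in the diff above maps each table column to a training knob. As a hedged sketch of how those knobs might be set on a launch command, assuming a Hydra-style entry point (the module path and `worker.` prefix are assumptions, not part of this commit):

```bash
# Hypothetical launch matching the 7B AMP row: micro-batch sizes 4 (update) / 16 (experience),
# vLLM GPU memory utilization 0.6, tensor parallel size 2. Key names follow the legend above;
# the entry point and "worker." prefix are assumptions.
python -m verl.trainer.main \
    worker.actor.micro_batch_size_per_device_for_update=4 \
    worker.actor.micro_batch_size_per_device_for_experience=16 \
    worker.rollout.gpu_memory_utilization=0.6 \
    worker.rollout.tensor_parallel_size=2
```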

scripts/model_merger.py (1 addition, 2 deletions)

```diff
@@ -98,8 +98,7 @@ def upload_model_to_huggingface(local_path: str, remote_path: str):
 total_shards = mesh.shape[-1]
 mesh_shape = (mesh.shape[-1],)
 
-print(f"Processing model shards with {total_shards} in total.")
-
+print(f"Processing {total_shards} model shards in total.")
 model_state_dict_lst = []
 model_state_dict_lst.append(state_dict)
 model_state_dict_lst.extend([""] * (total_shards - 1))
```
