[recipe] feat: Add sleep/wakeup mode for gen rm vllm service and add tqdm showing process (#2739)
### What does this PR do?
Add a sleep/wakeup mode for the generative reward model (gen RM) vLLM service, and add a tqdm progress bar.
This capability is particularly beneficial when the model server shares
resources with a training workload on the same machine. It allows the
reward model service to be temporarily offloaded (to free up GPU memory)
during intensive training sessions and reloaded when the service is
required again.
Note that vLLM's `wake_up` and `sleep` operations for managing CUDA memory are only available when both `VLLM_SERVER_DEV_MODE=1` and `enable_sleep_mode` are set. The relevant vLLM implementation can be found here:
[sleep and wake_up mode](https://github.com/vllm-project/vllm/blob/5a19a6c6705fe83db2e3517a2d2f473586901743/vllm/entrypoints/openai/api_server.py#L994-L1003)
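As a rough illustration, the server-side endpoints linked above can be driven from a client with plain HTTP POSTs. This is a hedged sketch, not the PR's actual client code: the `base_url` and the helper names are assumptions, and the `level` query parameter mirrors what vLLM's dev-mode `/sleep` route accepts.

```python
# Sketch of client helpers for vLLM's dev-mode /sleep and /wake_up routes.
# Assumes the server was started with VLLM_SERVER_DEV_MODE=1 and
# --enable-sleep-mode; helper names here are illustrative, not from the PR.
import urllib.request


def sleep_url(base_url: str, level: int = 1) -> str:
    """Build the URL that offloads model weights and frees GPU memory."""
    return f"{base_url.rstrip('/')}/sleep?level={level}"


def wake_up_url(base_url: str) -> str:
    """Build the URL that reloads the model into GPU memory."""
    return f"{base_url.rstrip('/')}/wake_up"


def _post(url: str) -> int:
    """Send an empty-body POST and return the HTTP status code."""
    req = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def offload_reward_model(base_url: str) -> int:
    return _post(sleep_url(base_url))


def reload_reward_model(base_url: str) -> int:
    return _post(wake_up_url(base_url))
```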
When the backend is configured as `SERVER_BACKEND="VLLM"`, the `USE_OFFLOAD` flag can be toggled between `True` and `False` (see `reward_function.py`).
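A minimal sketch of how the `USE_OFFLOAD` flag might gate a wake/sleep cycle around batched reward scoring. Only `SERVER_BACKEND` and `USE_OFFLOAD` come from the PR; `score_batch`, `wake`, and `sleep` are hypothetical stand-ins for the recipe's own functions, and the plain loop stands in for the tqdm-wrapped one to keep the sketch dependency-free.

```python
# Hedged sketch: wake the reward-model server before scoring and put it
# back to sleep afterwards, so the co-located trainer regains GPU memory.
SERVER_BACKEND = "VLLM"
USE_OFFLOAD = True  # toggled in reward_function.py per the PR


def compute_rewards(batches, score_batch, wake, sleep):
    """Score each batch, offloading the vLLM server around the work.

    score_batch/wake/sleep are hypothetical callables, not the PR's API.
    """
    offload = SERVER_BACKEND == "VLLM" and USE_OFFLOAD
    if offload:
        wake()  # reload weights before scoring
    try:
        # the actual recipe wraps this loop with tqdm to show progress
        return [score_batch(b) for b in batches]
    finally:
        if offload:
            sleep()  # free GPU memory for the trainer again
```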