[recipe] feat: Add sleep/wakeup mode for gen rm vllm service and add tqdm showing process #2739
Conversation
…g the process of the request of the rm
Code Review
This pull request introduces a sleep/wakeup mechanism for the vLLM service to manage GPU resources effectively, along with a `tqdm` progress bar for better user experience during batch processing. The changes are well-intentioned. My review focuses on improving the robustness of the new functionality: a critical issue regarding potential resource leaks that should be addressed with a `try...finally` block, and a high-severity issue with the use of `assert` for runtime checks, which should be replaced with proper exception handling. I also recommend reverting a change to `zip` to maintain strict checking of input data, preventing potential silent errors.
```diff
 with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
     futures = []
-    for data_source, solution_str, ground_truth, extra_info in zip(
-        data_sources, solution_strs, ground_truths, extra_infos, strict=True
+    for data_source, solution_str, ground_truth, extra_info, index in zip(
+        data_sources, solution_strs, ground_truths, extra_infos, indexes, strict=False
     ):
-        future = executor.submit(compute_score, data_source, solution_str, ground_truth, extra_info)
+        future = executor.submit(compute_score, data_source, solution_str, ground_truth, extra_info, index)
+        time.sleep(0.001 * random.random())
         futures.append(future)

-    results = [future.result() for future in futures]
+    for future in tqdm.tqdm(as_completed(futures), total=len(futures)):
+        results.append(future.result())
+    results = sorted(results, key=lambda x: x[-1], reverse=False)
+    results = [result[0] for result in results]

+if SERVER_BACKEND == "VLLM" and USE_OFFLOAD:
+    vllm_execute_method("sleep")
```
The current implementation calls `vllm_execute_method("wake_up")` before processing and `vllm_execute_method("sleep")` after. If an exception occurs during batch processing (e.g., inside the `with ThreadPoolExecutor` block), the `sleep` call will be skipped, leaving the model loaded in GPU memory and causing a resource leak.
To ensure the `sleep` method is always called to free up resources, wrap the processing logic in a `try...finally` block.
Example:
```python
try:
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        # ... existing logic to submit and process futures ...
    # ... existing logic to sort and prepare results ...
finally:
    if SERVER_BACKEND == "VLLM" and USE_OFFLOAD:
        vllm_execute_method("sleep")
```

```python
assert task in ["sleep", "wake_up"], f"Invalid task: {task}"
url_root = BASE_URL
response = requests.post(url_root + "/" + task)
assert response.status_code == 200
```
Using `assert` for runtime checks is not robust: assertions can be disabled with Python's `-O` flag, which would remove these crucial validations. For runtime error handling and input validation, prefer explicit conditional checks that raise appropriate exceptions.
The `requests` library provides `response.raise_for_status()`, which is an idiomatic way to check for HTTP errors and gives more context than a generic `AssertionError`.
```diff
-assert task in ["sleep", "wake_up"], f"Invalid task: {task}"
-url_root = BASE_URL
-response = requests.post(url_root + "/" + task)
-assert response.status_code == 200
+if task not in {"sleep", "wake_up"}:
+    raise ValueError(f"Invalid task: {task}")
+url_root = BASE_URL
+response = requests.post(f"{url_root}/{task}")
+response.raise_for_status()
```
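Put together, the helper would look roughly like this. This is a minimal sketch, not the merged code: `BASE_URL`, the task names, and the function name `vllm_execute_method` come from the PR, while the default URL, docstring, and module layout are assumptions.

```python
import requests

# Assumed default; in the recipe, BASE_URL points at the vLLM server that
# exposes the POST /sleep and /wake_up endpoints this helper calls.
BASE_URL = "http://localhost:8000"


def vllm_execute_method(task: str) -> None:
    """Offload ("sleep") or reload ("wake_up") the vLLM reward model service."""
    if task not in {"sleep", "wake_up"}:
        raise ValueError(f"Invalid task: {task}")
    response = requests.post(f"{BASE_URL}/{task}")
    # raise_for_status() raises requests.HTTPError carrying the status code
    # and URL, which is far more actionable than a bare AssertionError.
    response.raise_for_status()
```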
```diff
-for data_source, solution_str, ground_truth, extra_info in zip(
-    data_sources, solution_strs, ground_truths, extra_infos, strict=True
+for data_source, solution_str, ground_truth, extra_info, index in zip(
+    data_sources, solution_strs, ground_truths, extra_infos, indexes, strict=False
```
The `zip` function is called with `strict=False`, while the previous version of this code used `strict=True`. `strict=True` is safer: it ensures that all input iterables have the same length, raising a `ValueError` if they don't. Switching to `strict=False` can hide bugs where the input lists have mismatched lengths by silently truncating to the shortest list.
It is recommended to revert to `strict=True` to enforce data integrity; a short demonstration follows the suggestion below.
```diff
-data_sources, solution_strs, ground_truths, extra_infos, indexes, strict=False
+data_sources, solution_strs, ground_truths, extra_infos, indexes, strict=True
```
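A minimal illustration (not from the PR) of what `strict` changes when the inputs get out of sync:

```python
solution_strs = ["a", "b", "c"]
indexes = [0, 1]  # one element short, e.g. due to an upstream bug

# strict=False silently truncates to the shortest iterable ('c' is dropped):
list(zip(solution_strs, indexes, strict=False))  # [('a', 0), ('b', 1)]

# strict=True surfaces the mismatch immediately:
list(zip(solution_strs, indexes, strict=True))
# ValueError: zip() argument 2 is shorter than argument 1
```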
For an external reward service, I think it should always use separate GPU resources?
Yes, it’s generally best practice to use dedicated GPU resources for the external reward service when resources are plentiful. However, when GPU resources are constrained, especially considering that many GPUs allocated for training might be idle during reward model scoring, we prefer to utilize those idle GPU resources for the reward service.
This PR seems to break the genrm test. @none0663 @wuxibin89 Surprisingly, the PR itself didn't test this. For #2794 to solve all CI faults in main, we may revert this first and add some robustness checks.
…tqdm showing process (volcengine#2739)
…and add tqdm showing process" (volcengine#2813) Reverts volcengine#2739. For volcengine#2794 to solve all CI faults.
What does this PR do?
Add a sleep/wakeup mode for the gen RM vLLM service and a tqdm progress bar showing request progress.
This capability is particularly beneficial when the model server shares resources with a training workload on the same machine. It allows the reward model service to be temporarily offloaded (to free up GPU memory) during intensive training sessions and reloaded when the service is required again.
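Sketching the whole flow with the review suggestions applied: names such as `SERVER_BACKEND`, `USE_OFFLOAD`, `MAX_WORKERS`, `compute_score`, and `vllm_execute_method` come from the recipe, while the `try...finally` wrapping and `strict=True` follow the review above, so this is an illustration rather than the merged code.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

import tqdm


def compute_score_batched(data_sources, solution_strs, ground_truths, extra_infos):
    indexes = list(range(len(solution_strs)))
    if SERVER_BACKEND == "VLLM" and USE_OFFLOAD:
        vllm_execute_method("wake_up")  # reload the reward model onto the GPU
    results = []
    try:
        with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
            futures = []
            for data_source, solution_str, ground_truth, extra_info, index in zip(
                data_sources, solution_strs, ground_truths, extra_infos, indexes, strict=True
            ):
                futures.append(
                    executor.submit(
                        compute_score, data_source, solution_str, ground_truth, extra_info, index
                    )
                )
                time.sleep(0.001 * random.random())  # small jitter so requests don't land at once
            # as_completed() yields futures as they finish, which lets tqdm
            # show real progress instead of blocking on the slowest request.
            for future in tqdm.tqdm(as_completed(futures), total=len(futures)):
                results.append(future.result())
    finally:
        # Always offload, even if a request fails, so GPU memory is freed
        # for the co-located training workload.
        if SERVER_BACKEND == "VLLM" and USE_OFFLOAD:
            vllm_execute_method("sleep")
    # Completion order is arbitrary; the trailing index restores submit order.
    results.sort(key=lambda x: x[-1])
    return [result[0] for result in results]
```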