@taoluo taoluo commented Aug 8, 2025

Dynamic Load Balancing via Request Interruption for vLLM Engines

Summary

This PR implements a dynamic load balancing system for distributed vLLM engines by introducing request interruption. The system automatically redistributes in-flight workload across engines to improve throughput and reduce latency.

Key Features

Request Interruption

  • Selection Algorithm: the scheduler sends each worker a target_leftover_cnt; the worker then selects which requests to interrupt based on migration overhead
  • Two-tier Interruption Strategy:
    • First interrupts all unscheduled (waiting) requests
    • Then selects running/swapped requests with shortest total sequence length to minimize wasted computation
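The two-tier selection above can be sketched as follows. This is a minimal illustration of the policy described in this PR, not the actual implementation; the `Request` dataclass and `select_requests_to_interrupt` helper are hypothetical names.

```python
# Hypothetical sketch of the two-tier interruption policy; `Request` and
# `select_requests_to_interrupt` are illustrative names, not this PR's API.
from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    status: str          # "waiting", "running", or "swapped"
    total_seq_len: int   # prompt + generated tokens so far


def select_requests_to_interrupt(requests, target_leftover_cnt):
    """Pick requests to interrupt so that `target_leftover_cnt` remain."""
    n_to_interrupt = max(0, len(requests) - target_leftover_cnt)
    # Tier 1: waiting requests carry no wasted computation, interrupt them first.
    waiting = [r for r in requests if r.status == "waiting"]
    # Tier 2: among running/swapped, shorter sequences lose the least work.
    active = sorted(
        (r for r in requests if r.status != "waiting"),
        key=lambda r: r.total_seq_len,
    )
    return [r.request_id for r in (waiting + active)[:n_to_interrupt]]
```

The key design point is that ordering by total sequence length makes "wasted computation" the explicit cost function: interrupting a request that has generated few tokens discards far less work than interrupting a long-running one.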

Dynamic Load Balancing

  • Automatic Imbalance Detection: Monitors request distribution across engines and triggers rebalancing when imbalance exceeds threshold
  • Conservative Interruption: Calculates interruption count as half of load imbalance to avoid over-correction and oscillation
  • Freeze Period: Implements 3-second freeze after interruption to allow proper load redistribution before next rebalancing cycle
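Taken together, the three rules above (threshold trigger, half-imbalance correction, 3-second freeze) amount to a small decision function. The sketch below is an assumed reading of that policy; the function name, the load representation, and the threshold parameter are all hypothetical.

```python
# Hypothetical sketch of the rebalancing trigger; names and signature are
# illustrative, not taken from generate_scheduler.py.
import time

FREEZE_SECONDS = 3.0  # freeze window after an interruption, per the PR description


def compute_interrupt_count(engine_loads, threshold, last_interrupt_ts, now=None):
    """Return how many requests to interrupt on the busiest engine, or 0.

    `engine_loads` maps engine id -> in-flight request count. The policy is
    conservative: interrupt only half of the imbalance to avoid oscillation.
    """
    now = time.monotonic() if now is None else now
    if now - last_interrupt_ts < FREEZE_SECONDS:
        return 0  # still inside the freeze period, let migration settle
    imbalance = max(engine_loads.values()) - min(engine_loads.values())
    if imbalance <= threshold:
        return 0  # within tolerance, no rebalancing needed
    return imbalance // 2  # conservative half-imbalance correction
```

Halving the correction means consecutive rebalancing cycles converge on a balanced state instead of ping-ponging requests between engines, and the freeze period gives migrated requests time to actually land before load is measured again.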

Implementation Details

  • Modified generate_scheduler.py to track load metrics and trigger rebalancing
  • Enhanced vllm_strategy.py with request interruption capabilities
  • Updated the worker communication protocol to carry a target interruption count
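As a rough illustration of the scheduler/worker exchange described above, the message could look like the following. The field and class names here are guesses for exposition only; the PR's actual wire format is not shown in this description.

```python
# Hypothetical scheduler -> worker message shapes; field names are assumptions,
# not the PR's actual protocol schema.
from dataclasses import dataclass, field
from typing import List


@dataclass
class InterruptCommand:
    """Sent by the scheduler: asks the worker to shed load down to a target."""
    target_leftover_cnt: int  # how many requests the worker should keep running


@dataclass
class InterruptReply:
    """Returned by the worker so the scheduler can track and migrate requests."""
    interrupted_request_ids: List[str] = field(default_factory=list)
```

Sending a target count rather than explicit request IDs keeps the selection decision on the worker, which is the only place with accurate per-request progress information.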

taoluo added 16 commits July 15, 2025 22:06
…ancing

  - Scheduler now sends target_leftover_cnt instead of specific request IDs
  - Worker selects requests to interrupt based on overhead:
    * First interrupts all unscheduled (waiting) requests
    * Then selects running requests with shortest total sequence length
  - Add assertions to validate request existence before interruption
  - Calculate interruption count as half of load imbalance to avoid over-correction
  - Add 3-second freeze period after interruption to allow load redistribution

  This minimizes wasted computation by prioritizing requests that haven't started
  and selecting running requests based on actual progress rather than arbitrary order.
  - Implement abort_to_target_requests_cnt for v1 engine
  - Support both v0/v1 engines in VllmStrategy
  - Prioritize interruption: waiting → (swapped+running by length)
  - Fix swapped_count missing in total request calculation
  - Return interrupted request IDs for proper tracking

  Enables efficient request migration by minimizing computational loss.
# Conflicts:
#	roll/distributed/scheduler/generate_scheduler.py
#	roll/distributed/strategy/vllm_strategy.py
#	roll/pipeline/rlvr/rlvr_pipeline.py
#	roll/third_party/vllm/vllm_0_8_4/llm.py
@CLAassistant

CLAassistant commented Aug 8, 2025

CLA assistant check
All committers have signed the CLA.

@taoluo taoluo changed the title Dynamic Load Balancing with Request Interruption for vLLM Engines Dynamic Load Balancing via Request Interruption and Migration for vLLM Engines Aug 8, 2025
@taoluo taoluo closed this Aug 8, 2025
@taoluo taoluo deleted the dynamic-load-balancing branch August 8, 2025 19:25
@taoluo taoluo restored the dynamic-load-balancing branch August 8, 2025 19:25
@taoluo taoluo reopened this Aug 8, 2025
@taoluo taoluo changed the title Dynamic Load Balancing via Request Interruption and Migration for vLLM Engines [Draft] Dynamic Load Balancing via Request Interruption and Migration for vLLM Engines Aug 12, 2025
