What would you like to be added:
Starting from the basic building block of scorers: scorers are usually defined in one of two main categories (a rough sketch of both follows the list below):
Category 1:
Context-aware - which includes scorers such as session-aware, estimated prefix-cache aware, and KV-cache–aware. These scorers use different levels of knowledge to estimate KV-cache locality on serving pods, with precision improving as information becomes more granular. For example, a session-aware strategy may rely on a session-id header that maps to a growing chat, while a KV-cache–aware strategy consumes direct cache events from vLLM pods.
Category 2:
Load-aware - which focuses on metrics such as queue lengths, active request counts, or KV-cache memory utilization, aiming to evenly distribute inference requests, prevent hotspots, and maximize throughput.
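To make the two categories concrete, here is a minimal Go sketch of how such scorers might look behind a common interface. This is not the actual EPP API; the type names, fields, and heuristics (`Pod`, `ContextAwareScorer`, `LoadAwareScorer`, `WeightedScore`) are hypothetical and only illustrate how static weights combine the two signals today:

```go
package scoring

// Pod holds the per-pod signals a scorer might consult (hypothetical fields).
type Pod struct {
	Name            string
	QueueDepth      int     // pending requests on the pod
	KVCacheUsage    float64 // fraction of KV-cache memory in use
	PrefixCacheHits int     // estimated cached prefix blocks for this request
}

// Scorer returns a score in [0, 1]; higher means a better placement.
type Scorer interface {
	Score(pod Pod) float64
}

// ContextAwareScorer favors pods likely to already hold relevant KV-cache state.
type ContextAwareScorer struct{}

func (ContextAwareScorer) Score(p Pod) float64 {
	// Toy heuristic: more estimated prefix-cache hits -> higher score.
	return float64(p.PrefixCacheHits) / float64(p.PrefixCacheHits+1)
}

// LoadAwareScorer favors lightly loaded pods to avoid hotspots.
type LoadAwareScorer struct{}

func (LoadAwareScorer) Score(p Pod) float64 {
	return 1.0 / float64(p.QueueDepth+1)
}

// WeightedScore combines scorer outputs with static weights (the status quo).
func WeightedScore(p Pod, scorers []Scorer, weights []float64) float64 {
	total := 0.0
	for i, s := range scorers {
		total += weights[i] * s.Score(p)
	}
	return total
}
```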
These categories often pull in opposite directions: context-aware scorers bias toward sticky routing to maximize cache hits, while load-aware scorers spread requests to minimize queuing delays. Striking the right balance is critical for minimizing latency and maximizing efficiency — but static weights are fragile and often suboptimal.
We should introduce dynamic adaptation of the active scorers and/or their weights to optimize performance.
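As a sketch of what such dynamic adaptation could look like, the loop below nudges the weight split between the two categories based on observed queue depth and cache hit rate: under queuing pressure it shifts weight toward load-aware spreading, and when queues are short but cache locality is poor it shifts weight toward context-aware (sticky) routing. The signal names, thresholds, and step size are all hypothetical, chosen only to illustrate the idea:

```go
package scoring

// ClusterSignals summarizes recent observations across serving pods (assumed inputs).
type ClusterSignals struct {
	MeanQueueDepth float64 // average pending requests per pod
	CacheHitRate   float64 // fraction of requests that hit a warm KV-cache
}

// Weights holds the relative weight of each scorer category.
type Weights struct {
	ContextAware float64
	LoadAware    float64
}

// Adapt nudges the balance between the two categories based on observed signals.
func Adapt(w Weights, s ClusterSignals) Weights {
	const step = 0.05
	switch {
	case s.MeanQueueDepth > 8: // hypothetical threshold: queues are building up
		w.ContextAware -= step // spread load more aggressively
	case s.CacheHitRate < 0.5: // headroom available, but cache locality is poor
		w.ContextAware += step // route more stickily to warm caches
	}
	// Clamp and renormalize so the weights stay a valid convex combination.
	w.ContextAware = clamp(w.ContextAware, 0.1, 0.9)
	w.LoadAware = 1.0 - w.ContextAware
	return w
}

func clamp(v, lo, hi float64) float64 {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}
```

The same adaptation hook could also enable or disable scorers entirely rather than only reweighting them; which signals drive the adaptation is exactly what the benchmarks below should help decide.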
Why is this needed:
To optimize the performance of the EPP. We should provide benchmarks to prove that performance is indeed improved when this capability is introduced.