[Feature] Fix mixed cache-aware #7129
Thanks for your contribution!
mouxin seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
PaddlePaddle-bot left a comment
🤖 AI Code Review | 2026-04-01
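📋 Review Summary
PR overview: Fixes the SelectWorker call and the resource-release logic of the cache-aware scheduling strategy in mixed mode.
Scope of changes: golang_router/internal/gateway/completions.go
Impact tags: Scheduler APIServer
Issues
No blocking issues found.
Overall assessment
This change correctly fixes two problems with the cache-aware strategy in non-PD (mixed) mode:
- SelectWorker call fix: the original code passed an empty string "", so the cache-aware scheduling strategy could not make a correct decision based on the request content. The code now extracts the message and passes it to SelectWorker.
- ReleasePrefillTokens call added: the deferred resource release now also calls ReleasePrefillTokens, which ensures the token counter is decremented correctly and matches the PD-mode (Splitwise) implementation.
The implementation is clear and aligned with the existing PD-mode logic. Recommended for merge.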
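The two fixes summarized in this review can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in completions.go: the Scheduler type, its fields, Handle, Release, prefixKey, and the signatures given to SelectWorker and ReleasePrefillTokens are all assumptions made for the example, and token counts are approximated by whitespace-separated words.

```go
package main

import (
	"fmt"
	"strings"
)

// Scheduler is a hypothetical stand-in for the router's cache-aware scheduler.
type Scheduler struct {
	workers       []string
	active        map[string]int    // in-flight requests per worker
	prefillTokens map[string]int    // outstanding prefill tokens per worker
	prefixOwner   map[string]string // prompt prefix -> worker assumed to hold its cache
}

func NewScheduler(workers ...string) *Scheduler {
	return &Scheduler{
		workers:       workers,
		active:        map[string]int{},
		prefillTokens: map[string]int{},
		prefixOwner:   map[string]string{},
	}
}

// prefixKey uses the first word of the message as a crude cache key (assumption).
func prefixKey(message string) string {
	fields := strings.Fields(message)
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

// SelectWorker prefers the worker that already served the same prefix.
// With an empty message there is no key, so only the load-based fallback runs.
func (s *Scheduler) SelectWorker(message string) string {
	key := prefixKey(message)
	if key != "" {
		if w, ok := s.prefixOwner[key]; ok {
			return w
		}
	}
	best := s.workers[0]
	for _, w := range s.workers[1:] {
		if s.prefillTokens[w] < s.prefillTokens[best] {
			best = w
		}
	}
	if key != "" {
		s.prefixOwner[key] = best
	}
	return best
}

func (s *Scheduler) Release(worker string) { s.active[worker]-- }

func (s *Scheduler) ReleasePrefillTokens(worker string, tokens int) {
	s.prefillTokens[worker] -= tokens
}

// Handle mirrors the mixed-mode flow: pass the real message to SelectWorker,
// then release both the request slot and the prefill tokens in one defer.
func (s *Scheduler) Handle(message string) string {
	worker := s.SelectWorker(message) // previously "" was passed here
	tokens := len(strings.Fields(message))
	s.active[worker]++
	s.prefillTokens[worker] += tokens
	defer func() {
		s.Release(worker)
		s.ReleasePrefillTokens(worker, tokens) // the call added by the fix
	}()
	// ... forward the request to the worker here ...
	return worker
}

func main() {
	s := NewScheduler("worker-a", "worker-b")
	first := s.Handle("translate this sentence")
	second := s.Handle("translate another sentence")
	fmt.Println(first == second)        // true: the shared prefix maps to one worker
	fmt.Println(s.prefillTokens[first]) // 0: the counter returns to zero after release
}
```

In this sketch, omitting the ReleasePrefillTokens call would leave prefillTokens growing with every request, and the load-based fallback would gradually steer away from a worker that is in fact idle.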
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
* [Feature] Config eviction_duration
* [Feature] Fix mixed cache-aware
---------
Co-authored-by: mouxin <[email protected]>
Fix the release and SelectWorker logic in the mixed cache-aware strategy.