
[Feature] Fix mixed cache-aware #7129

Merged
Smilencelsy merged 6 commits into PaddlePaddle:develop from mouxinqq:develop on Apr 1, 2026


mouxinqq (Contributor) commented on Apr 1, 2026

Fixes the release logic and the SelectWorker logic in the mixed cache-aware strategy.

paddle-bot (Bot) commented on Apr 1, 2026

Thanks for your contribution!

paddle-bot (Bot) added the contributor (External developers) label on Apr 1, 2026
CLAassistant commented on Apr 1, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ mouxinqq
❌ mouxin


mouxin does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Smilencelsy merged commit fba8a51 into PaddlePaddle:develop on Apr 1, 2026 (36 of 38 checks passed)

PaddlePaddle-bot left a comment


🤖 AI Code Review | 2026-04-01

📋 Review Summary

PR overview: fixes the SelectWorker and resource-release logic of the cache-aware scheduling strategy in mixed mode
Scope of changes: golang_router/internal/gateway/completions.go
Affected areas (tags): Scheduler, APIServer

Issues

No blocking issues found.

Overall Assessment

This change correctly fixes two problems with the cache-aware strategy in non-PD (mixed) mode:

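  1. SelectWorker call fix: the original code passed an empty string "", so the cache-aware scheduling strategy could not base its decision on the request content. The message is now extracted and passed to SelectWorker.

  2. Added ReleasePrefillTokens call: the deferred resource release now also calls ReleasePrefillTokens, so the prefill token counter is decremented correctly, consistent with the PD-mode (Splitwise) implementation.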
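
To make the two fixes concrete, here is a minimal Go sketch. Everything in it is hypothetical: Scheduler, SelectWorker, AcquirePrefillTokens, ReleasePrefillTokens, Load and handleCompletion are illustrative stand-ins, not the actual signatures in golang_router/internal/gateway/completions.go, and raw-string prefix matching plus whitespace token counting are simplifications of a real cache-aware policy. It shows why passing "" removes the cache signal, and how releasing prefill tokens in the same defer keeps the counter balanced.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Scheduler is a hypothetical, simplified cache-aware scheduler used only to
// illustrate the two fixes; it is not the golang_router API.
type Scheduler struct {
	mu            sync.Mutex
	workers       []string
	cachedPrompts map[string][]string // worker -> prompts it has served (stand-in for a KV-cache index)
	prefillTokens map[string]int      // worker -> in-flight prefill tokens
}

func NewScheduler(workers ...string) *Scheduler {
	return &Scheduler{
		workers:       workers,
		cachedPrompts: make(map[string][]string),
		prefillTokens: make(map[string]int),
	}
}

// commonPrefixLen returns the number of leading bytes shared by a and b.
func commonPrefixLen(a, b string) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// SelectWorker prefers the worker whose cached prompts share the longest
// prefix with message. Ties, including every request with an empty message,
// fall back to the worker with the fewest in-flight prefill tokens.
func (s *Scheduler) SelectWorker(message string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	best, bestHit, bestLoad := "", -1, 0
	for _, w := range s.workers {
		hit := 0
		for _, p := range s.cachedPrompts[w] {
			if l := commonPrefixLen(message, p); l > hit {
				hit = l
			}
		}
		load := s.prefillTokens[w]
		if hit > bestHit || (hit == bestHit && load < bestLoad) {
			best, bestHit, bestLoad = w, hit, load
		}
	}
	return best
}

// AcquirePrefillTokens charges n prefill tokens to worker and records the prompt.
func (s *Scheduler) AcquirePrefillTokens(worker, message string, n int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.prefillTokens[worker] += n
	s.cachedPrompts[worker] = append(s.cachedPrompts[worker], message)
}

// ReleasePrefillTokens undoes the counter increment made by AcquirePrefillTokens.
func (s *Scheduler) ReleasePrefillTokens(worker string, n int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.prefillTokens[worker] -= n
}

// Load reports the in-flight prefill tokens charged to worker.
func (s *Scheduler) Load(worker string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.prefillTokens[worker]
}

// handleCompletion applies both fixes: route on the extracted message rather
// than "", and release prefill tokens in the same defer as other cleanup.
func handleCompletion(s *Scheduler, message string) string {
	worker := s.SelectWorker(message)      // the buggy version passed ""
	tokens := len(strings.Fields(message)) // crude stand-in for a tokenizer
	s.AcquirePrefillTokens(worker, message, tokens)
	defer s.ReleasePrefillTokens(worker, tokens) // the call that was missing in mixed mode
	// ... forward the request to worker and stream the response back ...
	return worker
}

func main() {
	s := NewScheduler("w0", "w1")
	sys := "You are a helpful assistant. "
	// Pretend w1 already holds KV cache for a prompt with the same system prefix.
	s.cachedPrompts["w1"] = []string{sys + "Be brief."}

	fmt.Println(handleCompletion(s, sys+"Summarize this.")) // w1: longest shared prefix
	fmt.Println(handleCompletion(s, ""))                     // w0: no prefix signal, load tie-break
	fmt.Println(s.Load("w0"), s.Load("w1"))                  // 0 0: tokens released by defer
}
```

Running it prints w1, then w0, then 0 0. Without the deferred ReleasePrefillTokens call, w1 would stay charged with 7 in-flight tokens after the first request, skewing every later load-based tie-break against it.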

The implementation is clear and aligned with the existing PD-mode logic. Recommend merging.

xiaoguoguo626807 pushed a commit to xiaoguoguo626807/FastDeploy that referenced this pull request on May 7, 2026
* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Config eviction_duration

* [Feature] Fix mixed cache-aware

---------

Co-authored-by: mouxin <[email protected]>

Labels: contributor (External developers)

Projects: None yet


4 participants