【Models】add fleet model fallback #7730
xiaoguoguo626807 wants to merge 5174 commits into PaddlePaddle:develop
Conversation
…Generation (PaddlePaddle#7086) Add clear_grpah_opt_backend method that delegates to the underlying model to clear cuda graph optimization backend. Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
…le#6731) * [CI] [Hackathon 10th Spring No.34] add unit tests for async_expert_loader * [CI] [Hackathon 10th Spring No.34] add unit tests for async_expert_loader --------- Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: "liuruian" <liuruian@baidu.com>
* [BugFix] reset exist tasks signal in clear_data * [Fix] fix stale exist tasks signal after weight update * [Chore] downgrade detected new requests log to DEBUG level * [fix] adjust continue place
* remove ENABLE_V1_DATA_PROCESSOR * fix unit test * fix unit test
…efine PD Disaggregation (PaddlePaddle#7107) * Write the cache of preempted req to storage * up * fix
…e#7085) * [CI] Optimize test execution with single-GPU parallelism and log collection * remove export CUDA_VISIBLE_DEVICES * fix path error * fix log_* path and debug * [CI] Optimize test execution with single-GPU parallelism and log collection
* [Feature] Config eviction_duration * [Feature] Config eviction_duration * [Feature] Config eviction_duration * [Feature] Config eviction_duration --------- Co-authored-by: mouxin <mouxin@baidu.com>
* add docs for disaggregated deployment * pre-commit run for style check * update docs
* [Feature] Config eviction_duration * [Feature] Config eviction_duration * [Feature] Config eviction_duration * [Feature] Config eviction_duration * [Feature] Fix mixed cache-aware --------- Co-authored-by: mouxin <mouxin@baidu.com>
* [XPU] support speculate_pre_process * merge develop * fix codestyle * fix mtp, support cu_seqlens_q_output * fix mtp, support cu_seqlens_q_output * fix test --------- Co-authored-by: lizan1999 <lizan03@baidu.com>
… decoding operators (PaddlePaddle#7121) - Fix accept_idx calculation in spec_set_value_by_stop_seqs - Fix condition check from < to <= for token matching - Fix accept_tokens indexing logic - Remove unnecessary -1 in current_step comparison for max_think_len Co-authored-by: guanshihui <guanshihui@baidu.com>
* support deepgeem for sm103 * add assert * modify code style * add assert * modify sm version condition * remove assert
* fix tool parser
…peculate…" (PaddlePaddle#7133) This reverts commit 9c0c5d6.
…ight update and add unit tests (PaddlePaddle#7083) * [test] add a few unit tests * [feat] update key prefix when model weights are updated * [test] try to fix test_worker_process
…dlePaddle#7471) * [BugFix][KVCache] Fix inference slowdown when enabling CPU cache on Blackwell GPU

On Blackwell (B-series) GPUs, enabling the CPU cache (num_cpu_blocks > 0) caused a noticeable inference slowdown. Root cause: the check in `create_cache_tensor` treated `num_cpu_blocks > 0` as a condition for skipping GPU cache tensor creation, so GPU cache tensor initialization was incorrectly skipped on Blackwell.

- `fastdeploy/worker/gpu_model_runner.py`: remove the `num_cpu_blocks > 0` condition from the `create_cache_tensor` check (two places: `init_cache` and `clear_cache`), so GPU cache tensors are still created when the CPU cache is enabled
- `fastdeploy/cache_manager/prefix_cache_manager.py`: move the `--create_cache_tensor` argument out of the non-splitwise conditional and consolidate it under the `kvcache_storage_backend` configuration path for clearer logic

```bash
python -m fastdeploy.entrypoints.openai.api_server \
  --num-cpu-blocks <N> \
  ...
```

* [BugFix][KVCache] Enlarge prealloc threshold for speculative decoding

## Motivation
Under speculative decoding, each scheduling step consumes `num_spec_tokens` slots at once. The original `prealloc_dec_block_slot_num_threshold` was too small, so block preallocation was not triggered early enough, hurting inference performance.

## Modifications
During `FDConfig` initialization, when speculative decoding is enabled, enlarge `prealloc_dec_block_slot_num_threshold` to its original value multiplied by `num_spec_tokens`, while ensuring it does not exceed the enc_dec_block capacity cap.

## Usage or Command
No extra configuration is needed when speculative decoding is enabled; the threshold adjusts automatically:

```bash
python -m fastdeploy.entrypoints.openai.api_server \
  --speculative-config '{"method": "draft_model", "num_speculative_tokens": 4}' \
  ...
```

* [BugFix][KVCache][FDConfig] Fix prealloc threshold and create_cache_tensor for splitwise

## Motivation
Two bug fixes:
1. Under speculative decoding, the scaling factor for prealloc_dec_block_slot_num_threshold should be (num_spec_tokens + 1) rather than num_spec_tokens, so preallocation triggers early enough.
2. When kvcache_storage_backend is enabled, the --create_cache_tensor argument should only be passed in non-splitwise mode, so splitwise P nodes do not incorrectly create cache tensors.

## Modifications
- fastdeploy/config.py: correct the prealloc scaling factor to (num_spec_tokens + 1), and log the value before and after the change
- fastdeploy/cache_manager/prefix_cache_manager.py: append --create_cache_tensor only in non-splitwise mode

## Usage or Command
```bash
# Launch the service (with speculative decoding + kvcache storage)
python -m fastdeploy.entrypoints.openai.api_server \
  --model <model_path> \
  --speculative-model <draft_model_path> \
  --num-speculative-tokens 3 \
  --kvcache-storage-backend <backend>
```

* [BugFix][SpecDecode] Fix create_cache_tensor condition in MTPProposer

## Motivation
The `create_cache_tensor` condition included `num_cpu_blocks > 0`, which broke MTP's kv cache creation logic in the Blackwell CPU cache scenario.

## Modifications
Remove the `num_cpu_blocks > 0` condition from the `create_cache_tensor` check, keeping only the `kvcache_storage_backend` and `splitwise_role` checks, so the redundant condition no longer interferes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [BugFix][KVCache] Address review comments: fix negative cap, sync runner fixes, update comments

## Modifications
1. **config.py**: Fix `enc_dec_block_num=0` causing a negative upper bound for `prealloc_dec_block_slot_num_threshold`. Use `max(0, ...)` to guard against a negative cap. Also fix the comment to say `num_spec_tokens + 1` (matching the code).
2. **xpu_model_runner.py / metax_model_runner.py**: Sync the same fix from gpu_model_runner.py — remove `num_cpu_blocks > 0` from the `create_cache_tensor` condition. Enabling the CPU cache should not prevent GPU runners from creating GPU cache tensors on XPU/Metax platforms either.
3. **gpu_model_runner.py / mtp.py / xpu_model_runner.py / metax_model_runner.py**: Update stale comments to clarify that the CPU cache does NOT prevent GPU cache tensor creation; the cache transfer manager handles the CPU<->GPU swap on top.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
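The threshold adjustment described in the commits above can be sketched as follows. Function and parameter names are assumed for illustration; the actual `FDConfig` logic may differ in detail:

```python
def adjust_prealloc_threshold(base_threshold: int,
                              num_spec_tokens: int,
                              enc_dec_block_num: int,
                              block_size: int) -> int:
    """Scale the decode-block preallocation threshold for speculative decoding.

    Each scheduling step consumes num_spec_tokens + 1 slots (the draft tokens
    plus the bonus token), so the threshold is scaled by that factor and capped
    at the enc_dec_block capacity; max(0, ...) guards against a negative cap
    when enc_dec_block_num == 0.
    """
    scaled = base_threshold * (num_spec_tokens + 1)
    cap = max(0, enc_dec_block_num * block_size)
    return min(scaled, cap)

print(adjust_prealloc_threshold(4, 3, 2, 64))   # 16: scaled value under the cap
print(adjust_prealloc_threshold(4, 3, 0, 64))   # 0: zero capacity clamps the threshold
```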
…dlePaddle#7671) * support routed_scaling_factor_learnable
…rted (PaddlePaddle#7633) * [BugFix] fix preempted token id not returned when a full batch is aborted * [fix] changed fake_sampled_token_ids shape and filled value * [test] add test * [chore] move code place * [test] add more tests and docstring
…lePaddle#7668) * [Router] Support launch golang-router by python command * Update fastdeploy/golang_router/launch.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/golang_router/launch.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix(golang_router): fix launch.py bugs and add unit tests Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/57636bb1-779a-417f-934c-07a1462ed41c Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * fix(build.sh, docs): detect host arch for fd-router download; update router docs Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/7a6cb757-5f4d-4c45-9272-e1e3da43ede4 Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * [Router] Move fd-router download from build.sh to setup.py - Remove download_fd_router from build.sh (setup.py handles it via CustomBdistWheel.run) - Add download_fd_router to setup.py with aarch64 support - Always register CustomBdistWheel in cmdclass (not gated by rdma_comm_supported) - Add fd-router binary to .gitignore Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [Router] Revert doc changes for router.md Will update docs in a separate PR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [Build] Add BUILD_WHEEL=2 mode to skip custom ops compilation When only Python/build scripts are changed, use `bash build.sh 2` to package the wheel directly without recompiling custom ops, significantly reducing build time. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [Router] Add deprecation warning to Python Router Print a warning when launching the Python Router, recommending the Golang Router for production use. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [Fix] Suppress noisy warnings and replace pkg_resources - Suppress transformers/paddleformers/setuptools warnings on startup - Replace pkg_resources with importlib.metadata to fix ModuleNotFoundError - Change sm_version print to logging.debug Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [Test] Fix unit tests for eval.py and golang_router_launch - test_eval.py: replace pkg_resources with importlib.metadata mocks - test_golang_router_launch.py: use patch.object on _launch_module to avoid AttributeError from stub module resolution Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: run pre-commit to fix code formatting issues Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/91173bf4-9b99-4cf4-b95b-0758fed8abfa --------- Co-authored-by: jiang-jia-jun <jiangjiajun@baidu.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…_router.launch) (PaddlePaddle#7673) * docs: update router documentation to use Python CLI (python -m fastdeploy.golang_router.launch) Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/955dfc67-4288-4687-bd5a-b7b232fa97e7 Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * docs: fix duplicate link in best_practices/Disaggregated.md Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/955dfc67-4288-4687-bd5a-b7b232fa97e7 Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
…tion (additional fixes) (PaddlePaddle#7684)
…ort int32) (PaddlePaddle#7648) * fix infer seed * fix infer seed for mtp * fix offset * fix offset
Thanks for your contribution!
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-05-07 14:30:35
📋 Review Summary
PR overview: adds PaddleFleet as a model inference backend (--model-impl paddlefleet). By replacing core_attention in the TransformerLayer with the FastDeploy Attention kernel, it reuses FastDeploy's KV Cache and high-performance Attention.
Change scope: model_executor/models/paddleformers/, config.py, engine/args_utils.py, worker/worker_process.py, requirements.txt
Impact tags: [Models] [FDConfig] [Engine]
📝 PR Convention Check
The title uses full-width brackets 【Models】, which violates the D1 convention requiring ASCII square brackets [Models]. The Accuracy Tests section contains only the comment placeholder and is otherwise empty, yet "Provide accuracy results" is checked in the Checklist — a contradiction.
Suggested title (copy-paste ready):
[Models] add PaddleFleet model fallback backend
Suggested PR description (copy-paste ready; it must reproduce the full structure of the checklist §D2 template):
## Motivation
Add PaddleFleet as a model inference backend (`--model-impl paddlefleet`). By replacing core_attention in the PaddleFleet TransformerLayer with the FastDeploy Attention kernel, FastDeploy's KV Cache and high-performance Attention computation are reused on top of the PaddleFleet model structure.
## Modifications
- `config.py`: add `paddlefleet` to the `ModelImpl` type definition
- `engine/args_utils.py`: support the `--model-impl paddlefleet` CLI argument; update the list of valid values and the docstring
- `model_executor/models/paddleformers/base_fleet.py`: add the `PaddleFleetModelBase` base class and the `FastDeployAttention` replacement logic; implement the `patch_paddlefleet_core_attention` function
- `model_executor/models/paddleformers/__init__.py`: register the `PaddleFleetForCausalLM` model class
- `model_executor/graph_optimization/decorator.py`: fix `__call__` to support positional arguments `*args`
- `model_executor/layers/rotary_embedding.py`: `get_rope_impl` now parses the `PaddleFleetForCausalLM` architecture name
- `worker/worker_process.py`: add the `paddlefleet` option accordingly
## Usage or Command
```bash
python -m fastdeploy.entrypoints.openai.api_server \
--model /path/to/model \
--model-impl paddlefleet
```
## Accuracy Tests
N/A (this PR integrates a new backend framework and has no accuracy comparison data yet; a follow-up logits alignment test against the native FastDeploy backend is needed)
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Issues
| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | base_fleet.py:119 | `assert` used for runtime validation; skipped under `-O`, so `forward_meta=None` passes through silently and causes hard-to-trace downstream errors |
| 🟡 Suggestion | base_fleet.py:372,395 | `print()` left in the `forward` hot path; every inference emits console output, severely hurting throughput |
| ❓ Question | requirements.txt | Pins paddleformers to a specific nightly offline whl (cu126 only); unsuitable for long-term maintenance |
| ❓ Question | base_fleet.py | The comment says "PaddleFleet starts from 1" for `layer_number`, yet `fd_layer_id = layer_number` is assigned directly (with another comment saying "0-indexed") — off-by-one risk |
🔴 base_fleet.py:119 — assert used for runtime validation
assert is skipped entirely under Python's -O optimization mode; when forward_meta is None, None silently flows into fd_attention.forward() and produces a hard-to-trace AttributeError. Following the pattern in base.py:285 in the same directory, change it to an explicit raise ValueError.
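The hazard is easy to reproduce. This standalone snippet runs the same assert with and without `-O` to show the check vanishing under optimization:

```python
import subprocess
import sys
import textwrap

snippet = textwrap.dedent("""
    forward_meta = None
    assert forward_meta is not None, "forward_meta must be provided"
    print("assert was skipped")
""")

# Without -O the assert raises AssertionError; with -O it is compiled away,
# so execution continues as if the check never existed.
normal = subprocess.run([sys.executable, "-c", snippet], capture_output=True, text=True)
optimized = subprocess.run([sys.executable, "-O", "-c", snippet], capture_output=True, text=True)

print(normal.returncode != 0)                    # True: AssertionError fired
print("assert was skipped" in optimized.stdout)  # True: the check vanished
```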
🟡 base_fleet.py:372,395 — print() left in the hot path
Two print() calls remain in the @paddle.no_grad() def forward() hot path; every inference triggers string formatting and console I/O (line 395 additionally serializes the entire Tensor). Remove them, or replace them with logger.debug():
```python
# line 372: remove, or change to
logger.debug("forward_meta: %s", forward_meta)
# line 395: remove, or change to
logger.debug("position_ids: %s", position_ids)
```

❓ requirements.txt — nightly wheel pinning
paddleformers[paddlefleet] @ https://paddle-whl.bj.bcebos.com/nightly/cu126/paddleformers/paddleformers-1.1.0.post20260430-py3-none-any.whl has the following problems:
- The URL is hard-coded; once the link goes stale the package cannot be installed
- It only covers CUDA 12.6; users on other CUDA versions cannot use it
- A nightly build carries no version guarantee and is unsuitable for production
Switch to an official release version, or state in the PR description that this is a temporary measure and outline a follow-up plan.
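If a pin is unavoidable in the interim, one hedged alternative (version bounds here are illustrative, assuming paddleformers publishes stable releases to a regular package index) is a ranged requirement instead of a nightly URL:

```
# requirements.txt — illustrative bounds only; replace with the actual stable release range
paddleformers[paddlefleet]>=1.1.0,<1.2.0
```

A version range survives the nightly bucket being cleaned up and lets pip resolve the right wheel per platform.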
❓ layer_number off-by-one 疑问
注释 # Get layer_number (PaddleFleet starts from 1) 表明 layer_number 从 1 起,但随后 fd_layer_id = layer_number 的注释是 # Get FastDeploy layer ID (0-indexed),两者矛盾。若 FastDeploy KV Cache 按 0-indexed 分配(layer 0 ~ N-1),则 fd_layer_id 应为 layer_number - 1,否则 layer 0 的 KV Cache 永远不会被使用,最后一层可能越界。请确认 Attention(fd_config, layer_id=fd_layer_id) 中 layer_id 的预期取值范围。
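If the 1-indexed reading is correct, the conversion the reviewer expects looks like the sketch below (the function name is hypothetical; the `- 1` is the point):

```python
def fleet_layer_to_fd_layer_id(layer_number: int, num_layers: int) -> int:
    """Map PaddleFleet's 1-indexed layer_number to FastDeploy's 0-indexed layer_id."""
    if not 1 <= layer_number <= num_layers:
        raise ValueError(f"layer_number {layer_number} outside 1..{num_layers}")
    return layer_number - 1

print(fleet_layer_to_fd_layer_id(1, 32))   # 0: the first layer maps to KV cache slot 0
print(fleet_layer_to_fd_layer_id(32, 32))  # 31: the last layer stays in bounds
```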
Overall Assessment
The overall design is clean: patch_paddlefleet_core_attention embeds FastDeploy Attention into the PaddleFleet model structure non-intrusively, which is a sound approach. However, one P0 issue (replacing assert with raise ValueError) must be fixed, two leftover print() debug statements in the hot path hurt performance, and the nightly wheel pin in requirements.txt and the layer_number indexing question need the author's confirmation before merge.
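The non-intrusive patching approach can be illustrated with a toy sketch. All classes and functions here are stand-ins, not the actual FastDeploy/PaddleFleet APIs: each layer's core_attention attribute is replaced in place with a closure bound to that layer's id.

```python
class TransformerLayer:
    """Stand-in for a PaddleFleet transformer layer with a swappable attention callable."""

    def __init__(self) -> None:
        self.core_attention = lambda q, k, v: ("native", q, k, v)


def make_fd_attention(layer_id: int):
    # Closure capturing the per-layer id, mimicking Attention(fd_config, layer_id=...)
    def fd_attention(q, k, v):
        return ("fastdeploy", layer_id, q, k, v)
    return fd_attention


def patch_core_attention(layers) -> None:
    # Replace each layer's attention attribute in place; the layer class is untouched.
    for layer_id, layer in enumerate(layers):
        layer.core_attention = make_fd_attention(layer_id)


layers = [TransformerLayer() for _ in range(3)]
patch_core_attention(layers)
print(layers[0].core_attention(1, 2, 3))  # ('fastdeploy', 0, 1, 2, 3)
```

Patching the instance attribute rather than the class is what keeps the upstream PaddleFleet code unmodified.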
```python
    """
    # Try to get forward_meta from config (PaddleFleet does not pass this parameter when calling)
    forward_meta = getattr(self.config, "forward_meta", None)
    assert forward_meta is not None, "forward_meta must be provided"
```
🔴 Bug: assert is used for runtime validation. Under Python's -O optimization mode the assertion is skipped entirely, so when forward_meta is None, None silently flows into fd_attention.forward() and surfaces later as a hard-to-trace AttributeError.
Following the pattern in base.py:285 in the same directory, change it to an explicit raise ValueError:
```python
if forward_meta is None:
    raise ValueError("forward_meta must be provided")
```
The CI report is generated from the code above (refreshed every 30 minutes):
1. Task overview: no Required checks are configured and no tasks are failing; 1 optional task is still running, so CI has not fully completed.
2. Task status summary
   2.1 Required tasks: 0/0 passed
   2.2 Optional tasks: 1/3 passed
3. Failure details (required only): no failed required tasks.
Motivation
Add PaddleFleet as a model inference backend (--model-impl paddlefleet). By replacing core_attention in the PaddleFleet TransformerLayer with the FastDeploy Attention kernel, FastDeploy's KV Cache and high-performance Attention computation are reused on top of the PaddleFleet model structure.
Modifications
- config.py: add paddlefleet to the ModelImpl type definition
- engine/args_utils.py: support the --model-impl paddlefleet CLI argument
- model_executor/models/paddleformers/base_fleet.py: add the PaddleFleetModelBase base class and the FastDeployAttention replacement logic
- model_executor/models/paddleformers/__init__.py: register the PaddleFleetForCausalLM model class
- worker/worker_process.py: add the paddlefleet option accordingly

Usage or Command
```bash
python -m fastdeploy.entrypoints.openai.api_server \
  --model /path/to/model \
  --model-impl paddlefleet
```

Accuracy Tests
Checklist
- Add at least a tag in the PR title.
  - Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Format your code, run pre-commit before commit.
- If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.