
【Models】add fleet model fallback #7534

Closed
xiaoguoguo626807 wants to merge 81 commits into PaddlePaddle:develop from xiaoguoguo626807:fleet

Conversation

@xiaoguoguo626807 xiaoguoguo626807 commented Apr 21, 2026

Motivation

Add PaddleFleet as a model inference backend (--model-impl paddlefleet). By replacing
core_attention in the PaddleFleet TransformerLayer with the FastDeploy Attention kernel,
the PaddleFleet model structure can reuse FastDeploy's KV Cache and high-performance
Attention computation.
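
The replacement can be sketched roughly as follows. This is an illustrative assumption, not the actual FastDeploy code: the class names, the `layers`/`core_attention` attribute names, and the wrapper's constructor are all hypothetical.

```python
# Hypothetical sketch of swapping each transformer layer's core attention
# for a FastDeploy-backed module; every name here is illustrative.
class FastDeployAttention:
    """Stand-in for an attention module wrapping FastDeploy's kernel."""

    def __init__(self, original_attention):
        # Keep the original module around so head counts, dims, etc.
        # could be read from it when configuring the real kernel.
        self.original = original_attention


def patch_core_attention(model):
    """Walk the model's transformer layers and replace core_attention."""
    for layer in getattr(model, "layers", []):
        if hasattr(layer, "core_attention"):
            layer.core_attention = FastDeployAttention(layer.core_attention)
    return model
```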

Modifications

  • config.py: add the paddlefleet ModelImpl type definition
  • engine/args_utils.py: support the --model-impl paddlefleet CLI argument
  • model_executor/models/paddleformers/base_fleet.py: add the PaddleFleetModelBase base class and the FastDeployAttention replacement logic
  • model_executor/models/paddleformers/__init__.py: register the PaddleFleetForCausalLM model class
  • worker/worker_process.py: add the matching paddlefleet option

Usage or Command

python -m fastdeploy.entrypoints.openai.api_server \
    --model /path/to/model \
    --model-impl paddlefleet

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.


CLAassistant commented Apr 21, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
36 out of 39 committers have signed the CLA.

✅ xiaoguoguo626807
✅ ckl117
✅ yuanlehome
✅ qwes5s5
✅ xiaoxiaohehe001
✅ RuohengMa
✅ Deleter-D
✅ luukunn
✅ kevincheng2
✅ zhoutianzi666
✅ xyxinyang
✅ Jiang-Jia-Jun
✅ wuyujiji
✅ Tryorish
✅ cmcamdy
✅ juncaipeng
✅ huicongyao
✅ liyonghua0910
✅ Sunny-bot1
✅ jackyYang6
✅ ChowMingSing
✅ xjkmfa
✅ K11OntheBoat
✅ iosmers
✅ EmmonsCurse
✅ ApplEOFDiscord
✅ chang-wenbin
✅ BingooYang
✅ zhupengyang
✅ zoooo0820
✅ lizhenyun01
✅ Jiajun-Ji
✅ freeliuzc
✅ plusNew001
✅ Dryoung95
✅ gongshaotian
❌ root
❌ Copilot
❌ rain7996


root seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.


paddle-bot Bot commented Apr 21, 2026

Thanks for your contribution!


Jiang-Jia-Jun and others added 17 commits April 22, 2026 10:58
…d2h copy (PaddlePaddle#7431)

* inplace_copy: encoder_batch_idx/decoder_batch_idx bs == 9 ok

* inplace_copy: encoder_seq_lod/decoder_seq_lod bs == 9 ok

* inplace_copy: all bs == 9 ok

* inplace_copy: all cpu bs == 9 ok

* inplace_copy: len_info_cpu bs == 9 ok

* finished and rm unused code

* prefix_block_tables reuse

* refine

* improve performance

* remove block_table copy to cpu

* fix unit test

* fix

* resolve conflict

* refine code

* fix

* fix

* fix

* fix

* fix

* try fix unit tests

* fix

* tmp save

* fix unit test

* get_infer_param try less return values

* add yinwei fix

---------

Co-authored-by: yinwei <yinwei_hust@163.com>
…essing (PaddlePaddle#7485)

* [NewFeature] support mm runner

* [NewFeature] support mm runner part1

* support mm runner part2

* support mm runner part3

* support mm runner part4
* commit

* commit

* commit

* commit

* commit

* commit

* commit

* commit
* add completions

* add unit test

* add unit test
… request count (PaddlePaddle#7499)

* [Scheduler][BugFix] Fix token_budget calculation to use actual decode request count

## Motivation

The current `token_budget` calculation has two problems:

1. **Over-deducts up front**: the budget is pre-deducted by `max_num_seqs * tokens_per_seq` rather than by the number of requests actually in the decode phase in the running queue, so the tokens available for prefill are underestimated.
2. **Wrong deduction inside the loop**: the decode branch always executes `token_budget -= 1`; under spec decode (`tokens_per_seq > 1`) each decode request deducts only 1, under-deducting by `num_speculative_tokens`. Moreover, once prefill requests in the running queue exhaust the budget, decode requests behind them are skipped early by the loop-exit condition `token_budget > 0`, so they are never scheduled.

## Modifications

- `resource_manager_v1.py`
  - Add an internal `_is_decoding(request)` method encapsulating the `num_computed_tokens >= need_prefill_tokens` check, used consistently throughout
  - Before scheduling, count the real number of decode requests in the running queue, `num_running_decode_reqs`, and pre-deduct the budget once by `num_running_decode_reqs * tokens_per_seq`, replacing the previous `max_num_seqs * tokens_per_seq`
  - Remove `token_budget -= 1` from the decode branch (already pre-deducted before the loop)
  - Change the loop-exit condition: decode requests are no longer bounded by `token_budget <= 0`; only prefill requests stop when the budget is exhausted

- `config.py`
  - Fix the validity check of `max_num_batched_tokens` to account for spec decode, where `tokens_per_seq = num_speculative_tokens + 1`, by checking `max_num_batched_tokens >= max_num_seqs * tokens_per_seq`

## Usage or Command

```bash
# Normal launch (no spec decode; behavior unchanged)
python -m fastdeploy.entrypoints.openai.api_server \
  --max-num-batched-tokens 8192 \
  --max-num-seqs 256 \
  ...

# Spec decode (tokens_per_seq = num_speculative_tokens + 1)
# Ensure max_num_batched_tokens >= max_num_seqs * tokens_per_seq, or startup fails
python -m fastdeploy.entrypoints.openai.api_server \
  --max-num-batched-tokens 8192 \
  --max-num-seqs 256 \
  --num-speculative-tokens 4 \
  ...
```

* [FDConfig][BugFix] Fix AttributeError when speculative_config is SimpleNamespace without num_speculative_tokens

## Motivation

When a test constructs `speculative_config` as `SimpleNamespace(method=None)`,
the `check()` method in `config.py` accesses `self.speculative_config.num_speculative_tokens` directly,
raising `AttributeError: 'types.SimpleNamespace' object has no attribute 'num_speculative_tokens'`.
Affected test files:
- tests/v1/test_resource_manager_v1.py
- tests/eplb/test_eplb_utils.py
- tests/eplb/test_experts_manager.py
- tests/v1/cache_manager/test_prefix_cache.py
- tests/v1/test_schedule_output.py

## Modifications

- `fastdeploy/config.py`: fall back with `getattr(..., "num_speculative_tokens", 0)` so the config
  does not crash when the speculative_config object lacks the attribute
- Test files: change `speculative_config=SimpleNamespace(method=None)` to
  `speculative_config=None`, matching the semantics of the no-speculative-decoding case
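
A minimal illustration of the fallback (the attribute name matches the PR; the surrounding function is a hypothetical stand-in for the real `check()`):

```python
from types import SimpleNamespace


def get_num_spec_tokens(speculative_config):
    """Hypothetical helper showing the getattr fallback from the PR."""
    if speculative_config is None:
        return 0
    # getattr with a default avoids AttributeError when the config
    # object (e.g. a test SimpleNamespace) lacks the attribute.
    return getattr(speculative_config, "num_speculative_tokens", 0)
```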

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
… files (PaddlePaddle#7432)

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
…addlePaddle#7553)

* Revert "[CI] Temporarily pin paddlepaddle-gpu to 3.5.0.dev20260417 (PaddlePaddle#7486)"

This reverts commit c9783a8.

* [CI] Mark flash attention and related tests as multi_gpu
… fails (PaddlePaddle#7556)

* [BugFix][Metax][KVCache] fix: resolve None callable error when import fails

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [Metax][FIX] fix ci error caused by pr#7428

---------

Co-authored-by: Guanyu Chen (i26275) <i26275@metax-tech.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…Paddle#7247)

* merge develop

* add limit_thinking_content_length_kernel kernel

* add test

* fix code style

* fix_eos_token_id_len_check

* fix plugin

* support model runner

* fix kernel

* add reasoning_phase_token_constraint

* [XPU] Refactor get_padding_offset to single kernel. (PaddlePaddle#7029)

* [XPU] Refactor get_padding_offset to single kernel.

* add unittest.

* fix codestyle.

* remove cum_offsets_now.

* remove max_len.

* fix xpu pre process

* fix code style

* fix get padding offset

* fix reasoning phase token constraint && add status print for test

* add xpu reasoning_phase_token_constraint support in sampler

* fix_get_padding_offset

* fix_get_padding_offset

* fix code style

* update model runner

* fix limit content length kernel

* fix code style

* fix cpu wapper

* fix code style && rm cum_offsets_out

* fix code style

* support not have <<tool_call>>

* add cpu ctx delete

* fix_test

* fix cpu ctx

* fix_test

* fix_test

---------

Co-authored-by: Jiajun Ji <jiajunji_ee@163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
…addle#7568)

* set draft_model_use_cudagraph default to true and fix non-mtp cudaGraph in spec-decoding

* optimize server log

* fix format
qwes5s5 and others added 24 commits April 28, 2026 20:18
* rl support mix_quant

* code check
* support blackwell gemm in ll

* add attr

* opt quant
…dlePaddle#7471)

* [BugFix][KVCache] Fix inference slowdown when enabling CPU cache on Blackwell GPU

When CPU cache is enabled (num_cpu_blocks > 0) on Blackwell GPUs, inference slows down noticeably.
The root cause is that the `create_cache_tensor` condition treated `num_cpu_blocks > 0` as a reason to skip GPU cache tensor creation, so GPU cache tensor initialization was incorrectly skipped on Blackwell.

- `fastdeploy/worker/gpu_model_runner.py`: remove the `num_cpu_blocks > 0` condition from the `create_cache_tensor` check (two places: `init_cache` and `clear_cache`) so GPU cache tensors are still created when CPU cache is enabled
- `fastdeploy/cache_manager/prefix_cache_manager.py`: move the `--create_cache_tensor` argument out of the non-splitwise conditional and unify it under the `kvcache_storage_backend` configuration path, which makes the logic clearer

```bash
python -m fastdeploy.entrypoints.openai.api_server \
  --num-cpu-blocks <N> \
  ...
```

* [BugFix][KVCache] Enlarge prealloc threshold for speculative decoding

## Motivation

Under speculative decoding, each scheduling step consumes `num_spec_tokens` slots at once. The
original `prealloc_dec_block_slot_num_threshold` was too small, so block preallocation was not
triggered early enough, hurting inference performance.

## Modifications

During `FDConfig` initialization, when speculative decoding is enabled, scale
`prealloc_dec_block_slot_num_threshold` up by a factor of `num_spec_tokens`,
while ensuring it does not exceed the enc_dec_block capacity cap.

## Usage or Command

No extra configuration is needed when speculative decoding is enabled; the threshold adjusts automatically:

```bash
python -m fastdeploy.entrypoints.openai.api_server \
  --speculative-config '{"method": "draft_model", "num_speculative_tokens": 4}' \
  ...
```

* [BugFix][KVCache][FDConfig] Fix prealloc threshold and create_cache_tensor for splitwise

## Motivation

Two bug fixes:
1. Under speculative decoding, the scale factor for prealloc_dec_block_slot_num_threshold should be (num_spec_tokens + 1) rather than num_spec_tokens, so preallocation triggers early enough.
2. When kvcache_storage_backend is enabled, the --create_cache_tensor argument should only be passed in non-splitwise mode, to keep splitwise P nodes from incorrectly creating cache tensors.

## Modifications

- fastdeploy/config.py: correct the prealloc scale factor to (num_spec_tokens + 1), and log the value before and after the change
- fastdeploy/cache_manager/prefix_cache_manager.py: append --create_cache_tensor only in non-splitwise mode
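
Combining this fix with the `max(0, ...)` guard added later in this PR, the adjustment can be sketched as follows (the function and parameter names are assumptions, not the actual FDConfig code):

```python
def adjust_prealloc_threshold(base: int, num_spec_tokens: int, cap_slots: int) -> int:
    """Hypothetical sketch of the prealloc-threshold adjustment.

    Each speculative step consumes (num_spec_tokens + 1) slots per
    request, so the base threshold is scaled by that factor, then
    clamped to the enc_dec_block slot capacity; max(0, ...) guards
    against a zero or negative capacity producing a negative cap.
    """
    scaled = base * (num_spec_tokens + 1)
    return min(scaled, max(0, cap_slots))
```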

## Usage or Command

```bash
# Launch the server (with speculative decoding + kvcache storage)
python -m fastdeploy.entrypoints.openai.api_server \
  --model <model_path> \
  --speculative-model <draft_model_path> \
  --num-speculative-tokens 3 \
  --kvcache-storage-backend <backend>
```

* [BugFix][SpecDecode] Fix create_cache_tensor condition in MTPProposer

## Motivation

The `create_cache_tensor` condition included `num_cpu_blocks > 0`, which broke MTP's kv cache creation logic in the Blackwell CPU cache scenario.

## Modifications

Remove the `num_cpu_blocks > 0` condition from the `create_cache_tensor` check, keeping only the `kvcache_storage_backend` and `splitwise_role` checks, so the redundant condition no longer interferes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [BugFix][KVCache] Address review comments: fix negative cap, sync runner fixes, update comments

## Modifications

1. **config.py**: Fix `enc_dec_block_num=0` causing negative upper bound for
   `prealloc_dec_block_slot_num_threshold`. Use `max(0, ...)` to guard against
   negative cap. Also fix comment to say `num_spec_tokens + 1` (matching code).

2. **xpu_model_runner.py / metax_model_runner.py**: Sync the same fix from
   gpu_model_runner.py — remove `num_cpu_blocks > 0` from `create_cache_tensor`
   condition. CPU cache enablement should not prevent GPU runners from creating
   GPU cache tensors on XPU/Metax platforms either.

3. **gpu_model_runner.py / mtp.py / xpu_model_runner.py / metax_model_runner.py**:
   Update stale comments to clarify that CPU cache does NOT prevent GPU cache
   tensor creation; cache transfer manager handles CPU<->GPU swap on top.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…rted (PaddlePaddle#7633)

* [BugFix] fix preempted token id not returned when a full batch is aborted

* [fix] changed fake_sampled_token_ids shape and filled value

* [test] add test

* [chore] move code place

* [test] add more tests and docstring
…lePaddle#7668)

* [Router] Support launch golang-router by python command

* Update fastdeploy/golang_router/launch.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/golang_router/launch.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix(golang_router): fix launch.py bugs and add unit tests

Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/57636bb1-779a-417f-934c-07a1462ed41c

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* fix(build.sh, docs): detect host arch for fd-router download; update router docs

Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/7a6cb757-5f4d-4c45-9272-e1e3da43ede4

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* [Router] Move fd-router download from build.sh to setup.py

- Remove download_fd_router from build.sh (setup.py handles it via CustomBdistWheel.run)
- Add download_fd_router to setup.py with aarch64 support
- Always register CustomBdistWheel in cmdclass (not gated by rdma_comm_supported)
- Add fd-router binary to .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [Router] Revert doc changes for router.md

Will update docs in a separate PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [Build] Add BUILD_WHEEL=2 mode to skip custom ops compilation

When only Python/build scripts are changed, use `bash build.sh 2` to
package the wheel directly without recompiling custom ops, significantly
reducing build time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [Router] Add deprecation warning to Python Router

Print a warning when launching the Python Router, recommending
the Golang Router for production use.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [Fix] Suppress noisy warnings and replace pkg_resources

- Suppress transformers/paddleformers/setuptools warnings on startup
- Replace pkg_resources with importlib.metadata to fix ModuleNotFoundError
- Change sm_version print to logging.debug

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [Test] Fix unit tests for eval.py and golang_router_launch

- test_eval.py: replace pkg_resources with importlib.metadata mocks
- test_golang_router_launch.py: use patch.object on _launch_module to
  avoid AttributeError from stub module resolution

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: run pre-commit to fix code formatting issues

Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/91173bf4-9b99-4cf4-b95b-0758fed8abfa

---------

Co-authored-by: jiang-jia-jun <jiangjiajun@baidu.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…_router.launch) (PaddlePaddle#7673)

* docs: update router documentation to use Python CLI (python -m fastdeploy.golang_router.launch)

Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/955dfc67-4288-4687-bd5a-b7b232fa97e7

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* docs: fix duplicate link in best_practices/Disaggregated.md

Agent-Logs-Url: https://github.com/PaddlePaddle/FastDeploy/sessions/955dfc67-4288-4687-bd5a-b7b232fa97e7

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
…ort int32) (PaddlePaddle#7648)

* fix infer seed

* fix infer seed for mtp

* fix offset

* fix offset

@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-05-07 14:07:03

📋 Review Summary

PR overview: adds PaddleFleet as a model inference backend (--model-impl paddlefleet); by replacing core_attention in the TransformerLayer, it reuses FastDeploy's KV Cache and high-performance Attention computation.
Scope of change: model_executor/models/paddleformers/, config.py, engine/args_utils.py, custom_ops/gpu_ops/
Impact tags: [Models] [OP] [FDConfig] [Engine]


📝 PR Convention Check

The title uses fullwidth brackets 【Models】 instead of the required [Models] ASCII-bracket format; the Accuracy Tests section contains only the template comment with no actual content; "Add unit tests" is unchecked with no reason given.

Suggested title (copy-paste ready):

  • [Models] add fleet model fallback

Suggested PR description (copy-paste ready; it must reproduce the full structure of the checklist §D2 template):

## Motivation
Add PaddleFleet as a model inference backend (`--model-impl paddlefleet`). By replacing `core_attention` in the PaddleFleet TransformerLayer with the FastDeploy Attention kernel, the PaddleFleet model structure reuses FastDeploy's KV Cache and high-performance Attention computation.

## Modifications
- `config.py`: add the `paddlefleet` `ModelImpl` type definition
- `engine/args_utils.py`: support the `--model-impl paddlefleet` CLI argument and add validation logic
- `model_executor/models/paddleformers/base_fleet.py`: add the `PaddleFleetModelBase` base class, the `FastDeployAttention` layer, and the `patch_paddlefleet_core_attention` replacement function
- `model_executor/models/paddleformers/__init__.py`: register the `PaddleFleetForCausalLM` model class
- `custom_ops/gpu_ops/fused_cast_sigmoid_bias.cu`: add a fused MoE gating op (cast + sigmoid + bias)
- `custom_ops/gpu_ops/speculate_decoding/build_sampling_params_logprob.cu`: add an op that builds logprob sampling parameters
- `custom_ops/gpu_ops/speculate_decoding/speculate_logprob_utils.cu`: rename and extend `SpeculateGetTargetLogits` to `SpeculateGetAcceptTokensAndLogits` (the interface gains 2 parameters)

## Usage or Command
```bash
python -m fastdeploy.entrypoints.openai.api_server \
    --model /path/to/model \
    --model-impl paddlefleet
```

## Accuracy Tests
N/A (this PR adds the PaddleFleet inference backend; logits-alignment data against a reference implementation is not yet available)

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Issues

Level | File | Summary
🔴 Bug | custom_ops/gpu_ops/speculate_decoding/build_sampling_params_logprob.cu:1 | new CUDA file not added to the setup_ops.py build list; calls will fail at runtime
📝 PR convention | — | title uses fullwidth brackets; Accuracy Tests section is empty; no reason given for missing unit tests

Overall Assessment

The overall architecture of the PaddleFleet backend is sound, and all three synchronized changes (config.py / args_utils.py / model registration) are in place. However, build_sampling_params_logprob.cu not being registered in setup_ops.py is a blocking bug that must be fixed before merging.

@@ -0,0 +1,129 @@
// Copyright (c) 2026 PaddlePaddle Authors. All Rights Reserved.
Copy link
Copy Markdown


🔴 Bug: new file added but not registered in the build list

build_sampling_params_logprob.cu is a CUDA kernel file added by this PR, but the source-file list in custom_ops/setup_ops.py has no entry for it.

Following how fused_cast_sigmoid_bias.cu was handled (it was correctly added to setup_ops.py), the corresponding source-file list in setup_ops.py needs this entry; otherwise the kernel will not be compiled, and the Python-side call to build_sampling_params_logprob in sampler.py will fail at runtime.

Suggested addition to setup_ops.py (next to the other speculate_decoding kernels in the same directory):

"gpu_ops/speculate_decoding/build_sampling_params_logprob.cu",

@PaddlePaddle-bot

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-07 14:13:31

The CI report is generated from the following code (refreshed every 30 minutes):


1 Task Overview

2 required tasks have failed and 6 required tasks are still running; wait for them to finish before evaluating merge readiness.

Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped
63(0) | 63 | 34 | 4 | 9 | 3 | 9

2 Task Status Summary

2.1 Required tasks: 2/18 passed

Required tasks block merging; failures must be addressed first.

Status | Task | Duration | Root cause | Fix suggestion | Log | Rerun
❌ | Approval | 8s | Infrastructure: the PR lacks the required reviewer approvals | Ask the relevant reviewers to approve this PR | Job | -
⏳ | Extracted partial CE model tasks to run in CI. / run_ce_cases | - | Running | - | Job | -
⏳ | Run Base Tests / base_tests | - | Running | - | Job | -
⏳ | Run FastDeploy LogProb Tests / run_tests_logprob | - | Running | - | Job | -
⏳ | Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | - | Running | - | Job | -
⏳ | Run Four Cards Tests / run_4_cards_tests | - | Running | - | Job | -
⏳ | Run Stable Tests / stable_tests | - | Running | - | Job | -
The remaining 2 required tasks passed | - | - | - | - | -

2.2 Optional tasks — 32/45 passed

Optional tasks do not block merging; failures are informational only.

Status | Task | Duration | Log | Rerun
❌ | Check PR Template | 26s | Job | -
The remaining 32 optional tasks passed | - | - | -

3 Failure Details (required only)

Approval — Infrastructure (confidence: high)

Approval

  • Status: ❌ Failed
  • Error type: infrastructure
  • Confidence: high
  • Root cause summary: the PR has not received the required reviewer approvals; the approval check returned exit code 6
  • Analyzer: generic analysis (fallback)

Root cause details:
The Approval workflow is a mandatory pre-merge approval check. Exit code 6 means the PR has not yet received all required reviewer approvals; this check is unrelated to the code change itself and passes only once the relevant reviewers approve.

Key log:

[FAILURE]: Process completed with exit code 6.

Fix suggestions:

  1. Ask the relevant reviewers to approve this PR
  2. Confirm that all required reviewers have completed their approvals

Fix summary: ask the relevant reviewers to approve this PR

Link: view log

@xiaoguoguo626807 xiaoguoguo626807 deleted the fleet branch May 7, 2026 07:05