
[feat] KvCompOnDevice: per-KV-head Top-K for Qwen #589

Merged
mag1c-h merged 1 commit into ModelEngine-Group:develop from wangwenxin0312:dev_kvcomp_qwen_dev on Dec 26, 2025


@wangwenxin0312
Contributor

Purpose

What this PR does / why we need it?
Enable per-KV-head Top-K selection in KvCompOnDevice so that it supports Qwen models that use grouped-query attention (GQA) and therefore have multiple KV heads.
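For context, under GQA several query heads read from one shared KV head, so "per-KV-head" selection means one Top-K result per KV head rather than one result shared by all heads. The minimal sketch below shows the standard query-head-to-KV-head grouping such a selection is organized around; the helper names are hypothetical and not taken from the PR, and the head counts are illustrative rather than from any specific Qwen config.

```python
def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Map a query head to the KV head it shares under grouped-query attention (GQA)."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads  # query heads per KV head
    return q_head // group_size


def group_query_heads(num_q_heads: int, num_kv_heads: int) -> dict:
    """Return {kv_head: [query heads that read from that KV head]}."""
    groups = {kv: [] for kv in range(num_kv_heads)}
    for q in range(num_q_heads):
        groups[kv_head_for_query_head(q, num_q_heads, num_kv_heads)].append(q)
    return groups


if __name__ == "__main__":
    # Illustrative head counts only.
    print(group_query_heads(num_q_heads=8, num_kv_heads=2))
    # → {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
```

When `num_kv_heads == 1` every query head falls into a single group, which is the case a single shared Top-K already covers.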

Modifications

  • unified-cache-management/ucm/sparse/kvcomp/ham_dist/paged_ham_dist_mla.cu
    Ensure the Hamming distance kernel correctly outputs scores for each KV head.
  • unified-cache-management/ucm/sparse/kvcomp/hamming_topk.py
    Update the Top-K selection logic to handle multi-KV-head outputs from the Hamming distance operator.
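The two modifications can be illustrated with a minimal pure-Python sketch of the data flow. This is not the PR's CUDA kernel or `hamming_topk.py` code. It assumes that queries and key blocks are hashed to bit-packed integer codes per head, that Hamming distances from the query heads in one GQA group are summed into a single score row per KV head, and that a smaller distance means higher similarity. All function names are hypothetical.

```python
from typing import List


def hamming_distance(a: int, b: int) -> int:
    """Count differing bits between two bit-packed hash codes."""
    return bin(a ^ b).count("1")


def per_kv_head_scores(
    query_codes: List[int],      # one packed hash code per query head (assumed layout)
    key_codes: List[List[int]],  # key_codes[kv_head][block]: packed code per KV block (assumed layout)
    num_kv_heads: int,
) -> List[List[int]]:
    """Return scores[kv_head][block]: distances summed over each GQA group's query heads."""
    group_size = len(query_codes) // num_kv_heads
    num_blocks = len(key_codes[0])
    scores = [[0] * num_blocks for _ in range(num_kv_heads)]
    for q_head, q_code in enumerate(query_codes):
        kv_head = q_head // group_size  # the KV head this query head shares
        for block, k_code in enumerate(key_codes[kv_head]):
            scores[kv_head][block] += hamming_distance(q_code, k_code)
    return scores


def per_kv_head_topk(scores: List[List[int]], k: int) -> List[List[int]]:
    """For each KV head independently, keep the k blocks with the smallest distance."""
    selected = []
    for head_scores in scores:
        # Sort by (distance, block index) so ties are broken deterministically.
        order = sorted(range(len(head_scores)), key=lambda b: (head_scores[b], b))
        selected.append(sorted(order[:k]))
    return selected


if __name__ == "__main__":
    scores = per_kv_head_scores(
        query_codes=[0b1010, 0b1010, 0b0101, 0b0101],
        key_codes=[[0b1010, 0b0000, 0b0101], [0b1010, 0b0000, 0b0101]],
        num_kv_heads=2,
    )
    print(scores)                       # → [[0, 4, 8], [8, 4, 0]]
    print(per_kv_head_topk(scores, 1))  # → [[0], [2]]
```

With these inputs the two KV heads keep different blocks, which a single Top-K shared across heads could not express. A real implementation would work on batched GPU tensors rather than Python lists.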
Does this PR introduce any user-facing change?

Test

How was this patch tested?

@mag1c-h merged commit ea76b36 into ModelEngine-Group:develop on Dec 26, 2025
3 checks passed
@wangwenxin0312 deleted the dev_kvcomp_qwen_dev branch on December 28, 2025 06:55


3 participants