[feat] KvCompOnDevice: per-KV-head Top-K for Qwen #588

Merged
mag1c-h merged 1 commit into ModelEngine-Group:0.2.0-release from wangwenxin0312:dev_kvcomp_qwen
Dec 26, 2025

@wangwenxin0312
Contributor

Purpose

What this PR does / why we need it?
Enable per-KV-head Top-K selection in KvCompOnDevice so that Qwen models using grouped-query attention (GQA), which have multiple KV heads, are supported.
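Under GQA, several query heads share one KV head, and each KV head has its own cached keys, so different KV heads can favor different tokens. The sketch below shows the query-to-KV-head mapping, assuming the common contiguous grouping (query heads `g*h` through `g*h + g - 1` share KV head `h`, as in Hugging Face's `repeat_kv`). The function names and head counts are hypothetical and are not taken from the KvCompOnDevice code.

```python
def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the index of the KV head that query head `q_head` reads under GQA."""
    if num_kv_heads <= 0 or num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a positive multiple of num_kv_heads")
    if not 0 <= q_head < num_q_heads:
        raise ValueError("q_head out of range")
    group_size = num_q_heads // num_kv_heads  # query heads per KV head
    return q_head // group_size


def group_query_heads(num_q_heads: int, num_kv_heads: int) -> dict[int, list[int]]:
    """Map each KV head to the list of query heads that share it."""
    groups: dict[int, list[int]] = {kv: [] for kv in range(num_kv_heads)}
    for q in range(num_q_heads):
        groups[kv_head_for_query_head(q, num_q_heads, num_kv_heads)].append(q)
    return groups


if __name__ == "__main__":
    # Hypothetical sizes: 8 query heads sharing 2 KV heads.
    print(group_query_heads(8, 2))  # {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
```

With a single KV head every query head maps to head 0, so one Top-K set suffices; with several KV heads, selection has to be made per group.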

Modifications


  • unified-cache-management/ucm/sparse/kvcomp/ham_dist/paged_ham_dist_mla.cu
    Ensure the Hamming distance kernel correctly outputs scores for each KV head.
  • unified-cache-management/ucm/sparse/kvcomp/hamming_topk.py
    Update the Top-K selection logic to handle multi-KV-head outputs from the Hamming distance operator, selecting Top-K independently for each KV head.
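A minimal plain-Python sketch of the per-KV-head flow, assuming each cached block has one binary hash code per KV head and the query has one hash code per KV head (how the query heads in a group are combined into that code is not described in the PR and is left out). Scores are Hamming distances (popcount of XOR), and the K blocks with the smallest distance are chosen separately for each KV head. `hamming` and `per_kv_head_topk` are hypothetical names; the actual CUDA kernel and `hamming_topk.py` work on batched tensors and may differ in layout and tie-breaking.

```python
import heapq


def hamming(a: int, b: int) -> int:
    """Hamming distance between two non-negative integer hash codes."""
    return bin(a ^ b).count("1")


def per_kv_head_topk(query_codes: list[int], key_codes: list[list[int]], k: int) -> list[list[int]]:
    """Select, independently for each KV head, the k blocks closest to the query.

    query_codes[h]  : hash code of the query for KV head h
    key_codes[h][b] : hash code of cached block b for KV head h
    Returns one sorted list of block indices per KV head.
    """
    if len(query_codes) != len(key_codes):
        raise ValueError("need one query code per KV head")
    selected: list[list[int]] = []
    for q, blocks in zip(query_codes, key_codes):
        # Scores for this KV head only, one per cached block.
        scores = [hamming(q, c) for c in blocks]
        # Smallest distance = most similar; ties go to the lower block index.
        top = heapq.nsmallest(min(k, len(scores)), range(len(scores)),
                              key=lambda i: (scores[i], i))
        selected.append(sorted(top))
    return selected


if __name__ == "__main__":
    query_codes = [0b1010, 0b0000]
    key_codes = [
        [0b1010, 0b0101, 0b1000, 0b1111],  # KV head 0
        [0b1111, 0b0001, 0b0000, 0b0011],  # KV head 1
    ]
    print(per_kv_head_topk(query_codes, key_codes, k=2))  # [[0, 2], [1, 2]]
```

In the example the two KV heads keep different blocks, which a single Top-K set shared across heads could not express.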

Test

How was this patch tested?
python examples/offline_inference_kvcomphbm.py

@mag1c-h mag1c-h merged commit 7bbed5d into ModelEngine-Group:0.2.0-release Dec 26, 2025
3 checks passed
@wangwenxin0312 wangwenxin0312 deleted the dev_kvcomp_qwen branch December 28, 2025 06:55