
apply optimization for flash_attn_varlen_func #19

Merged
blzheng merged 2 commits into beilei/qwen3-omni from beilei/port_flash_attn_varlen_func
Jan 7, 2026

Conversation


blzheng (Owner) commented on Jan 7, 2026

kernel port from sgl-project#15708
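
As background on what the ported kernel targets, below is a minimal plain-PyTorch sketch of the semantics of a flash_attn_varlen_func-style call: variable-length sequences are packed along the token dimension and delimited by a cu_seqlens offset tensor, so attention runs per sequence without any padding tokens. The function name varlen_attention_reference, the shapes, and the use of scaled_dot_product_attention are illustrative assumptions, not the actual interface or implementation of the kernel ported here.

```python
import torch
import torch.nn.functional as F

def varlen_attention_reference(q, k, v, cu_seqlens, causal=True):
    # q, k, v: (total_tokens, num_heads, head_dim)
    # cu_seqlens: (batch + 1,) int32 cumulative offsets, e.g. [0, 3, 8]
    # for two packed sequences of lengths 3 and 5.
    out = torch.empty_like(q)
    for i in range(cu_seqlens.numel() - 1):
        start, end = cu_seqlens[i].item(), cu_seqlens[i + 1].item()
        # Slice one sequence and move heads in front so SDPA treats them as a batch.
        qi = q[start:end].transpose(0, 1)  # (num_heads, seq_len, head_dim)
        ki = k[start:end].transpose(0, 1)
        vi = v[start:end].transpose(0, 1)
        oi = F.scaled_dot_product_attention(qi, ki, vi, is_causal=causal)
        out[start:end] = oi.transpose(0, 1)
    return out

# Two packed sequences of lengths 3 and 5, 4 heads, head_dim 64.
cu_seqlens = torch.tensor([0, 3, 8], dtype=torch.int32)
q, k, v = (torch.randn(8, 4, 64) for _ in range(3))
print(varlen_attention_reference(q, k, v, cu_seqlens).shape)  # torch.Size([8, 4, 64])
```

An optimized varlen kernel fuses this per-sequence loop and the softmax into a single pass rather than iterating in Python, which is where the speedup comes from.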

Motivation

Modifications

Checklist

blzheng merged commit 12f9582 into beilei/qwen3-omni on Jan 7, 2026
blzheng added a commit that referenced this pull request on Feb 4, 2026
* port optimization for flash_attn_varlen_func

* apply flash_attn_varlen_func
blzheng added a commit that referenced this pull request on Feb 26, 2026
* port optimization for flash_attn_varlen_func

* apply flash_attn_varlen_func
blzheng added a commit that referenced this pull request on Feb 27, 2026
* port layernorm 3d

* apply layernorm

* support for bias

* fix

* interface fix

* add support for CPU

* fix tp=3/6 padding issue in vision encoder

* fix tp=3/6 padding issue in qwen3-omni

* refactor code

* add mrope

* change attention_mask shape to use flash attn

* add kernel apply_rotary_pos_emb_cpu

* replace nn.Linear with ReplicatedLinear

* enable torch.compile

* construct mask using query.dtype instead of bool on CPU (see the additive-mask sketch after this commit list)

* add fast path for sparse attention

* fix double-free segfault caused by wrong setting of BLOCK_M

* improve extend kernel performance for long context length

* update test_extend.py

* update comment

* fix topk softmax performance issue

* port optimization for image preprocessor in Qwen2VLImageProcessorFast

* apply optimization for image preprocessor

* update docker file

* optimize conv3d used in patch embedding

* resolve conflict

* apply optimized conv3d

* apply optimization for flash_attn_varlen_func (#19)

* port optimization for flash_attn_varlen_func

* apply flash_attn_varlen_func

* remove contiguous before rope (#20)

* Revert "resolve conflict"

This reverts commit 7622f6d.

* fix after rebase

* Update pyproject_cpu.toml

* Update xeon.Dockerfile

* minor fix after rebase

* rope: add support for bf16 sincos (sgl-project#102)

* format

* Update xeon.Dockerfile

* support odd TP sizes on CPU

* Apply linear_gelu_linear and fix numa memory bind (#22)

* [CPU] Optimize small oc GEMM for Qwen3-next on CPU (sgl-project#12446)

Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>

* port linear_gelu_linear kernel

* apply linear_gelu_linear for TP=1

* fix numa memory bind

* apply parallel partition patch

---------

Co-authored-by: jianan-gu <jianan.gu@intel.com>

* Revert "Fix: test_vlm_offline_throughput output throughput (sgl-project#13279)" (sgl-project#101)

This reverts commit 7ee3e36.

* fix input dtype mismatch issue

* apply optimized layernorm

---------

Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>
Co-authored-by: ZailiWang <zaili.wang@intel.com>
Co-authored-by: mingfeima <mingfei.ma@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
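
The commit "construct mask using query.dtype instead of bool on CPU" in the list above refers to a common attention-masking pattern: build an additive bias in the activation dtype rather than a boolean mask, so the kernel can add it to the attention scores directly without converting dtypes inside the attention loop. The sketch below is a hypothetical, plain-PyTorch illustration of that pattern; the function name bool_to_additive_mask is invented here and is not part of the PR's code.

```python
import torch

def bool_to_additive_mask(bool_mask: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    """bool_mask is True where attention is allowed; returns a bias of `dtype`
    with 0 at allowed positions and the dtype's most negative value elsewhere."""
    bias = torch.zeros(bool_mask.shape, dtype=dtype)
    return bias.masked_fill_(~bool_mask, torch.finfo(dtype).min)

# Example: causal mask for a length-4 sequence, in bfloat16 to match the query dtype.
allowed = torch.tril(torch.ones(4, 4, dtype=torch.bool))
print(bool_to_additive_mask(allowed, torch.bfloat16))
```

A common reason to prefer the dtype's minimum over -inf is that a fully masked row then yields a uniform softmax instead of NaNs.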