[BUG] Support 32 head size for SGLang #1048
Open
Recently, SGLang added support for BERT encoder models with a head size of 32, but only for the Torch and Triton backends. When I try to use the FlashInfer backend, I get:

Error: Invalid configuration : NUM_MMA_Q=2 NUM_MMA_D_QK=2 NUM_MMA_D_VO=2 NUM_MMA_KV=4 NUM_WARPS_Q=4 NUM_WARPS_KV=1 please create an issue (https://github.com/flashinfer-ai/flashinfer/issues) and report the issue to the developers.

I'm using JIT mode with CUDA 12.4, torch 2.5, flashinfer 0.2.5, and sglang 0.4.6.