
[Feature Request] Skip-Softmax (BLASST) Optimization in XQA #2483

@jimmyzho

Description


TensorRT-LLM has added support for Skip Softmax Attention (paper: https://www.arxiv.org/pdf/2512.12087) via its fmha_v2 and XQA backends. Integrating this support and updating the API on the FlashInfer side would improve performance for the Hopper decode kernels. The PR that implemented this is NVIDIA/TensorRT-LLM#10264. A rough sketch of what the FlashInfer-side API change could look like is below.
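This is only a sketch of one possible surface for the feature, not an existing FlashInfer API: the `skip_softmax_threshold` argument and its plumbing into the XQA decode path are hypothetical, while the wrapper construction and plan/run flow follow FlashInfer's current paged decode API.

```python
# Hypothetical sketch: exposing a skip-softmax (BLASST) knob on FlashInfer's
# paged decode wrapper. `skip_softmax_threshold` is an assumed parameter name.
import torch
import flashinfer

num_qo_heads, num_kv_heads, head_dim, page_size = 32, 8, 128, 16

# 128 MB workspace buffer, as in FlashInfer's existing decode examples.
workspace = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device="cuda")
wrapper = flashinfer.BatchDecodeWithPagedKVCacheWrapper(workspace, "NHD")

# Plan with the usual paged-KV metadata (indptr / indices / last_page_len),
# unchanged from the current API:
# wrapper.plan(kv_indptr, kv_indices, kv_last_page_len,
#              num_qo_heads, num_kv_heads, head_dim, page_size)

# Proposed addition: an opt-in threshold that routes to the skip-softmax XQA
# kernel on Hopper and falls back to the standard kernel when left as None.
# out = wrapper.run(q, paged_kv_cache, skip_softmax_threshold=0.01)
```

Whether the knob belongs on `plan()` (baked into kernel selection) or on `run()` (per-call) is an open design question for the FlashInfer API.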
