
[Feature Request] Skip-Softmax (BLASST) Optimization in XQA #2483

@jimmyzho

Description


TensorRT-LLM has added support for Skip Softmax Attention (paper: https://www.arxiv.org/pdf/2512.12087) via its fmha_v2 and XQA backends. Integrating this support and updating the API on the FlashInfer side would improve performance for the Hopper decode kernels. The PR that implemented this is NVIDIA/TensorRT-LLM#10264. A rough sketch of what the FlashInfer-side API change could look like is below.
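This is only a sketch of one possible surface for the feature, not an existing FlashInfer API: the `skip_softmax_threshold` argument and its plumbing into the XQA decode path are hypothetical, while the wrapper construction and plan/run flow follow FlashInfer's current paged decode API.

```python
# Hypothetical sketch: exposing a skip-softmax (BLASST) knob on FlashInfer's
# paged decode wrapper. `skip_softmax_threshold` is an assumed parameter name.
import torch
import flashinfer

num_qo_heads, num_kv_heads, head_dim, page_size = 32, 8, 128, 16

# 128 MB workspace buffer, as in FlashInfer's existing decode examples.
workspace = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device="cuda")
wrapper = flashinfer.BatchDecodeWithPagedKVCacheWrapper(workspace, "NHD")

# Plan with the usual paged-KV metadata (indptr / indices / last_page_len),
# unchanged from the current API:
# wrapper.plan(kv_indptr, kv_indices, kv_last_page_len,
#              num_qo_heads, num_kv_heads, head_dim, page_size)

# Proposed addition: an opt-in threshold that routes to the skip-softmax XQA
# kernel on Hopper and falls back to the standard kernel when left as None.
# out = wrapper.run(q, paged_kv_cache, skip_softmax_threshold=0.01)
```

Whether the knob belongs on `plan()` (baked into kernel selection) or on `run()` (per-call) is an open design question for the FlashInfer API.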
