This issue summarizes the main ongoing tasks for the Mirage Persistent Kernel project. We welcome feedback, contributions, and collaborations from the community!
Quantization (FP8/INT8/INT4/FP4): LLM decoding is largely bottlenecked by memory access. This task adds support for 8-bit and 4-bit quantization methods in MPK. Tracked in: [MPK] Various Quantization Support #330
Paged and radix attention: the current attention task implements FlashAttention without paging or prefix reuse, both of which are critical for batched serving.
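For contributors new to the paging item, here is a minimal host-side sketch of the bookkeeping that paged attention with prefix reuse generally relies on: a per-sequence block table mapping logical token positions to fixed-size physical KV blocks, plus reference counts so that sequences with a common prefix can share blocks. Everything in it (`PagedKVCache`, `BLOCK_SIZE`, `fork`, `release`) is a hypothetical name chosen for illustration. It is not MPK's API or design, and it omits the attention kernel itself.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV block (hypothetical value, kept small for the demo)


@dataclass
class PagedKVCache:
    """Toy block-table allocator; illustrative only, not MPK code."""
    num_blocks: int
    free: list = field(default_factory=list)      # free physical block ids
    refcount: dict = field(default_factory=dict)  # block id -> number of owners
    tables: dict = field(default_factory=dict)    # seq id -> list of block ids
    lengths: dict = field(default_factory=dict)   # seq id -> tokens stored

    def __post_init__(self):
        self.free = list(range(self.num_blocks))

    def _alloc(self):
        if not self.free:
            raise MemoryError("out of KV blocks")
        block = self.free.pop()
        self.refcount[block] = 1
        return block

    def append_token(self, seq_id):
        """Reserve a slot for one token; returns (physical block, offset)."""
        n = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % BLOCK_SIZE == 0:  # last block is full (or sequence is empty)
            table.append(self._alloc())
        self.lengths[seq_id] = n + 1
        return table[-1], n % BLOCK_SIZE

    def fork(self, parent, child):
        """Let child reuse the parent's full prefix blocks without copying."""
        full = self.lengths[parent] // BLOCK_SIZE
        shared = self.tables[parent][:full]
        for block in shared:
            self.refcount[block] += 1
        self.tables[child] = list(shared)
        self.lengths[child] = full * BLOCK_SIZE

    def release(self, seq_id):
        """Drop a sequence; a block is freed once no sequence references it."""
        for block in self.tables.pop(seq_id):
            self.refcount[block] -= 1
            if self.refcount[block] == 0:
                del self.refcount[block]
                self.free.append(block)
        del self.lengths[seq_id]


cache = PagedKVCache(num_blocks=8)
for _ in range(6):
    cache.append_token("a")       # 6 tokens occupy 2 blocks
cache.fork("a", "b")              # "b" shares the one full block of "a"
blk, off = cache.append_token("b")
```

Only full blocks are shared here, so a forked sequence always starts writing into a fresh block and no copy-on-write step is needed. A radix-tree variant would instead index shared blocks by token content, which this sketch does not attempt.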
High-Priority Tasks
Mid-Priority Tasks