Support MLA in Torch Native Attention Backend #3475
YangQun1 wants to merge 11 commits into sgl-project:main
Conversation
Hi @ispobock, could you help to review?
Could you fix the PR test and provide some benchmark data vs. the previous version?
Hi @ispobock, the failed test seems to be unrelated to this PR change. Is there any way to retrigger the failed test to avoid the flaky error?
From the performance comparison, it seems that the decode perf is impacted; I will investigate it.
Compared to the latest main branch, the decode perf has no obvious gap, and the prefill perf improved.
Hi @YangQun1, I reviewed this PR, but I am not sure why this change is related to MLA.
With this PR, we can run the DeepSeek-V2-Lite model with the torch native attention backend while not setting
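For reference, launching the model on the torch native backend might look like the command below. This is a sketch: the flag names are assumed from sglang's server CLI (`--attention-backend` with a `torch_native` choice), not taken from this PR.

```shell
# Hypothetical invocation; flag names assumed from sglang's CLI.
python -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V2-Lite \
  --attention-backend torch_native \
  --trust-remote-code
```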
Got it. This change is mainly for the forward_normal part, where the kv is different from what the kv cache stores.
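To illustrate the distinction above, here is a minimal NumPy sketch (names, shapes, and projections are illustrative, not the actual sglang implementation): in MLA, the kv cache holds a compressed latent per token, while the normal forward path attends over the decompressed per-head keys and values recovered from that latent.

```python
# Illustrative sketch of MLA's cached latent vs. the kv used in attention.
import numpy as np

rng = np.random.default_rng(0)
num_heads, head_dim, kv_lora_rank, seq_len, hidden = 2, 4, 3, 5, 8

# Down-projection producing the compressed latent that the kv cache stores.
w_kv_a = rng.standard_normal((hidden, kv_lora_rank))
# Up-projection recovering per-head K and V from the cached latent.
w_kv_b = rng.standard_normal((kv_lora_rank, num_heads * 2 * head_dim))

x = rng.standard_normal((seq_len, hidden))
c_kv = x @ w_kv_a                 # what the MLA kv cache holds: (seq, rank)
kv = c_kv @ w_kv_b                # what the forward path actually attends over
k, v = np.split(kv.reshape(seq_len, num_heads, 2 * head_dim), 2, axis=-1)

q = rng.standard_normal((seq_len, num_heads, head_dim))
scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
probs = np.exp(scores - scores.max(-1, keepdims=True))
probs /= probs.sum(-1, keepdims=True)
out = np.einsum("hqk,khd->qhd", probs, v)
print(c_kv.shape, k.shape, out.shape)
```

The point of the sketch: the cache entry `c_kv` has shape `(seq, kv_lora_rank)`, while the `k`/`v` tensors the attention kernel sees have full per-head shape, which is why a torch-native MLA path has to handle the two representations differently.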
Hi @ispobock, the CI tests passed. Could you help to merge?
Commits:
fix
fix for cpu case
fix lint and timeout
fix
fix lint
Motivation
Modifications
Checklist