Merged
44 commits
4267344
chunked mla
LucasWilkinson Feb 1, 2025
2821aed
add gather cache kernel
LucasWilkinson Feb 5, 2025
dc00371
wip
LucasWilkinson Feb 6, 2025
f50719b
wip running
LucasWilkinson Feb 6, 2025
3d2e770
more cleanup
LucasWilkinson Feb 6, 2025
ea19198
better defaults
LucasWilkinson Feb 6, 2025
d116752
increase MLA gpu_memory_utilization default
LucasWilkinson Feb 6, 2025
c3ad988
wip
LucasWilkinson Feb 7, 2025
ca1b07d
wip fix tensor on wrong device
LucasWilkinson Feb 11, 2025
396f4db
wip
LucasWilkinson Feb 12, 2025
b4a900e
finally :/
LucasWilkinson Feb 12, 2025
04c6042
working!
LucasWilkinson Feb 13, 2025
d0925bb
clean-up
LucasWilkinson Feb 13, 2025
0e173be
delete files
LucasWilkinson Feb 13, 2025
25bd9cb
cleanup
LucasWilkinson Feb 13, 2025
b73fb74
cleanup
LucasWilkinson Feb 13, 2025
829ce2b
relocate merge_attn_states
LucasWilkinson Feb 13, 2025
dee34f7
add comments
LucasWilkinson Feb 13, 2025
b28f99a
comment fixes
LucasWilkinson Feb 13, 2025
54ae713
minor fixes
LucasWilkinson Feb 13, 2025
3db8ab6
remove no-longer necessary changes
LucasWilkinson Feb 13, 2025
04644e3
clean-up
LucasWilkinson Feb 13, 2025
4398787
fix tp
LucasWilkinson Feb 13, 2025
d73f9ff
review comments
LucasWilkinson Feb 14, 2025
f4da0b6
minor fix
LucasWilkinson Feb 14, 2025
50a53aa
fix wrong device, increase workspace, enable cuda-graphs
LucasWilkinson Feb 14, 2025
e0a758e
minor changes
LucasWilkinson Feb 14, 2025
3c800bb
add comment
simon-mo Feb 14, 2025
a79ee4c
fix assert
LucasWilkinson Feb 15, 2025
1c59597
extra workspace allocation during profile run
LucasWilkinson Feb 15, 2025
1137f76
rename
LucasWilkinson Feb 15, 2025
920ecc6
fix illegal memory access
LucasWilkinson Feb 17, 2025
0547a94
Merge remote-tracking branch 'origin/main' into lwilkinson/chunked-mla
LucasWilkinson Feb 18, 2025
b665575
format
LucasWilkinson Feb 18, 2025
3a0ae51
format
LucasWilkinson Feb 18, 2025
28464b5
mypy pass
LucasWilkinson Feb 18, 2025
609267b
Merge branch 'main' into lwilkinson/chunked-mla
tlrmchlsmth Feb 19, 2025
dfb3ada
fix basic model test
LucasWilkinson Feb 19, 2025
9ca182b
attempt to fix AMD build
LucasWilkinson Feb 19, 2025
d325935
attempt 2 fix amd build
LucasWilkinson Feb 19, 2025
6394a8a
Merge remote-tracking branch 'origin/main' into lwilkinson/chunked-mla
LucasWilkinson Feb 20, 2025
f17599e
Merge remote-tracking branch 'origin/main' into lwilkinson/chunked-mla
LucasWilkinson Feb 21, 2025
c5fbdaa
Merge remote-tracking branch 'origin/main' into lwilkinson/chunked-mla
LucasWilkinson Feb 21, 2025
10c4e54
Merge remote-tracking branch 'origin/main' into lwilkinson/chunked-mla
LucasWilkinson Feb 21, 2025
2 changes: 1 addition & 1 deletion vllm/engine/arg_utils.py
@@ -1169,7 +1169,7 @@ def create_engine_config(self,

# For multimodal models and models with MLA, chunked prefill is
# disabled by default in V0, but enabled by design in V1
-if model_config.is_multimodal_model and model_config.use_mla:
+if model_config.is_multimodal_model or model_config.use_mla:
Member

ok yeah that makes sense for some of the red tests

self.enable_chunked_prefill = bool(envs.VLLM_USE_V1)

elif use_long_context:
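
A minimal sketch of why the one-word change in this hunk matters. The two functions and their parameters are hypothetical stand-ins written for illustration (they are not part of vLLM); `use_v1` plays the role of `envs.VLLM_USE_V1`, and returning `None` stands for "this branch was skipped and a later branch decides".

```python
from typing import Optional


def chunked_prefill_before(is_multimodal: bool, use_mla: bool,
                           use_v1: bool) -> Optional[bool]:
    """Hypothetical model of the old condition: both flags must be set."""
    if is_multimodal and use_mla:
        return bool(use_v1)
    return None  # branch skipped; a later branch would pick the default


def chunked_prefill_after(is_multimodal: bool, use_mla: bool,
                          use_v1: bool) -> Optional[bool]:
    """Hypothetical model of the new condition: either flag is enough."""
    if is_multimodal or use_mla:
        return bool(use_v1)
    return None  # branch skipped; a later branch would pick the default


# A text-only MLA model on V0 (is_multimodal=False, use_mla=True, use_v1=False):
print(chunked_prefill_before(False, True, False))  # None
print(chunked_prefill_after(False, True, False))   # False
```

Under the old `and`, a model that is only multimodal or only MLA never reaches the assignment, which contradicts the comment above the condition; with `or`, either property alone ties the default to the V1 flag.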