Commit 2551817
Fix GGUF pan-and-scan attention and CUDA graph mask preservation
Fixes four critical issues in GGUF multimodal inference:
1. Attention scaling parameter bug (gemma3.py):
- Call F.scaled_dot_product_attention with named parameters
- Change the positional args to attn_mask=attn_mask, scale=self.scaling
- Prevents the scale value from being consumed as dropout_p (6.25% dropout was applied instead of 0%)
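The pitfall above can be sketched with a stand-in that mirrors only the parameter order of PyTorch's `F.scaled_dot_product_attention` (query, key, value, attn_mask, dropout_p, is_causal, scale); the tensors are placeholders, not a real attention kernel:

```python
# Stand-in with the same parameter order as PyTorch's
# F.scaled_dot_product_attention; it only records which slot each
# argument lands in, to show why positional passing misroutes `scale`.

def scaled_dot_product_attention(query, key, value, attn_mask=None,
                                 dropout_p=0.0, is_causal=False,
                                 scale=None):
    # Return the bindings so the mix-up is visible.
    return {"attn_mask": attn_mask, "dropout_p": dropout_p, "scale": scale}

head_dim = 256
scaling = head_dim ** -0.5   # 0.0625 for 256-dim heads

# Buggy call: the 5th positional slot is dropout_p, so `scaling`
# becomes a 6.25% dropout probability and the intended scale is None.
buggy = scaled_dot_product_attention("q", "k", "v", "mask", scaling)
assert buggy["dropout_p"] == 0.0625 and buggy["scale"] is None

# Fixed call: named parameters route each value to the right slot.
fixed = scaled_dot_product_attention("q", "k", "v",
                                     attn_mask="mask", scale=scaling)
assert fixed["dropout_p"] == 0.0 and fixed["scale"] == 0.0625
```

The 6.25% figure in the commit matches 256**-0.5 == 0.0625, i.e. the head-dimension scaling factor being silently interpreted as a dropout rate.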
2. Custom attention mask persistence (gpu_model_runner.py):
- Store custom_model_kwargs after mask generation
- Merge custom_model_kwargs in _dummy_run
- Prevents loss of attention masks during CUDA graph re-initialization
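A minimal sketch of the persistence pattern described above. The names `custom_model_kwargs` and `_dummy_run` come from the commit message; the class and everything else here is hypothetical, not vLLM's actual gpu_model_runner internals:

```python
# Hypothetical sketch: keep custom kwargs on the runner so CUDA-graph
# warmup/re-capture (_dummy_run) replays them instead of a mask-free run.

class GPUModelRunnerSketch:
    def __init__(self):
        # Lives on the runner, so it survives CUDA graph re-initialization,
        # unlike per-step forward inputs.
        self._custom_model_kwargs = {}

    def prepare_inputs(self, attn_mask):
        kwargs = {"attention_mask": attn_mask}
        # Store right after mask generation so later dummy runs see it.
        self._custom_model_kwargs = dict(kwargs)
        return kwargs

    def _dummy_run(self, **base_kwargs):
        # Merge the stored kwargs so a re-captured CUDA graph keeps the
        # custom attention mask.
        return {**base_kwargs, **self._custom_model_kwargs}

runner = GPUModelRunnerSketch()
runner.prepare_inputs(attn_mask="pan_and_scan_mask")
merged = runner._dummy_run(input_ids=[0, 1, 2])
assert merged["attention_mask"] == "pan_and_scan_mask"
```

The point of the merge in `_dummy_run` is that CUDA graphs replay exactly what was captured; if the capture run omits the mask, every replay does too.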
3. Pan-and-scan attention pattern (gemma3_mm.py):
- Detect pan-and-scan mode via multimodal_config.do_pan_and_scan
- Prevents crop isolation artifacts in sequential processing
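The detection step can be sketched as follows; the attribute `multimodal_config.do_pan_and_scan` comes from the commit message, while the helper name and the pattern labels are illustrative:

```python
from types import SimpleNamespace

# Hypothetical helper: choose the image-attention pattern based on
# whether Gemma-3 pan-and-scan cropping is enabled.

def select_attention_pattern(multimodal_config):
    if getattr(multimodal_config, "do_pan_and_scan", False):
        # Crops of one image must attend to each other; isolating them
        # during sequential processing causes the artifacts noted above.
        return "cross_crop"
    return "per_image"

assert select_attention_pattern(
    SimpleNamespace(do_pan_and_scan=True)) == "cross_crop"
assert select_attention_pattern(
    SimpleNamespace(do_pan_and_scan=False)) == "per_image"
```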
4. GGUF unquantized weight loading (weight_utils.py):
- Add proper dtype conversion for BF16/F16/F32 stored as uint8
- Handle byte-to-dtype conversion (BF16: 2 bytes, F16: 2 bytes, F32: 4 bytes)
- Add fallback handling for unexpected dtype/type combinations
- Fixes weight loading for unquantized GGUF multimodal projector weights
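The byte-to-dtype conversion can be sketched in pure Python with `struct`; this is an illustration of the widths listed above (BF16/F16: 2 bytes, F32: 4 bytes), not the actual weight_utils.py code, and where the patch adds fallback handling this sketch simply raises a clear error:

```python
import struct

def bytes_to_floats(raw, gguf_dtype):
    """Reinterpret a GGUF uint8 buffer as floats by element width."""
    if gguf_dtype == "F32":
        # 4 bytes per element, IEEE-754 single precision.
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))
    if gguf_dtype == "F16":
        # 2 bytes per element; struct's 'e' is IEEE-754 half precision.
        return list(struct.unpack(f"<{len(raw) // 2}e", raw))
    if gguf_dtype == "BF16":
        # BF16 is the top 16 bits of a float32: widen each 2-byte
        # element with a zero low half (little-endian), decode as f32.
        widened = b"".join(b"\x00\x00" + raw[i:i + 2]
                           for i in range(0, len(raw), 2))
        return list(struct.unpack(f"<{len(raw) // 2}f", widened))
    # Stand-in for the patch's fallback on unexpected combinations.
    raise ValueError(f"unsupported GGUF dtype: {gguf_dtype}")

# Round-trips with exactly representable values:
assert bytes_to_floats(struct.pack("<2f", 3.5, -1.0), "F32") == [3.5, -1.0]
assert bytes_to_floats(struct.pack("<2e", 1.0, -0.5), "F16") == [1.0, -0.5]
# BF16 of 1.0 is 0x3F80 -> little-endian bytes b"\x80\x3f".
assert bytes_to_floats(b"\x80\x3f", "BF16") == [1.0]
```

The BF16 branch works because bf16 and f32 share sign/exponent layout; zero-filling the low 16 mantissa bits reconstructs the exact float32 value the bf16 rounds to.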
Signed-off-by: Luciano Martins <[email protected]>

Parent: bb47210
File tree (4 files changed, +96, -7 lines changed):
- vllm/model_executor/model_loader
- vllm/model_executor/models
- vllm/v1/worker