Commit 4f69d04
hotfix: increase precision of GPTQ/AWQ-Marlin
Sync with upstream change that improves the precision of the
'global_reduce' algorithm from FP16 to FP32. This solves some
reported generation quality issues.
Upstream issue/PR: vllm-project/vllm#67951
Parent: 4b49c50
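To illustrate why the accumulation precision of a reduction matters, here is a toy sketch in plain Python. It is not the Marlin kernel and makes no claim about how `global_reduce` is implemented upstream; the helper names (`to_fp16`, `to_fp32`, `reduce_partials`) and the input values are hypothetical, chosen only to make the rounding effect visible. It emulates FP16 and FP32 rounding with the standard `struct` module and sums the same partial results both ways.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def reduce_partials(partials, round_fn):
    """Sum partial results, rounding each input and each running sum.

    This mimics an accumulator that is stored at the precision of round_fn
    after every addition (an assumption made for illustration).
    """
    acc = 0.0
    for p in partials:
        acc = round_fn(acc + round_fn(p))
    return acc


# Hypothetical partial sums: one large contribution and many small ones.
partials = [1024.0] + [0.3] * 32

reference = math.fsum(partials)               # double-precision reference
fp16_sum = reduce_partials(partials, to_fp16)
fp32_sum = reduce_partials(partials, to_fp32)

print(f"reference: {reference:.4f}")
print(f"FP16 sum:  {fp16_sum:.4f}  (error {abs(fp16_sum - reference):.4f})")
print(f"FP32 sum:  {fp32_sum:.4f}  (error {abs(fp32_sum - reference):.4f})")
```

In this example the FP16 accumulator stalls: between 1024 and 2048 adjacent FP16 values are 1.0 apart, so each added 0.3 is rounded away, while the FP32 accumulator stays close to the reference. Real kernels reduce far fewer and differently distributed values, so the size of the error here should not be read as representative.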
4 files changed (+492, -387 lines)
File tree:
- server
- marlin/marlin_kernels
- text_generation_server/layers/marlin