Commit 162bb65
Merging ROCM/vllm main (#3)
* Fixed single GPU issue without setting up mp. Added toggles for server request batching parameters (vllm-project#114)
* Fixed single GPU issue without setting up mp. Added toggles for server request batching parameters
* Adding HTTP headers
* Add distributed executor backend to benchmark scripts (vllm-project#118)
* Add weight padding for moe (vllm-project#119)
* add weight padding for moe
* enable padding by default
* fix linter
* fix linter
* fix linter
* using envs.py
* fix linter
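The weight-padding change above (vllm-project#119) comes down to rounding a weight-tensor dimension up to an alignment boundary before allocation. A minimal sketch of that round-up arithmetic, assuming a 256-element alignment for illustration (the alignment actually used in the PR may differ):

```python
def pad_to_multiple(size: int, align: int = 256) -> int:
    """Round `size` up to the next multiple of `align`.

    Padded allocations keep strides aligned, which helps the MoE
    kernels access memory efficiently.
    """
    return (size + align - 1) // align * align


# Example: a hidden dimension of 1000 would be padded to 1024.
print(pad_to_multiple(1000))
```

With padding enabled by default, tests comparing against unpadded references need to slice the padded region back off, which is what the later test fix (vllm-project#128) addresses.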
* [BugFix] Fix navi build after many custom for MI kernels added (vllm-project#116)
* fix navi build
* Created dummy kernels for ops unsupported on Navi to avoid function-not-found crashes at runtime
* replacing ifdefs on host code with those on kernels
* refactoring code to avoid unsupported call on Navi
* syntactic change
* import statements fix
* moving env variables to envs.py
* style fixes
* cosmetic changes for isort
* removed extra include
* moving use_skinny to be member
---------
Co-authored-by: lcskrishna <[email protected]>
Co-authored-by: maleksan85 <[email protected]>
Co-authored-by: Gregory Shtrasberg <[email protected]>
* add empty_cache() after each padding (vllm-project#120)
* [FIX] Gradlib OOM on Navi and sometimes on MI (vllm-project#124)
* add memory clean up after every shape and parameter to reduce cache invalidation buffers
* small typo
* syntax change
---------
Co-authored-by: maleksan85 <[email protected]>
* save shape when fp8 solution not found (vllm-project#123)
Co-authored-by: Gregory Shtrasberg <[email protected]>
* Fix unit test for moe by adding padding (vllm-project#128)
* fix test_moe
* fix linter
* Llama3.1 (vllm-project#129)
* Add support for a rope extension method (vllm-project#6553)
* [BugFix] Fix RoPE error in Llama 3.1 (vllm-project#6693)
---------
Co-authored-by: Simon Mo <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
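The "rope extension method" for Llama 3.1 refers to its per-frequency RoPE scaling rule: high-frequency components are kept, low-frequency components are scaled down, and a band in between is interpolated smoothly. The sketch below uses the constants published for Llama 3.1 (scale factor 8, low/high-frequency factors 1 and 4, original context length 8192) as assumptions for illustration, not values taken from this PR:

```python
import math


def scale_rope_freqs(inv_freqs, factor=8.0, low_freq_factor=1.0,
                     high_freq_factor=4.0, original_max_pos=8192):
    """Apply Llama 3.1-style per-frequency RoPE scaling (sketch)."""
    low_wavelen = original_max_pos / low_freq_factor
    high_wavelen = original_max_pos / high_freq_factor
    out = []
    for f in inv_freqs:
        wavelen = 2 * math.pi / f
        if wavelen < high_wavelen:
            # High-frequency band: leave untouched.
            out.append(f)
        elif wavelen > low_wavelen:
            # Low-frequency band: scale fully by the extension factor.
            out.append(f / factor)
        else:
            # Transition band: interpolate between scaled and unscaled.
            smooth = (original_max_pos / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor)
            out.append((1 - smooth) * f / factor + smooth * f)
    return out
```

The RoPE bug fixed in vllm-project#6693 sat in how this rule interacts with the existing rotary-embedding code paths.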
* chat/completions endpoint (vllm-project#121)
* Initial implementation of chat/completions endpoint and its streaming variant
* Reusing datatypes from the openai entrypoints
* Response role from arg
* Added models endpoint and model validation from the request
* Optimize custom all reduce (vllm-project#130)
* First version
* Revert error.
While there, add missing finalize.
* Use the correct defaults for ROCm.
Increase sampling area to capture crossover.
* Scope end_sync as well.
* Guard only volatile keyword for ifndef USE_ROCM
* Document crossover
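For context on what the custom all-reduce replaces, here is a pure-Python simulation of the classic ring all-reduce (a reduce-scatter phase followed by an all-gather phase). This is an illustrative sketch of the communication pattern only, not the PR's GPU implementation; the "crossover" documented above is the message size at which one algorithm overtakes the other:

```python
def ring_all_reduce(buffers):
    """Simulate ring all-reduce: after the call, every rank holds the
    elementwise sum of all input buffers."""
    n = len(buffers)
    assert all(len(b) % n == 0 for b in buffers)
    chunk = len(buffers[0]) // n
    bufs = [list(b) for b in buffers]

    # Reduce-scatter: in each step, rank r sends one chunk to rank r+1,
    # which accumulates it. After n-1 steps, rank r holds the full sum
    # of chunk (r+1) % n.
    for step in range(n - 1):
        snapshot = [b[:] for b in bufs]  # sends happen simultaneously
        for r in range(n):
            c = (r - step) % n
            dst = (r + 1) % n
            for i in range(c * chunk, (c + 1) * chunk):
                bufs[dst][i] += snapshot[r][i]

    # All-gather: circulate the fully reduced chunks around the ring.
    for step in range(n - 1):
        snapshot = [b[:] for b in bufs]
        for r in range(n):
            c = (r + 1 - step) % n
            dst = (r + 1) % n
            s = slice(c * chunk, (c + 1) * chunk)
            bufs[dst][s] = snapshot[r][s]

    return bufs
```

Each rank sends and receives only 2·(n−1)/n of the buffer in total, which is why ring all-reduce wins for large messages while latency-optimized custom kernels win for small ones.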
* Add BF16 support to custom PA (vllm-project#133)
* tightened atol for custom PA; enable supported head size, block sizes in testing
* update num_blocks and num_iters in benchmark PA to realistic settings
* move to generic b16 type
* bf16 first port
* enabled all bf16 tests, set atol for bf16
* enable custom PA for bf16 as well as block size 32 and head size 64
* fix cast to zero in custom PA reduce
* py linter fixes
* clang format fixes
* div round up clang-format
---------
Co-authored-by: Charlie Fu <[email protected]>
Co-authored-by: Gregory Shtrasberg <[email protected]>
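The looser atol for bf16 noted above follows from bfloat16's 7-bit mantissa. A small sketch that rounds a Python float to bfloat16 precision (round-to-nearest-even on the low 16 bits of the float32 encoding) shows the size of the rounding error those tests must tolerate:

```python
import struct


def to_bf16(x: float) -> float:
    """Round a float to bfloat16 precision by keeping only the top
    16 bits of its float32 encoding (round-to-nearest-even)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Add a rounding bias, then truncate the low 16 mantissa bits.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Near 1.0 the bf16 spacing is 1/128, so errors up to ~0.004 are normal.
print(to_bf16(3.14159))
```

With only about two to three significant decimal digits, per-element tolerances for bf16 attention outputs must be noticeably wider than for fp16.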
* Make the output-match check use the original dtypes; this saves some memory. (vllm-project#135)
Co-authored-by: maleksan85 <[email protected]>
* Make CAR ROCm 6.1 compatible. (vllm-project#137)
* remove scoping
* while there fix a typo
* while there remove unused variable
* Car revert (vllm-project#140)
* Per @iotamudelta's suggestion, until the deadlock issue is better understood:
Revert "Make CAR ROCm 6.1 compatible. (vllm-project#137)"
This reverts commit 4d2dda6.
* Per @iotamudelta's suggestion, until the deadlock issue is better understood:
Revert "Optimize custom all reduce (vllm-project#130)"
This reverts commit 636ff01.
---------
Co-authored-by: Gregory Shtrasberg <[email protected]>
Co-authored-by: Matt Wong <[email protected]>
Co-authored-by: Charlie Fu <[email protected]>
Co-authored-by: Aleksandr Malyshev <[email protected]>
Co-authored-by: lcskrishna <[email protected]>
Co-authored-by: maleksan85 <[email protected]>
Co-authored-by: Simon Mo <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
Co-authored-by: iotamudelta <[email protected]>
Co-authored-by: sanyalington <[email protected]>

Parent: aeedfff
File tree (21 files changed, +1280 −702 lines):

- benchmarks/kernels
- csrc/custom/paged_attention
- gradlib/gradlib
- tests/kernels
- vllm
  - attention/ops
  - engine
  - entrypoints/sync_openai
  - model_executor
    - layers/fused_moe
    - layers/quantization
    - models