Commit a115250
Re-integrate HPU after upstream refactors (vllm-project#20)
* Fix setup.py for HPU
* Fix vllm._C import ops -> vllm.hpu import ops (backend-conditional import sketched after this list)
* more of the same thing
* re-add hpex rmsnorm and rope; but rope is crashing
* remove unnecessary comments
* add vllm/hpu files
* add hpu autodetection (see the detection sketch after this list)
* Add HabanaAttention stub
* revert accidental changes
* revert non-habana backend attention changes
* add habana attention/worker/executor, sampling fails now
* Restore unnecessarily changed files
* enable HabanaMemoryProfiler
* Make sampler pass
* restore habana fused rope
* prefill is now working!!!
* fix prefill padding; decode is now working!!!!!
* revert accidental changes
* remove unused stuff in habana_paged_attn.py
* remove diagnostic stuff from llm_engine.py
* use HabanaExecutorAsync in async_llm_engine.py
* add habana copyright headers to habana_*.py files
* fix prefill attention conformance
* minor naming fixes
* remove naive attention from habana_attn (it never worked anyway)
* re-enable profile run
* Add fake HPUGraph support
* add more metrics
* indentation fix
* ~~recipe cache metrics don't work lalalala~~
* i'm done with metrics for now
* fix corner case in which hl-smi is not available but synapse is (see the fallback sketch after this list)
* FIXME: temporary setup.py workaround
* WIP: add tensor parallelism stubs
* habana worker cleanup
* tensor parallelism is now working
* remove unused files
* remove unused func
* add hpugraphrunner
* improve hpu layernorm
* Port pipelined PA
* Port context length bucketing
* remove cudagraphrunner from hpu runner
* restore HPUGraphRunner back from FakeHPUGraphRunner
* handle rotary embeddings properly on gaudi3
* oopsie! captured_block_counts was incorrect!
* captured_block_counts.append doesn't do anything
* Restore habana_main KV cache memory layout
* fix memory profiler
* overhaul hpugraph capture
* memory profiling overhaul
* format memory properly in model warmup
* add graph compilation profiler for graph capture phase
* roll back log level on graph capture message
* Remove unnecessary view on residual connection in RMSNorm (vllm-project#25) (see the RMSNorm sketch after this list)
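The import fix in the early items swaps the CUDA extension import for the HPU-specific ops module. A minimal sketch of a backend-conditional import, assuming only the module paths named in the item; the try/except guard is illustrative, the actual patch simply rewrites the import statements in the HPU code paths:

```python
# Sketch: pick the kernel ops module for the active backend.
# The guarded fallback is illustrative, not the exact change in the patch.
try:
    from vllm.hpu import ops      # Habana/HPU kernel implementations
except ImportError:
    from vllm._C import ops       # CUDA extension module (upstream default)
```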
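For the "add hpu autodetection" item, detection can be as simple as checking whether the Habana PyTorch bridge package is importable. This is a hedged sketch; the helper name and the exact logic used in the patch may differ:

```python
import importlib.util


def is_hpu_available() -> bool:
    # The Habana PyTorch bridge ships as `habana_frameworks.torch`.
    # If the package can be located, assume the HPU backend is usable.
    return importlib.util.find_spec("habana_frameworks") is not None
```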
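The hl-smi corner case covers machines where the SynapseAI stack is installed but the `hl-smi` management CLI is not on PATH. One way to handle it, shown with a hypothetical helper name, is to probe for the binary before shelling out and let the caller fall back to another memory source when it is missing:

```python
import shutil
import subprocess
from typing import Optional


def query_hl_smi() -> Optional[str]:
    # Hypothetical helper: only invoke hl-smi when the binary exists;
    # returning None lets the caller fall back to SynapseAI-side metrics.
    if shutil.which("hl-smi") is None:
        return None
    return subprocess.check_output(["hl-smi"], text=True)
```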
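The vllm-project#25 item drops an extra `.view()` on the residual input of RMSNorm. A minimal sketch of the idea, not the actual vLLM layer: when the hidden states and the residual already share a shape, the residual can be added directly before normalization.

```python
import torch
import torch.nn as nn


class RMSNormSketch(nn.Module):
    """Illustrative RMSNorm with a residual input (not the vLLM layer)."""

    def __init__(self, hidden_size: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor, residual: torch.Tensor) -> torch.Tensor:
        x = x + residual  # shapes already match, so no .view() is needed
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * (x * torch.rsqrt(variance + self.eps))
```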
---------
Co-authored-by: madamczykhabana <[email protected]>

1 parent 01bfb22 · commit a115250
File tree
36 files changed: +4045 -113 lines

- vllm
  - attention
    - backends
    - ops
  - engine
  - entrypoints/openai
  - executor
  - hpu
  - model_executor
    - layers
      - quantization
    - models
    - parallel_utils
  - worker