Skip to content

Conversation

@mht-sharma
Copy link
Owner

Merging ROCM/vllm main

gshtras and others added 15 commits August 2, 2024 14:26
…r request batching parameters (#114)

* Fixed single GPU issue without setting up mp. Added toggles for server request batching parameters

* Adding HTTP headers
* add weight padding for moe

* enable padding by default

* fix linter

* fix linter

* fix linter

* using envs.py

* fix linter
* fix navi build

* Created dummy kernels of unsupported on Navi to avoid function not found crashes at runtime

* replacing ifdefs on host code with those on kernels

* refactoring code to avoid unsupported call on Navi

* syntactic change

* import statements fix

* moving env variables to envs.py

* style fixes

* cosmetic changes for isort

* remved extra include

* moving use_skinny to be member

---------

Co-authored-by: lcskrishna <[email protected]>
Co-authored-by: maleksan85 <[email protected]>
Co-authored-by: Gregory Shtrasberg <[email protected]>
* add memory clean up after every shape and parameter to reduce cache invalidation buffers

* small typo

* syntax change

---------

Co-authored-by: maleksan85 <[email protected]>
* Add support for a rope extension method (vllm-project#6553)

* [BugFix] Fix RoPE error in Llama 3.1 (vllm-project#6693)

---------

Co-authored-by: Simon Mo <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
* Initial implementation of chat/completions endpoint and its streaming variant

* Reusing datatypes from the openai entrypoints

* Response role from arg

* Added models endpoint and model validation from the request
* First version

* Revert error.

While there, add missing finalize.

* Use the correct defaults for ROCm.

Increase sampling area to capture crossover.

* Scope end_sync as well.

* Guard only volatile keyword for ifndef USE_ROCM

* Document crossover
* tightened atol for custom PA; enable supported head size, block sizes in testing

* update num_blocks and num_iters in benchmark PA to realistic settings

* move to generic b16 type

* bf16 first port

* enabled all bf16 tests, set atol for bf16

* enable custom PA for bf16 as well as block size 32 and head size 64

* fix cast to zero in custom PA reduce

* py linter fixes

* clang format fixes

* div round up clang-format

---------

Co-authored-by: Charlie Fu <[email protected]>
Co-authored-by: Gregory Shtrasberg <[email protected]>
* remove scoping
* while there fix a typo
* while there remove unused variable
* Per @iotamudelta suggestion until the deadlocks issue is better understood
Revert "Make CAR ROCm 6.1 compatible. (#137)"

This reverts commit 4d2dda6.

* Per @iotamudelta suggestion until the deadlocks issue is better understood
Revert "Optimize custom all reduce (#130)"

This reverts commit 636ff01.
@mht-sharma mht-sharma merged commit 162bb65 into mht-sharma:tgi-vllm-rocm Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants