[Doc] V1 user guide #13991
Signed-off-by: Jennifer Zhao <[email protected]>
robertgshaw2-redhat
left a comment
Looking good! Can you finish this off tomorrow? I want to include a link to this page in the logs.
ywang96
left a comment
Left some comments - PTAL
Co-authored-by: Roger Wang <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
Thanks for the PR! I will take a look tomorrow (Tue).
LiuXiaoxuanPKU
left a comment
Thanks for the guide. It's great and has lots of information.
Just some concerns:
- At a high level, I feel it's a bit too negative. We talk a lot about features that are unsupported or still need optimization. I feel we should also highlight some optimized features, such as chunked prefill and prefix caching.
- I feel we need to mention the scheduler (scheduling policy) somewhere. Currently we batch prefill tokens and decode tokens in the same batch, and we don't prioritize either prefill or decode.
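To make the second point concrete, here is a minimal sketch of a scheduling step that draws prefill and decode tokens from one shared token budget without prioritizing either. It is an illustration only, not vLLM's scheduler: `Request`, `schedule_step`, and `token_budget` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record; not a vLLM class.
    req_id: str
    num_tokens: int        # tokens known so far (prompt + generated)
    num_computed: int = 0  # tokens already processed by the model


def schedule_step(requests, token_budget):
    """Assign tokens to requests in arrival order from one shared budget.

    A request in prefill (many pending tokens) and a request in decode
    (one pending token) are handled by the same rule.
    """
    scheduled = {}
    for req in requests:
        if token_budget == 0:
            break
        pending = req.num_tokens - req.num_computed
        if pending <= 0:
            continue
        # A long prompt is truncated to the remaining budget (chunked prefill).
        n = min(pending, token_budget)
        scheduled[req.req_id] = n
        token_budget -= n
    return scheduled


requests = [
    Request("decode-a", num_tokens=101, num_computed=100),
    Request("prefill-b", num_tokens=200),
    Request("decode-c", num_tokens=41, num_computed=40),
]
plan = schedule_step(requests, token_budget=256)
small_plan = schedule_step(requests, token_budget=128)
print(plan)
print(small_plan)
```

With a budget of 256, all three requests land in the same batch. With a budget of 128, the prompt of `prefill-b` is split and `decode-c` waits for the next step, which is the kind of behavior a section on scheduling policy would need to explain.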
(e.g., see [PR #13096](https://github.com/vllm-project/vllm/pull/13096)).

- **Spec Decode**: Currently, only ngram-based spec decode is supported in V1. There
will be follow-up work to support other types of spec decode (e.g., see [PR #13933](https://github.com/vllm-project/vllm/pull/13933)).
will be follow-up work to support other types of spec decode (e.g., see [PR #13933](https://github.com/vllm-project/vllm/pull/13933)).
will be follow-up work to support other types of spec decode (e.g., see [PR #13933](https://github.com/vllm-project/vllm/pull/13933)). We will prioritize the support for Eagle, MTP compared to draft model based spec decode.
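For readers unfamiliar with the term, ngram-based spec decode proposes draft tokens by looking for an earlier occurrence of the most recent tokens in the context and copying what followed. The sketch below illustrates that general idea only; `ngram_propose`, `n`, and `k` are hypothetical names, and this is not the code from the PRs linked above.

```python
def ngram_propose(tokens, n=3, k=4):
    """Propose up to k draft tokens by matching the last n tokens earlier in the context.

    Returns an empty list when the context is shorter than n or no earlier match exists.
    """
    if len(tokens) < n:
        return []
    suffix = tokens[-n:]
    # Scan backwards so the most recent earlier match wins.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == suffix:
            return tokens[start + n:start + n + k]
    return []


context = [1, 2, 3, 4, 5, 1, 2, 3]
draft = ngram_propose(context, n=3, k=4)
print(draft)
```

The target model would then verify the draft tokens in a single forward pass and keep the longest accepted prefix; that verification step is outside the scope of this sketch.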
Co-authored-by: Cyrus Leung <[email protected]>
simon-mo
left a comment
Thank you for working on this!!
Oh, we need a small section on hardware support saying NVIDIA is natively supported, while AMD and TPU are work in progress.
simon-mo
left a comment
This is a great first version!
Signed-off-by: Jennifer Zhao <[email protected]> Signed-off-by: Roger Wang <[email protected]> Signed-off-by: Jennifer Zhao <[email protected]> Co-authored-by: Jennifer Zhao <[email protected]> Co-authored-by: Jennifer Zhao <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Signed-off-by: Louis Ulmer <[email protected]>
Signed-off-by: Jennifer Zhao <[email protected]> Signed-off-by: Roger Wang <[email protected]> Signed-off-by: Jennifer Zhao <[email protected]> Co-authored-by: Jennifer Zhao <[email protected]> Co-authored-by: Jennifer Zhao <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Signed-off-by: Mu Huai <[email protected]>
This PR adds the V1 User Guide as a living document, to be updated and expanded over time.