[KVConnector] Enable get_block_ids_with_load_errors() in LMCache connector #27978
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can ask your reviewers to trigger select CI tests on top of fastcheck. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request enables error handling for failed KV cache loads from LMCache by implementing the get_block_ids_with_load_errors method in the LMCacheConnectorV1. The implementation correctly provides a fallback for older versions of LMCache that do not support this feature. My review includes a suggestion to improve the robustness of the implementation by checking if the method is callable, which prevents potential TypeError exceptions if the attribute exists but is not a method. This is particularly important for ensuring stability when interacting with versioned dependencies.
vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py
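To make the review's suggestion concrete, here is a minimal sketch of the delegation-with-fallback pattern it describes. It is not the actual LMCacheConnectorV1 code: the class name, the `engine` parameter, and the `_lmcache_engine` attribute are illustrative assumptions.

```python
# Sketch only: delegates to the LMCache engine when the new API exists,
# and falls back gracefully on older LMCache versions.
class LMCacheConnectorSketch:
    def __init__(self, engine) -> None:
        # `engine` stands in for the LMCache engine object the real connector wraps.
        self._lmcache_engine = engine

    def get_block_ids_with_load_errors(self) -> set[int]:
        """Return the block IDs whose KV cache failed to load from LMCache."""
        fn = getattr(self._lmcache_engine, "get_block_ids_with_load_errors", None)
        # callable() guards against the attribute existing but not being a
        # method, which would otherwise raise a TypeError at call time.
        if callable(fn):
            return fn()
        # Older LMCache versions do not expose this API; report no load errors.
        return set()
```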
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: ziruiliu <[email protected]>
ApostaC
left a comment
LGTM!
vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py
Signed-off-by: Zirui Liu <[email protected]>
Head branch was pushed to by a user without write access
Signed-off-by: Zirui Liu <[email protected]>
Signed-off-by: Zirui Liu <[email protected]>
Hi @NickLucche, could you please take a look at this change? It is quite straightforward: it enables LMCache to report the block ids that failed retrieval.
Thanks for contributing @ziruiliu !
…ector (vllm-project#27978) Signed-off-by: Zirui Liu <[email protected]> Signed-off-by: ziruiliu <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <[email protected]> Signed-off-by: George D. Torres <[email protected]>
…ector (vllm-project#27978) Signed-off-by: Zirui Liu <[email protected]> Signed-off-by: ziruiliu <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <[email protected]> Signed-off-by: Bram Wasti <[email protected]>
…ector (vllm-project#27978) Signed-off-by: Zirui Liu <[email protected]> Signed-off-by: ziruiliu <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <[email protected]>
Purpose
In the LMCache engine, PR LMCache/LMCache#1835 implements the get_block_ids_with_load_errors interface introduced in #19330, so vLLM is able to handle failed cache retrievals from LMCache.
Test Plan
See the PR comment at LMCache/LMCache#1835 (comment).
In the test, we inject a fault into cache retrieval after the cache lookup returns OK. vLLM is expected to receive the failed block ids from LMCache and reschedule the request.
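A conceptual sketch of that fault-injection check is below. It reuses the LMCacheConnectorSketch class from the earlier sketch; the fake engine and test names are made up for illustration and are not taken from the LMCache PR.

```python
# Sketch only: simulates a retrieval fault after a successful lookup and
# checks that the failed block id is surfaced through the connector API.
class FakeLMCacheEngine:
    def __init__(self) -> None:
        self.failed_blocks: set[int] = set()

    def retrieve(self, block_ids: list[int]) -> None:
        # Lookup claimed these blocks were cached, but loading one of them fails.
        self.failed_blocks.add(block_ids[1])

    def get_block_ids_with_load_errors(self) -> set[int]:
        return self.failed_blocks


def test_failed_block_ids_are_reported() -> None:
    engine = FakeLMCacheEngine()
    connector = LMCacheConnectorSketch(engine)

    engine.retrieve([7, 8, 9])  # block 8 "fails" to load
    # vLLM would use this set to invalidate the affected blocks and
    # reschedule the request, as observed in the test result below.
    assert connector.get_block_ids_with_load_errors() == {8}
```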
Test Result
Rescheduling happened as expected.