[Core][Feature] Input metadata dump on crash #13407
Conversation
@wallashss Thanks for writing up this PR. I think it will be useful to have details for debugging printed to the logs at crash! When I try out these changes in my dev environment, running online mode and sending a request with a large prompt, I see […]. The above seems to happen only when the first request to the server crashes it. If I send a shortened request first (e.g. prompt from […]) …
Thanks for trying it out @tjohnson31415, I'll take a look at your examples and address these issues.
Hey @tjohnson31415, I think I addressed your issues:
- I reproduced your setup and saw the logs; the dump now appears before the stack trace. The server was indeed hanging: the issue was the pickling of my custom error, which I fixed.
- Fixed that too, thanks for reporting it.
Hey @comaniac, do you think this feature, implemented this way, makes sense for vLLM? I'd love to hear your feedback. Thanks!
comaniac left a comment
The idea looks good. Let's polish the code and then I'm OK with giving it a green light. The most important point I'd like to highlight again is that I feel we only need to support this feature in v1, for simplicity.
vllm/error_report.py (Outdated)
```python
    return False


def prepare_object_to_dump(obj):
```
This function is basically the reason the previous input dump was removed. It has many cases to handle and will be affected whenever any of them changes. Specifically, primitive types and torch.Tensor are fine, but I'm a bit worried about SequenceData and NewRequestData.
Yes, I share the same concern.
The custom handler for these classes is there to obfuscate the prompts, but I cannot guarantee that we will always have the right implementation as things change in the future. I guess we could add more hardcoded logs, comments, asserts, and tests to warn other developers about this feature, at the cost of increasing its maintenance burden. I am not sure about this, so I would like to hear more feedback or ideas from your side.
That seems to create a burden for developers, which is the reason the previous input dump was removed. Ideally we would have an approach that recursively traverses an input object and serializes it with tensor values ignored. Another direction is to provide these methods on the custom data structures themselves (e.g., SequenceData.dump()), so that they live in one place and are easier to maintain.
Also cc @youkaichao
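For concreteness, a minimal sketch of the kind of recursive, tensor-eliding serializer being discussed; the structure and names are illustrative, not the vLLM implementation:

```python
import enum
from dataclasses import fields, is_dataclass

import torch


def prepare_object_to_dump(obj):
    """Recursively convert an object into something safe to log: tensor
    contents are replaced by a shape/dtype summary, containers are walked,
    and anything unknown falls back to its repr()."""
    if isinstance(obj, torch.Tensor):
        # Never dump tensor contents; shape and dtype are enough for debugging.
        return f"Tensor(shape={tuple(obj.shape)}, dtype={obj.dtype})"
    if isinstance(obj, (bool, int, float, str, type(None), enum.Enum)):
        return obj
    if isinstance(obj, dict):
        return {k: prepare_object_to_dump(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return type(obj)(prepare_object_to_dump(v) for v in obj)
    if is_dataclass(obj):
        return {f.name: prepare_object_to_dump(getattr(obj, f.name))
                for f in fields(obj)}
    return repr(obj)
```

The key property is that tensor contents and raw prompts never reach the log; only shapes, dtypes, and metadata do.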
I added a method called anon_repr to those classes, which is similar to the __repr__ implementation. The two are kept close together, and I added a comment there to make other contributors aware of that. prepare_object_to_dump has indirect awareness of this method: during serialization it checks whether the object provides it, and uses it if so.
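In rough terms, the duck-typed hook could look like this (a sketch of the idea with a hypothetical example class, not the exact merged code):

```python
def prepare_object_to_dump(obj):
    # Prefer an opt-in anon_repr() when a class provides one: it renders the
    # object like __repr__ but with prompts/token ids obfuscated.
    if hasattr(obj, "anon_repr"):
        return obj.anon_repr()
    return repr(obj)  # generic fallback (tensors/containers handled elsewhere)


class RequestDataExample:
    """Hypothetical stand-in for a class such as NewRequestData."""

    def __init__(self, req_id, prompt_token_ids):
        self.req_id = req_id
        self.prompt_token_ids = prompt_token_ids

    # NOTE: keep this aligned with __repr__; it intentionally hides the prompt.
    def anon_repr(self):
        return (f"RequestDataExample(req_id={self.req_id!r}, "
                f"num_prompt_tokens={len(self.prompt_token_ids)})")
```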
BTW: I changed the dump format to be closer to how __repr__ renders objects as strings, instead of JSON. I think it is more standardized and consistent with what we already use for __repr__.
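As a tiny illustration of the difference (not the actual dump output), a dataclass repr already yields the kind of line the dump now emits:

```python
from dataclasses import dataclass


@dataclass
class DumpedRequest:
    # Illustrative fields only.
    req_id: str
    num_prompt_tokens: int
    prompt: str = "<obfuscated>"


print(DumpedRequest("req-0", 512))
# DumpedRequest(req_id='req-0', num_prompt_tokens=512, prompt='<obfuscated>')
```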
vllm/error_report.py (Outdated)
```python
        execute_model_req: Union[ExecuteModelRequest,
                                 None] = None):

    assert engine_version == 0 or engine_version == 1
```
I feel we don't need to support v0. Reasons:
- The code could be much cleaner.
- v0 is going to be deprecated soon.
I see, but I guess there are still a lot of deployments running right now that are based on V0 (at least on our side). That's why we are interested in supporting both engines.
Do you think you could reconsider?
Since we're planning to freeze v0, I still don't feel we should support it. However, if you really need it, I'd suggest separating the v0/v1 logic completely into different functions (e.g. xxx_v0), so that in the future, when we want to deprecate v0, we can easily locate the logic and remove it.
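For illustration, the suggested split might look roughly like this (function names are hypothetical):

```python
def dump_engine_exception_v0(err, execute_model_req=None):
    """v0-only entry point, kept separate so it is easy to delete later."""
    ...


def dump_engine_exception(err, scheduler_output=None, scheduler_stats=None):
    """v1 entry point."""
    ...
```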
I split the functions to make it easier to identify which engine version each one handles.
Finally all green! This PR has been open for a while, and I am now convinced to remove the V0 support. I also made some minor updates. If any of you could take a second look and merge, I'd appreciate it. Thanks!
njhill left a comment
Thanks @wallashss, and sorry I have a couple more comments!
vllm/v1/engine/core.py (Outdated)
```python
class ModelExecutionError(RuntimeError):
    """Custom RuntimeError carrying the input data for model execution.

    In a nutshell, this object is useful for custom handling of exceptions
    when the engine raises an error. For instance, it is used to log the
    input metadata that helps debug engine crashes.

    Args:
        scheduler_output: SchedulerOutput object that contains the input
            data for model execution
    """
    scheduler_output: SchedulerOutput
    scheduler_stats: SchedulerStats

    def __init__(self, *args, scheduler_output=None, scheduler_stats=None):
        super().__init__(*args)
        self.scheduler_output = scheduler_output
        self.scheduler_stats = scheduler_stats

    def __reduce__(self):
        # Avoid pickle errors when this object is exchanged between processes:
        # scheduler_output can hold objects that only make sense in their own
        # context/process, so we drop them from the serialization and only
        # send the summary of the error as a regular RuntimeError.
        return (self.__class__, (self.args[0], ))
```
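For context, a small standalone demonstration of why the __reduce__ override matters once the exception crosses a process boundary (the payload class here is hypothetical):

```python
import pickle


class Unpicklable:
    """Hypothetical stand-in for process-local state inside SchedulerOutput."""

    def __reduce__(self):
        raise TypeError("cannot pickle this object")


class ModelExecutionError(RuntimeError):
    def __init__(self, *args, scheduler_output=None):
        super().__init__(*args)
        self.scheduler_output = scheduler_output

    def __reduce__(self):
        # Drop the process-local payload; only the message survives pickling.
        return (self.__class__, (self.args[0],))


err = ModelExecutionError("model execution failed",
                          scheduler_output=Unpicklable())
restored = pickle.loads(pickle.dumps(err))
print(restored)                   # model execution failed
print(restored.scheduler_output)  # None: the payload was dropped, not pickled
```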
Move this class to dump_input.py to keep core.py cleaner?
After removing v0 and reviewing it, I think we can remove this class entirely and get cleaner code.
njhill left a comment
Thanks @wallashss, sorry for forgetting to come back to this. It looks good to me; I just have one more minor comment.
Thanks @njhill! Could you point me to the comment? Is it a new one, or did I miss it?
njhill left a comment
@wallashss apologies, I must have failed to press the button to include the comment in the review.
njhill left a comment
@wallashss sorry for the extra nits, just want to keep the core engine loop as clean as possible. I will merge as soon as these are in!
njhill left a comment
Thanks @wallashss
Thanks a lot @njhill!
This PR adds a feature to dump input metadata when the vLLM engine crashes. In essence, this change is the spiritual successor to #8305, which was recently removed in #12582. However, I tried to solve it differently, since this feature can give us more hints to help debug crashes in production environments. So I would like to propose it again to the community and give it a second chance.
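As a rough sketch of how such a dump hook wraps the engine step (module, class, and function names below are illustrative, not necessarily the merged API):

```python
import logging

logger = logging.getLogger(__name__)


def dump_engine_exception(err, scheduler_output, scheduler_stats):
    # Log everything known about the failing step; prompts and tensors are
    # assumed to be obfuscated/summarized by the serialization helpers.
    logger.error("Engine crashed: %r", err)
    logger.error("Scheduler output at crash: %s", scheduler_output)
    logger.error("Scheduler stats at crash: %s", scheduler_stats)


class EngineCoreSketch:
    """Hypothetical stand-in for the v1 engine core loop."""

    def __init__(self, scheduler, model_executor):
        self.scheduler = scheduler
        self.model_executor = model_executor

    def step(self):
        scheduler_output = self.scheduler.schedule()
        try:
            return self.model_executor.execute_model(scheduler_output)
        except Exception as err:
            # On crash, dump the input metadata before re-raising, so the
            # stack trace in the log is preceded by the context needed to
            # debug it.
            dump_engine_exception(err, scheduler_output,
                                  self.scheduler.make_stats())
            raise
```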
Summary:
V0 dump sample
V1 dump sample