-
-
Notifications
You must be signed in to change notification settings - Fork 11.7k
[V1][Metrics] Add request_success_total counter, labelled with finish reason #12579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -13,6 +13,23 @@ | |
| from vllm.sampling_params import SamplingParams | ||
|
|
||
|
|
||
| class RequestFinishedReason(enum.IntEnum): | ||
| """ | ||
| Reason a request finished - stop, length, or abort. | ||
|
|
||
| stop - a stop string was emitted | ||
| length - max_tokens was consumed, or max_model_len was reached | ||
| abort - aborted for another reason | ||
|
|
||
| """ | ||
| STOP = 0 | ||
| LENGTH = 1 | ||
| ABORT = 2 | ||
|
|
||
| def __str__(self): | ||
| return self.name.lower() | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will create a new string every time it's accessed (and convert to lower). We could instead have a global lookup like: FINISH_REASON_STRINGS = ("stop", "length", "abort")
class RequestFinishedReason(enum.IntEnum):
# ...
def __str__(self):
return FINISH_REASON_STRINGS[self.value]I think we should also mention here that the specific string names form part of the API and so shouldn't be changed (i.e. not just arbitrary identifiers).
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, great catch on the optimization 👍 |
||
|
|
||
|
|
||
| @dataclass | ||
| class EngineCoreRequest: | ||
|
|
||
|
|
@@ -43,7 +60,7 @@ class EngineCoreOutput( | |
| request_id: str | ||
| new_token_ids: List[int] | ||
| finished: bool | ||
| finish_reason: Optional[str] = None | ||
| finish_reason: Optional[RequestFinishedReason] = None | ||
| stop_reason: Union[int, str, None] = None | ||
|
|
||
|
|
||
|
|
@@ -54,7 +71,7 @@ class EngineCoreOutputs( | |
| gc=False): # type: ignore[call-arg] | ||
|
|
||
| #NOTE(Nick): We could consider ways to make this more compact, | ||
| # e.g. columnwise layout and using an int enum for finish/stop reason | ||
| # e.g. columnwise layout | ||
|
|
||
| # [num_reqs] | ||
| outputs: List[EngineCoreOutput] | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about a shorter name
FinishReasonsince it's used in quite a few places?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, sounds good!