[Feature] logger refactor#466
Conversation
) -> None:
    _safe_append_jsonl(
        stats_file,
logger.info(
Should we rely on printing this out, or should we consider exposing metrics through a Prometheus interface instead?
I think this relates to the metrics refactor: we collect metrics on the client side, and whether to use the logger or Prometheus depends on the user specifying the appropriate StatLogger, just like vllm's LoggingStatLogger and PrometheusStatLogger. That metrics refactor can be done in a new RFC.
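To illustrate the pluggable-sink idea mentioned above, here is a minimal sketch of a StatLogger split modeled loosely on vllm's LoggingStatLogger / PrometheusStatLogger. All class and function names here are hypothetical stand-ins, not the actual vllm or vllm-omni API; the Prometheus variant uses an in-memory dict in place of real gauges.

```python
from abc import ABC, abstractmethod
import logging

logger = logging.getLogger(__name__)


class StatLoggerBase(ABC):
    """Hypothetical pluggable stats sink (names are illustrative only)."""

    @abstractmethod
    def log(self, stats: dict) -> None: ...


class LoggingStatLogger(StatLoggerBase):
    """Emits metrics through the standard logger."""

    def log(self, stats: dict) -> None:
        logger.info("[Summary] %s", stats)


class PrometheusStatLogger(StatLoggerBase):
    """Stores numeric metrics as gauge values (in-memory stand-in here)."""

    def __init__(self) -> None:
        self.gauges: dict[str, float] = {}

    def log(self, stats: dict) -> None:
        for key, value in stats.items():
            if isinstance(value, (int, float)):
                self.gauges[key] = float(value)


def make_stat_logger(kind: str) -> StatLoggerBase:
    # The client picks a sink at startup instead of hard-coding logger calls.
    return PrometheusStatLogger() if kind == "prometheus" else LoggingStatLogger()
```

The point is that the client code only ever calls `sink.log(stats)`; swapping the output channel becomes a configuration choice rather than a code change.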
hsliuustc0106
left a comment
Can we use line breaks for the multi-stage metrics? Currently it's too long on one line.
The summary is also too long; can we change it to multiple lines? INFO 12-25 07:04:33 [omni_llm.py:468] [Summary] {'e2e_requests': 1, 'e2e_total_time_ms': 22667.661428451538, 'e2e_sum_time_ms': 22666.945219039917, 'e2e_total_tokens': 0, 'e2e_avg_time_per_request_ms': 22666.945219039917, 'e2e_avg_tokens_per_s': 0.0, 'wall_time_ms': 22667.661428451538, 'final_stage_id': {'0_27b041e4-b66e-4f7e-aa95-7816d8703766': 2}, 'stages': [{'stage_id': 0, 'requests': 1, 'tokens': 62, 'total_time_ms': 1398.9851474761963, 'avg_time_per_request_ms': 1398.9851474761963, 'avg_tokens_per_s': 44.317840051303996}, {'stage_id': 1, 'requests': 1, 'tokens': 896, 'total_time_ms': 16869.991779327393, 'avg_time_per_request_ms': 16869.991779327393, 'avg_tokens_per_s': 53.11205907628033}, {'stage_id': 2, 'requests': 1, 'tokens': 0, 'total_time_ms': 4347.3546504974365, 'avg_time_per_request_ms': 4347.3546504974365, 'avg_tokens_per_s': 0.0}], 'transfers': [{'from_stage': 0, 'to_stage': 1, 'samples': 1, 'total_bytes': 1779125, 'total_time_ms': 2.4514198303222656, 'tx_mbps': 5806.02303365104, 'rx_samples': 1, 'rx_total_bytes': 1779125, 'rx_total_time_ms': 1.5587806701660156, 'rx_mbps': 9130.854822881614, 'total_samples': 1, 'total_transfer_time_ms': 4.788637161254883, 'total_mbps': 2972.244402887727}, {'from_stage': 1, 'to_stage': 2, 'samples': 1, 'total_bytes': 3237, 'total_time_ms': 0.31065940856933594, 'tx_mbps': 83.35817067075979, 'rx_samples': 1, 'rx_total_bytes': 3237, 'rx_total_time_ms': 1.1925697326660156, 'rx_mbps': 21.714453495401838, 'total_samples': 1, 'total_transfer_time_ms': 2.1719932556152344, 'total_mbps': 11.922688955433589}]}
Changed to use pformat for metric logging to improve readability.
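For reference, a minimal sketch of the pformat approach described above, using only the standard library. The logger name, `width` value, and the sample dict are illustrative, not taken from the actual PR.

```python
import logging
from pprint import pformat

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("omni_llm")

# Illustrative subset of the summary dict shown in the review comment.
summary = {
    "e2e_requests": 1,
    "e2e_total_time_ms": 22667.66,
    "stages": [
        {"stage_id": 0, "requests": 1, "tokens": 62},
        {"stage_id": 1, "requests": 1, "tokens": 896},
    ],
}

# pformat wraps nested containers at `width` columns, so the summary
# spans several short lines instead of one very long one.
logger.info("[Summary]\n%s", pformat(summary, width=60))
```

Passing the pre-formatted string as a lazy `%s` argument keeps the formatting cost out of the hot path when the log level is disabled.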
Please check whether the metrics are aligned with vllm upstream.
This PR only refactors the logging mechanism; the contents of the metrics remain unchanged.
Signed-off-by: dengyunyang <[email protected]>
@Bounty-hunter Could we make the final log not an ERROR when the offline script works normally? My concern is that the red ERROR word will confuse users into thinking it is a real error. INFO 12-26 02:56:30 [omni_llm.py:469] 'rx_total_time_ms': 1.2178421020507812,
INFO 12-26 02:56:30 [omni_llm.py:469] 'rx_mbps': 20.83685558026625,
INFO 12-26 02:56:30 [omni_llm.py:469] 'total_samples': 1,
INFO 12-26 02:56:30 [omni_llm.py:469] 'total_transfer_time_ms': 2.4716854095458984,
INFO 12-26 02:56:30 [omni_llm.py:469] 'total_mbps': 10.266678721327288}]}
[Stage-0] ERROR 12-26 02:56:30 [omni_stage.py:559] Received shutdown signal
[Stage-2] ERROR 12-26 02:56:30 [omni_stage.py:559] Received shutdown signal
[Stage-1] ERROR 12-26 02:56:30 [omni_stage.py:559] Received shutdown signal
Request ID: 0_8716d63d-13b5-49a0-9772-b252c2cad419, Text saved to output_audio/0_8716d63d-13b5-49a0-9772-b252c2cad419.txt
Request ID: 0_8716d63d-13b5-49a0-9772-b252c2cad419, Saved audio to output_audio/output_0_8716d63d-13b5-49a0-9772-b252c2cad419.wav
Using INFO makes more sense; I'll update it in a later PR.
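The follow-up change could be as small as picking the level at the shutdown site based on whether the shutdown was expected. This is a hypothetical sketch: `handle_shutdown` and its `expected` flag are illustrative names, not the actual `omni_stage.py` code.

```python
import logging

logger = logging.getLogger("omni_stage")


def handle_shutdown(signum: int, expected: bool = True) -> None:
    # A clean shutdown is normal control flow, so INFO is appropriate;
    # reserve ERROR for shutdowns triggered by a real failure.
    if expected:
        logger.info("Received shutdown signal %d", signum)
    else:
        logger.error("Received unexpected shutdown signal %d", signum)
```

This keeps the red ERROR marker reserved for cases operators should actually investigate.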
Signed-off-by: dengyunyang <[email protected]> Co-authored-by: Hongsheng Liu <[email protected]> Signed-off-by: wangyu31577 <[email protected]>
Signed-off-by: dengyunyang <[email protected]> Co-authored-by: Hongsheng Liu <[email protected]>
Purpose
As described in #442, this PR does the following:
(1) Unify the log system with the vllm logger
(2) Add a prefix for each stage
(3) Move some metric logs from the stage to the omni client
We will just use `logger.warning`/`logger.info`/`logger.debug` going forward.
Future work:
We need to refactor the metric system.
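The per-stage prefix mentioned in point (2) can be sketched with a standard-library `LoggerAdapter`; the `omni_stage` logger name and `make_stage_logger` helper here are hypothetical, shown only to illustrate how a `[Stage-N]` prefix can be attached without touching every log call.

```python
import logging


def make_stage_logger(stage_id: int) -> logging.LoggerAdapter:
    """Wrap the module logger so every record carries a [Stage-N] prefix."""
    base = logging.getLogger("omni_stage")

    class StagePrefixAdapter(logging.LoggerAdapter):
        def process(self, msg, kwargs):
            # Prepend the stage tag; kwargs pass through unchanged.
            return f"[Stage-{self.extra['stage_id']}] {msg}", kwargs

    return StagePrefixAdapter(base, {"stage_id": stage_id})
```

Each stage process would create its adapter once at startup, matching the `[Stage-0] ... Received shutdown signal` style seen in the logs above.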
Test Plan
Test Result
Online
(1) check log with stage prefix
(2) metric log in client
Offline
Diffusion
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.