-
Notifications
You must be signed in to change notification settings - Fork 120
Fix token metrics parsing #199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- use llama-server's `timings` info if available in response body - send "-1" for token/sec when not able to accurately calculate performance - optimize streaming body search for metrics information
|
""" WalkthroughThis change updates the simple-responder server to include a "timings" field with fixed values in its JSON responses for both streaming and non-streaming chat completions. The proxy's metrics middleware is updated to parse timing information from either the new "timings" field or the legacy "usage" field, prioritizing the last usage data in streaming responses. The Activity page in the UI now displays "unknown" for negative token speeds and renames the speed column. Tests were improved to avoid panics by early returning if metrics are empty. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant SimpleResponder
participant Proxy
participant UI
Client->>SimpleResponder: POST /v1/chat/completions (streaming or non-streaming)
SimpleResponder-->>Client: JSON response including "timings" field
Client->>Proxy: Receives response stream or full response
Proxy->>Proxy: parseAndRecordMetrics (use "timings" if present, else "usage")
Proxy-->>UI: Updates metrics data
UI->>UI: Displays token speed ("unknown" if negative)
Estimated code review effort2 (~20 minutes) Possibly related PRs
📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (2)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
misc/simple-responder/simple-responder.go(2 hunks)proxy/metrics_middleware.go(1 hunks)ui/src/pages/Activity.tsx(1 hunks)
🔇 Additional comments (6)
ui/src/pages/Activity.tsx (2)
4-14: Good refactor: Moving formatting functions to module scope.Moving these utility functions outside the component prevents unnecessary recreations on each render, improving performance. The functions are pure and stateless, making this an appropriate optimization.
8-10: Excellent handling of negative token speeds.The updated logic to return "unknown" for negative speeds aligns perfectly with the PR objective of displaying meaningful information when token per second metrics cannot be accurately calculated.
misc/simple-responder/simple-responder.go (2)
81-88: Good addition of llama.cpp compatible timing metrics.The "timings" object with llama.cpp-style fields provides more accurate timing information for the metrics system. The fixed test values are appropriate for a mock server environment.
113-119: Consistent timing metrics for non-streaming responses.The addition maintains consistency between streaming and non-streaming response formats, ensuring the metrics middleware can parse timing information from both response types.
proxy/metrics_middleware.go (2)
70-102: Excellent dual-format parsing with proper fallback logic.The implementation correctly prioritizes the more accurate "timings" data while falling back to "usage" data when timings aren't available. The default value of -1.0 for
tokensPerSecondaligns perfectly with the PR objective of indicating when metrics cannot be accurately calculated.
105-141: Smart optimization: Reverse iteration for streaming responses.The backward iteration through response lines is an excellent optimization that finds the final/most complete metrics data more efficiently. The explicit SSE parsing with proper "data:" prefix checking and "[DONE]" handling makes the code more robust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
proxy/metrics_middleware.go (1)
71-74: Remove early return to support timings-only responsesThe current implementation returns early if the "usage" field doesn't exist, preventing metrics recording when only "timings" data is available. This contradicts the PR objective of using timings information when available.
Apply this fix:
- usage := jsonData.Get("usage") - if !usage.Exists() { - return - } + // Process metrics if either timings or usage data exists + if !jsonData.Get("timings").Exists() && !jsonData.Get("usage").Exists() { + return + }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
proxy/metrics_middleware.go(1 hunks)ui/src/pages/Activity.tsx(2 hunks)
🧬 Code Graph Analysis (1)
proxy/metrics_middleware.go (1)
proxy/metrics_monitor.go (1)
TokenMetrics(12-20)
🚧 Files skipped from review as they are similar to previous changes (1)
- ui/src/pages/Activity.tsx
🧰 Additional context used
🧬 Code Graph Analysis (1)
proxy/metrics_middleware.go (1)
proxy/metrics_monitor.go (1)
TokenMetrics(12-20)
Fix #198
timingsinfo if available in response bodySummary by CodeRabbit
New Features
Improvements
Style
Tests