Add metrics logging and an Activity page to show requests #195
Conversation
… metrics from all processes
… to handle log parsing
Walkthrough

This change introduces a metrics monitoring feature across the backend and frontend. It adds a metrics monitor to the proxy, exposes metrics via new API endpoints, implements middleware for collecting token usage metrics, and extends the UI with an Activity page to display these metrics. Configuration and tests are updated accordingly.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant UI as UI (Activity Page)
    participant APIProvider
    participant Backend as Backend (ProxyManager)
    participant MetricsMonitor
    User->>UI: Navigates to /activity
    UI->>APIProvider: Subscribes to metrics events
    APIProvider->>Backend: Opens event stream (/api/metrics or events API)
    Backend->>MetricsMonitor: Provides current metrics snapshot & subscribes to new metrics
    MetricsMonitor-->>Backend: Sends metrics events (real-time and initial batch)
    Backend-->>APIProvider: Streams metrics events
    APIProvider-->>UI: Updates metrics state
    UI-->>User: Renders metrics table
```
```mermaid
sequenceDiagram
    participant Client
    participant ProxyManager
    participant MetricsMiddleware
    participant MetricsMonitor
    Client->>ProxyManager: Sends OAI request
    ProxyManager->>MetricsMiddleware: Proxies request, applies middleware
    MetricsMiddleware->>ProxyManager: Forwards request upstream
    ProxyManager->>MetricsMiddleware: Receives response
    MetricsMiddleware->>MetricsMonitor: Extracts and records token usage metrics
    MetricsMonitor-->>ProxyManager: Stores and streams metrics
```
Estimated code review effort: 3 (~45 minutes)
Actionable comments posted: 6
🧹 Nitpick comments (7)
proxy/config.go (1)
199-200: Consider adding validation for MetricsMaxInMemory.

While the default value of 1000 is reasonable, consider adding validation to ensure MetricsMaxInMemory is positive to prevent potential issues with memory management.
```diff
+if config.MetricsMaxInMemory < 1 {
+	config.MetricsMaxInMemory = 1000
+}
```

proxy/config_posix_test.go (1)
199-199: Consider adding MetricsLogPath to the test for completeness.

While the MetricsMaxInMemory addition is correct, consider also including `MetricsLogPath: "",` in the expected Config struct for consistency with how other default string fields are tested.

```diff
 HealthCheckTimeout: 15,
+MetricsLogPath:     "",
 MetricsMaxInMemory: 1000,
```

config.example.yaml (1)
18-29: Enhance metrics configuration documentation with practical examples and considerations.

The configuration is well-documented, but consider these improvements:
- Use a more realistic example path
- Add constraints and recommendations
- Mention log rotation considerations
```diff
 # metricsMaxInMemory: maximum number of metrics to keep in memory
 # - optional, default: 1000
 # - controls how many metrics are stored in memory before older ones are discarded
 # - useful for limiting memory usage when processing large volumes of metrics
+# - minimum: 100, maximum: 10000
 metricsMaxInMemory: 1000

 # metricsLogPath: sets the path to the metrics log file
 # - optional, default: empty string
 # - if set, metrics will be logged to the specified file
 # - useful for persistent logging of metrics
-metricsLogPath: /path/to/metrics.log
+# - consider implementing log rotation to prevent unbounded growth
+# metricsLogPath: /var/log/llama-swap/metrics.log
```

proxy/process.go (1)
526-549: Consider handling scanner buffer limitations for very long log lines.

The default `bufio.Scanner` has a buffer limit of 64KB. If llama-server produces very long log lines (e.g., with extensive debug output), they might be truncated or cause scanning errors.

```diff
 func (p *Process) processOutput(reader *io.PipeReader, streamType string) {
 	defer reader.Close()
 	scanner := bufio.NewScanner(reader)
+	// Increase buffer size to handle potentially long log lines
+	const maxScanTokenSize = 1024 * 1024 // 1MB
+	buf := make([]byte, maxScanTokenSize)
+	scanner.Buffer(buf, maxScanTokenSize)
+
 	for scanner.Scan() {
```

ui/src/pages/Activity.tsx (3)
3-11: Consider making token fields optional in the interface.

Since the rendering code checks for the existence of `input_tokens` and `output_tokens` with fallback values, these fields appear to be optional in the actual data.

```diff
 interface Metric {
   timestamp: string;
   model: string;
-  input_tokens: number;
-  output_tokens: number;
+  input_tokens?: number;
+  output_tokens?: number;
   duration_ms: number;
   tokens_per_second: number;
   status_code: number;
 }
```
46-48: Add space before unit in duration formatting.

For consistency with the `formatSpeed` function, add a space before the unit.

```diff
 const formatDuration = (ms: number) => {
-  return (ms / 1000).toFixed(2) + 's';
+  return (ms / 1000).toFixed(2) + ' s';
 };
```
110-131: Consider using a more stable key for table rows.

Using array index as a key can cause issues with React's reconciliation if the metrics order changes. Consider using a combination of timestamp and model as a more stable key.

```diff
-{metrics.map((metric, index) => (
-  <tr key={index}>
+{metrics.map((metric) => (
+  <tr key={`${metric.timestamp}-${metric.model}`}>
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- config.example.yaml (2 hunks)
- docker/config.example.yaml (1 hunks)
- proxy/config.go (2 hunks)
- proxy/config_posix_test.go (1 hunks)
- proxy/metrics_parser.go (1 hunks)
- proxy/process.go (6 hunks)
- proxy/process_test.go (14 hunks)
- proxy/processgroup.go (2 hunks)
- proxy/processgroup_test.go (2 hunks)
- proxy/proxymanager.go (8 hunks)
- proxy/proxymanager_api.go (2 hunks)
- ui/src/App.tsx (3 hunks)
- ui/src/pages/Activity.tsx (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
proxy/process_test.go (1)
Learnt from: mostlygeek
PR: mostlygeek/llama-swap#155
File: proxy/process.go:359-372
Timestamp: 2025-06-05T22:57:53.793Z
Learning: In Go, cmdWaitChan in proxy/process.go is a signal channel (chan struct{}) that follows the pattern where waitForCmd() closes the channel when cmd.Wait() returns, and stopCommand() waits on it. Reading from a closed channel returns immediately, so <-cmdWaitChan won't block indefinitely.
proxy/process.go (1)
Learnt from: mostlygeek
PR: mostlygeek/llama-swap#155
File: proxy/process.go:359-372
Timestamp: 2025-06-05T22:57:53.793Z
Learning: In Go, cmdWaitChan in proxy/process.go is a signal channel (chan struct{}) that follows the pattern where waitForCmd() closes the channel when cmd.Wait() returns, and stopCommand() waits on it. Reading from a closed channel returns immediately, so <-cmdWaitChan won't block indefinitely.
🧬 Code Graph Analysis (4)
proxy/processgroup_test.go (1)
proxy/processgroup.go (1)
NewProcessGroup(27-52)
proxy/processgroup.go (2)
proxy/metrics_parser.go (1)
MetricsParser (24-32)
proxy/process.go (1)
NewProcess(78-104)
proxy/process_test.go (1)
proxy/process.go (1)
NewProcess(78-104)
proxy/proxymanager_api.go (1)
proxy/proxymanager.go (1)
ProxyManager(27-45)
🔇 Additional comments (9)
proxy/config.go (1)
145-146: LGTM! Configuration fields are well-structured.

The new metrics configuration fields follow the existing patterns and have appropriate default values.
docker/config.example.yaml (1)
3-3: LGTM! Configuration example is consistent.

The addition of `metricsMaxInMemory: 1000` is consistent with the default value in the Go configuration and appropriately placed among other global settings.

proxy/processgroup_test.go (1)
36-36: LGTM! Test updates correctly handle the new constructor signature.

The addition of `nil` for the metricsParser parameter in all NewProcessGroup calls is appropriate and maintains test compatibility with the updated constructor.

Also applies to: 41-41, 48-48, 76-76
ui/src/App.tsx (3)
6-6: LGTM! Import follows existing patterns.

The ActivityPage import is correctly placed and follows the established pattern for other page imports.
26-28: LGTM! Navigation link is properly structured.

The Activity NavLink follows the same pattern as existing navigation items with proper active state handling.
40-40: LGTM! Route is correctly configured.

The Activity route is properly placed and follows the established routing pattern for the application.
proxy/process_test.go (1)
35-35: LGTM! Consistent test updates for the new constructor parameter.

All `NewProcess` calls correctly pass `nil` for the `metricsParser` parameter, which is appropriate for these test cases that don't exercise metrics functionality.

Also applies to: 71-71, 99-99, 124-124, 166-166, 193-193, 266-266, 299-299, 334-334, 352-352, 377-377, 413-413, 461-461, 481-482
proxy/processgroup.go (1)
27-27: LGTM! Correctly propagates metricsParser to Process instances.

The changes properly thread the `metricsParser` parameter through to each `Process` created within the group, enabling shared metrics collection across all processes.

Also applies to: 47-47
proxy/process.go (1)
541-543: MetricsParser is already safe for concurrent use.

The `MetricsParser` struct includes a `sync.RWMutex`, and its `addMetrics` method (called by `ParseLogLine`) acquires a write lock around all state mutations (including appending to the slice and file writes). No further synchronization is required.
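For illustration, a minimal sketch of that locking pattern; the `TokenMetrics` element type and the slice field are illustrative, not the PR's actual layout:

```go
package proxy

import "sync"

// TokenMetrics is a hypothetical stand-in for whatever addMetrics stores.
type TokenMetrics struct {
	InputTokens  int
	OutputTokens int
}

type MetricsParser struct {
	mu      sync.RWMutex
	metrics []TokenMetrics
}

// addMetrics takes the write lock around all state mutation, so callers
// such as ParseLogLine are safe to invoke from multiple goroutines.
func (p *MetricsParser) addMetrics(m TokenMetrics) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.metrics = append(p.metrics, m)
}
```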
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 0
🧹 Nitpick comments (1)
proxy/proxymanager_test.go (1)
720-751: Good streaming metrics test, but consider timing robustness.

The test correctly verifies streaming request metrics collection, addressing the maintainer's concern about streaming support. However, consider adding a small delay or retry mechanism when retrieving metrics to handle potential timing issues in concurrent test execution.
Consider this improvement for more robust timing:
```diff
 // Check that metrics were recorded
+time.Sleep(10 * time.Millisecond) // Brief delay to ensure metrics processing
 metrics := proxy.metricsMonitor.GetMetrics()
 assert.NotEmpty(t, metrics, "metrics should be recorded for streaming request")
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- misc/simple-responder/simple-responder.go (2 hunks)
- proxy/metrics_middleware.go (1 hunks)
- proxy/metrics_monitor.go (1 hunks)
- proxy/proxymanager.go (4 hunks)
- proxy/proxymanager_api.go (5 hunks)
- proxy/proxymanager_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- proxy/metrics_monitor.go
- proxy/proxymanager_api.go
- proxy/proxymanager.go
- proxy/metrics_middleware.go
🔇 Additional comments (3)
misc/simple-responder/simple-responder.go (2)
51-55: LGTM! Fixed usage metrics for testing.

The hardcoded usage field provides consistent dummy token metrics for testing the new metrics collection functionality. The values align with OpenAI API response format.
82-86: Consistent usage metrics added to completions endpoint.

Good consistency with the chat completions endpoint. The fixed token counts will help validate metrics parsing in tests.

proxy/proxymanager_test.go (1)
proxy/proxymanager_test.go (1)
687-718: Excellent test coverage for non-streaming metrics collection.

The test properly validates that metrics middleware captures token usage for non-streaming requests. Good integration with the simple-responder's fixed usage values (25 input, 10 output tokens).
Actionable comments posted: 0
♻️ Duplicate comments (1)
proxy/metrics_middleware.go (1)
14-44: Critical issue: Missing startTime initialization causes incorrect metrics.

The `MetricsRecorder.startTime` field is not initialized in the middleware setup, which will result in incorrect duration calculations and tokens-per-second metrics. This was flagged in previous reviews and still needs to be addressed.

Apply this fix to properly initialize the startTime:
```diff
 writer := &MetricsResponseWriter{
 	ResponseWriter: c.Writer,
 	metricsRecorder: &MetricsRecorder{
 		metricsMonitor: pm.metricsMonitor,
 		modelName:      requestedModel, // will be updated in proxyOAIHandler
 		isStreaming:    gjson.GetBytes(bodyBytes, "stream").Bool(),
+		startTime:      time.Now(),
 	},
 }
```
🧹 Nitpick comments (1)
proxy/metrics_middleware.go (1)
46-51: Address past feedback: Consider removing unused modelName update comment.

The comment "// will be updated in proxyOAIHandler" on line 34 suggests the modelName might be updated later, but based on the current implementation, the requestedModel from the request body appears to be the correct value to use. If no further updates are needed, consider removing this potentially confusing comment.
```diff
-		modelName: requestedModel, // will be updated in proxyOAIHandler
+		modelName: requestedModel,
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- misc/simple-responder/simple-responder.go (2 hunks)
- proxy/metrics_middleware.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: run-tests
🔇 Additional comments (5)
misc/simple-responder/simple-responder.go (2)
38-106: LGTM: Well-implemented streaming support for testing metrics collection.

The implementation correctly handles both streaming and non-streaming responses with proper SSE formatting for streaming mode. The fixed usage statistics (completion_tokens: 10, prompt_tokens: 25, total_tokens: 35) are consistent across all endpoints and provide reliable test data for the metrics middleware validation.

The streaming logic properly (a sketch follows this list):
- Sets correct SSE headers
- Sends incremental token messages as SSE events
- Includes usage statistics in the final message
- Terminates with [DONE] marker
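A minimal sketch of that flow, assuming Gin; the handler name, route, port, and delta payload shape are illustrative, while the 25/10/35 token counts match the fixed values used in the responder:

```go
package main

import (
	"fmt"

	"github.com/gin-gonic/gin"
)

func streamHandler(c *gin.Context) {
	// Correct SSE headers.
	c.Header("Content-Type", "text/event-stream")
	c.Header("Cache-Control", "no-cache")

	// Incremental token messages as SSE events.
	for _, tok := range []string{"Hello", " world"} {
		fmt.Fprintf(c.Writer, "data: {\"choices\":[{\"delta\":{\"content\":%q}}]}\n\n", tok)
		c.Writer.Flush()
	}

	// Usage statistics ride along in the final message, then the terminator.
	fmt.Fprint(c.Writer, "data: {\"usage\":{\"prompt_tokens\":25,\"completion_tokens\":10,\"total_tokens\":35}}\n\n")
	fmt.Fprint(c.Writer, "data: [DONE]\n\n")
	c.Writer.Flush()
}

func main() {
	r := gin.Default()
	r.POST("/v1/chat/completions", streamHandler)
	r.Run(":8080") // hypothetical port
}
```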
131-136: Consistent usage statistics addition enhances test coverage.

The addition of usage statistics to the `/v1/completions` endpoint maintains consistency with the other endpoints and ensures comprehensive testing of metrics collection across different API endpoints.

proxy/metrics_middleware.go (3)
62-84: Excellent refactoring: Addresses code duplication feedback.

The extraction of metrics parsing logic into a shared `parseAndRecordMetrics` method successfully eliminates the code duplication identified in previous reviews. This improves maintainability and ensures consistent metrics processing across streaming and non-streaming responses.
86-110: Robust streaming response parsing with proper SSE handling.

The streaming response processing correctly (a sketch follows this list):
- Splits response by lines and handles SSE "data:" prefix
- Validates JSON before parsing to prevent panics
- Properly terminates on "[DONE]" marker
- Handles empty lines and malformed data gracefully
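As a sketch of that scan, using the gjson library the middleware already relies on; the function name and signature are assumptions, not the PR's actual identifiers:

```go
package proxy

import (
	"strings"

	"github.com/tidwall/gjson"
)

// lastUsageFromSSE walks a buffered SSE body and returns the usage
// object from the final data message, if any.
func lastUsageFromSSE(body string) (gjson.Result, bool) {
	var usage gjson.Result
	found := false
	for _, line := range strings.Split(body, "\n") {
		data, ok := strings.CutPrefix(strings.TrimSpace(line), "data:")
		if !ok {
			continue // skip empty lines and non-data SSE fields
		}
		data = strings.TrimSpace(data)
		if data == "[DONE]" {
			break // stream terminator
		}
		if !gjson.Valid(data) {
			continue // ignore malformed chunks instead of panicking
		}
		if u := gjson.Get(data, "usage"); u.Exists() {
			usage, found = u, true // usage arrives in the final message
		}
	}
	return usage, found
}
```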
130-137: Safe Write method implementation with proper error handling.

The Write method correctly captures response data while preserving the original response flow. Error handling ensures that write failures are properly propagated while maintaining the captured data integrity.
mostlygeek
left a comment
It's looking pretty good. Some tiny things left and I think we can merge this!
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- misc/simple-responder/simple-responder.go (2 hunks)
- proxy/metrics_middleware.go (1 hunks)
- proxy/proxymanager.go (4 hunks)
- proxy/proxymanager_test.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- misc/simple-responder/simple-responder.go
- proxy/proxymanager_test.go
- proxy/proxymanager.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: run-tests
🔇 Additional comments (2)
proxy/metrics_middleware.go (2)
69-91: Good refactoring to eliminate code duplication.

The extraction of `parseAndRecordMetrics` as a shared method successfully addresses the previous code duplication concerns. The logic correctly calculates token metrics and duration.
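For reference, the duration and speed math this implies is straightforward; a sketch under the assumption that tokens_per_second is computed from output tokens (the helper name is hypothetical):

```go
package proxy

import "time"

// computeSpeed derives the two timing metrics from a request start time.
// Counting only output tokens here is an assumption, not confirmed by the PR.
func computeSpeed(start time.Time, outputTokens int) (durationMs int64, tokensPerSecond float64) {
	d := time.Since(start)
	durationMs = d.Milliseconds()
	if secs := d.Seconds(); secs > 0 {
		tokensPerSecond = float64(outputTokens) / secs
	}
	return durationMs, tokensPerSecond
}
```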
130-152: Clean and efficient response writer implementation.

The `MetricsResponseWriter` properly captures response bytes while maintaining compatibility with the Gin ResponseWriter interface. The approach of buffering everything and processing after completion aligns well with the maintainer's feedback.
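A minimal sketch of that buffering wrapper, assuming Gin; only the embedded ResponseWriter and a capture buffer are shown, and field names beyond those in the review diffs are illustrative:

```go
package proxy

import (
	"bytes"

	"github.com/gin-gonic/gin"
)

type MetricsResponseWriter struct {
	gin.ResponseWriter
	body bytes.Buffer // copy of everything written downstream
}

// Write tees response bytes into the buffer so metrics can be parsed after
// the request completes, while preserving the original response flow.
func (w *MetricsResponseWriter) Write(b []byte) (int, error) {
	w.body.Write(b) // bytes.Buffer.Write never returns a non-nil error
	return w.ResponseWriter.Write(b)
}
```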
Excellent work @g2mt. This looks really good now. Thanks for the big contribution!
- use new metrics data instead of log parsing
- auto-start events connection to server, improves responsiveness
- remove unnecessary libraries and code
Released in v140
This PR adds an Activity page inspired by OpenRouter to show requests sent to the server. It also adds the config settings `metricsMaxInMemory`, `metricsLogPath`, and `metricsUseServerResponse`.

The `metricsUseServerResponse` config controls logging behavior. By default, metricsUseServerResponse is false, meaning the token processing metrics are retrieved from the llama-server eval logs, similar to the existing Log Stats section. If set to true, the final response JSON from the llama-server will be parsed to obtain token usage statistics. Retrieving usage from streaming responses is not implemented.

I used Roo Code and aider, mostly with Kimi-K2, to assist in this PR.
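A sketch of the new settings in config.example.yaml form (defaults as described above; the comments are illustrative):

```yaml
metricsMaxInMemory: 1000        # cap on metrics kept in memory (default: 1000)
metricsLogPath: ""              # optional; if set, metrics are appended to this file
metricsUseServerResponse: false # false: parse llama-server eval logs (default)
                                # true: parse the final response JSON for usage
```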
Summary by CodeRabbit
New Features
Improvements