
Conversation

@g2mt (Contributor) commented on Jul 18, 2025

This PR adds an Activity page, inspired by OpenRouter, that shows requests sent to the server. It also adds the config settings metricsMaxInMemory, metricsLogPath, and metricsUseServerResponse.

(screenshot: llama-swap)

The metricsUseServerResponse config controls how token statistics are collected. By default it is false, and token processing metrics are retrieved from the llama-server eval logs, similar to the existing Log Stats section. If set to true, the final response JSON from llama-server is parsed to obtain token usage statistics. Retrieving usage from streaming responses is not implemented.
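
For the metricsUseServerResponse: true path, the usage extraction amounts to reading the standard OpenAI-style usage object out of the completed response body. A minimal sketch, assuming gjson (which the middleware in this PR also uses) and hypothetical function names, not the actual implementation:

```go
package main

import (
	"fmt"

	"github.com/tidwall/gjson"
)

// extractUsage pulls token counts from a completed (non-streaming) chat
// completion body; ok is false when no usage object is present.
func extractUsage(body []byte) (inputTokens, outputTokens int64, ok bool) {
	usage := gjson.GetBytes(body, "usage")
	if !usage.Exists() {
		return 0, 0, false
	}
	return usage.Get("prompt_tokens").Int(), usage.Get("completion_tokens").Int(), true
}

func main() {
	body := []byte(`{"choices":[{"message":{"content":"hi"}}],"usage":{"prompt_tokens":25,"completion_tokens":10,"total_tokens":35}}`)
	in, out, ok := extractUsage(body)
	fmt.Println(in, out, ok) // 25 10 true
}
```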

I used Roo Code and aider, mostly with Kimi-K2, to assist with this PR.

Summary by CodeRabbit

  • New Features

    • Introduced real-time tracking and display of token usage metrics, including input/output tokens, processing speed, and duration.
    • Added a new "Activity" page in the user interface to view recent metrics.
    • Provided an API endpoint and event stream for accessing metrics data.
  • Improvements

    • Added a new configuration option to control the maximum number of stored metrics in memory.
    • Enhanced API and UI to display and manage token usage statistics.

coderabbitai bot commented on Jul 18, 2025

Walkthrough

This change introduces a metrics monitoring feature across the backend and frontend. It adds a metrics monitor to the proxy, exposes metrics via new API endpoints, implements middleware for collecting token usage metrics, and extends the UI with an Activity page to display these metrics. Configuration and tests are updated accordingly.

Changes

  • config.example.yaml, docker/config.example.yaml: Added the metricsMaxInMemory configuration key (default 1000) with comments.
  • proxy/config.go: Added the MetricsMaxInMemory field to the Config struct and set its default in the loader.
  • proxy/config_posix_test.go, proxy/config_windows_test.go: Updated tests to include MetricsMaxInMemory in expected config objects.
  • proxy/proxymanager.go: Added metricsMonitor to ProxyManager, initialized it in the constructor, integrated the metrics middleware, and updated handlers to support metrics collection.
  • proxy/proxymanager_api.go: Added the /api/metrics endpoint, integrated metrics streaming into events, and added a handler for metrics snapshots.
  • proxy/events.go: Added the TokenMetricsEventID constant for event identification.
  • proxy/metrics_monitor.go (new): Implemented MetricsMonitor for collecting, storing, and streaming token usage metrics (a rough sketch of this component follows the list).
  • proxy/metrics_middleware.go (new): Added Gin middleware and a custom response writer for extracting and recording token usage metrics from responses.
  • proxy/proxymanager_test.go: Added tests verifying metrics collection for streaming and non-streaming chat completions; updated response type assertions.
  • ui/src/App.tsx: Added the Activity page route and navigation link.
  • ui/src/pages/Activity.tsx (new): Added the ActivityPage component to display metrics in a table.
  • ui/src/contexts/APIProvider.tsx: Extended API context and state with a metrics array and added event handling for metrics.
  • misc/simple-responder/simple-responder.go: Added a fixed "usage" field with token counts to the JSON responses of the chat completion endpoints.
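
As a rough sketch of the MetricsMonitor role described above: keep at most metricsMaxInMemory entries, serve snapshots for /api/metrics, and fan new entries out to subscribers for the event stream. The field names mirror the UI's Metric interface; the type and method names here are assumptions, not the PR's actual code.

```go
package main

import (
	"sync"
	"time"
)

// TokenMetric mirrors the fields the Activity page displays.
type TokenMetric struct {
	Timestamp       time.Time `json:"timestamp"`
	Model           string    `json:"model"`
	InputTokens     int       `json:"input_tokens"`
	OutputTokens    int       `json:"output_tokens"`
	DurationMs      int64     `json:"duration_ms"`
	TokensPerSecond float64   `json:"tokens_per_second"`
	StatusCode      int       `json:"status_code"`
}

// metricsStore keeps a bounded in-memory list and broadcasts new entries.
type metricsStore struct {
	mu          sync.Mutex
	maxInMemory int
	metrics     []TokenMetric
	subscribers []chan TokenMetric
}

func (s *metricsStore) add(m TokenMetric) {
	s.mu.Lock()
	s.metrics = append(s.metrics, m)
	if len(s.metrics) > s.maxInMemory {
		// discard the oldest entries once the cap is exceeded
		s.metrics = s.metrics[len(s.metrics)-s.maxInMemory:]
	}
	subs := make([]chan TokenMetric, len(s.subscribers))
	copy(subs, s.subscribers)
	s.mu.Unlock()

	for _, ch := range subs {
		select {
		case ch <- m: // fan out to event-stream subscribers
		default: // never block recording on a slow subscriber
		}
	}
}

// snapshot returns a copy of the stored metrics, e.g. for /api/metrics.
func (s *metricsStore) snapshot() []TokenMetric {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]TokenMetric(nil), s.metrics...)
}

func main() {
	s := &metricsStore{maxInMemory: 1000}
	s.add(TokenMetric{Timestamp: time.Now(), Model: "llama", InputTokens: 25, OutputTokens: 10})
	_ = s.snapshot()
}
```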

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant UI (Activity Page)
    participant APIProvider
    participant Backend (ProxyManager)
    participant MetricsMonitor

    User->>UI (Activity Page): Navigates to /activity
    UI (Activity Page)->>APIProvider: Subscribes to metrics events
    APIProvider->>Backend (ProxyManager): Opens event stream (/api/metrics or events API)
    Backend (ProxyManager)->>MetricsMonitor: Provides current metrics snapshot & subscribes to new metrics
    MetricsMonitor-->>Backend (ProxyManager): Sends metrics events (real-time and initial batch)
    Backend (ProxyManager)-->>APIProvider: Streams metrics events
    APIProvider-->>UI (Activity Page): Updates metrics state
    UI (Activity Page)-->>User: Renders metrics table
sequenceDiagram
    participant Client
    participant ProxyManager
    participant MetricsMiddleware
    participant MetricsMonitor

    Client->>ProxyManager: Sends OAI request
    ProxyManager->>MetricsMiddleware: Proxies request, applies middleware
    MetricsMiddleware->>ProxyManager: Forwards request upstream
    ProxyManager->>MetricsMiddleware: Receives response
    MetricsMiddleware->>MetricsMonitor: Extracts and records token usage metrics
    MetricsMonitor-->>ProxyManager: Stores and streams metrics

Estimated code review effort

3 (~45 minutes)

Suggested labels

enhancement, UI

Suggested reviewers

  • mostlygeek

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 05509e6 and 242da36.

📒 Files selected for processing (1)
  • proxy/metrics_middleware.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • proxy/metrics_middleware.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: run-tests

coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (7)
proxy/config.go (1)

199-200: Consider adding validation for MetricsMaxInMemory.

While the default value of 1000 is reasonable, consider adding validation to ensure MetricsMaxInMemory is positive to prevent potential issues with memory management.

+	if config.MetricsMaxInMemory < 1 {
+		config.MetricsMaxInMemory = 1000
+	}
proxy/config_posix_test.go (1)

199-199: Consider adding MetricsLogPath to the test for completeness.

While the MetricsMaxInMemory addition is correct, consider also including MetricsLogPath: "", in the expected Config struct for consistency with how other default string fields are tested.

 		HealthCheckTimeout: 15,
+		MetricsLogPath:     "",
 		MetricsMaxInMemory: 1000,
config.example.yaml (1)

18-29: Enhance metrics configuration documentation with practical examples and considerations.

The configuration is well-documented, but consider these improvements:

  1. Use a more realistic example path
  2. Add constraints and recommendations
  3. Mention log rotation considerations
 # metricsMaxInMemory: maximum number of metrics to keep in memory
 # - optional, default: 1000
 # - controls how many metrics are stored in memory before older ones are discarded
 # - useful for limiting memory usage when processing large volumes of metrics
+# - minimum: 100, maximum: 10000
 metricsMaxInMemory: 1000

 # metricsLogPath: sets the path to the metrics log file
 # - optional, default: empty string
 # - if set, metrics will be logged to the specified file
 # - useful for persistent logging of metrics
-metricsLogPath: /path/to/metrics.log
+# - consider implementing log rotation to prevent unbounded growth
+# metricsLogPath: /var/log/llama-swap/metrics.log
proxy/process.go (1)

526-549: Consider handling scanner buffer limitations for very long log lines.

The default bufio.Scanner has a buffer limit of 64KB. If llama-server produces very long log lines (e.g., with extensive debug output), they might be truncated or cause scanning errors.

 func (p *Process) processOutput(reader *io.PipeReader, streamType string) {
 	defer reader.Close()
 
 	scanner := bufio.NewScanner(reader)
+	// Increase buffer size to handle potentially long log lines
+	const maxScanTokenSize = 1024 * 1024 // 1MB
+	buf := make([]byte, maxScanTokenSize)
+	scanner.Buffer(buf, maxScanTokenSize)
+	
 	for scanner.Scan() {
ui/src/pages/Activity.tsx (3)

3-11: Consider making token fields optional in the interface.

Since the rendering code checks for the existence of input_tokens and output_tokens with fallback values, these fields appear to be optional in the actual data.

 interface Metric {
   timestamp: string;
   model: string;
-  input_tokens: number;
-  output_tokens: number;
+  input_tokens?: number;
+  output_tokens?: number;
   duration_ms: number;
   tokens_per_second: number;
   status_code: number;
 }

46-48: Add space before unit in duration formatting.

For consistency with the formatSpeed function, add a space before the unit.

 const formatDuration = (ms: number) => {
-  return (ms / 1000).toFixed(2) + 's';
+  return (ms / 1000).toFixed(2) + ' s';
 };

110-131: Consider using a more stable key for table rows.

Using array index as a key can cause issues with React's reconciliation if the metrics order changes. Consider using a combination of timestamp and model as a more stable key.

-{metrics.map((metric, index) => (
-  <tr key={index}>
+{metrics.map((metric) => (
+  <tr key={`${metric.timestamp}-${metric.model}`}>
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6299c1b and 2fee6b8.

📒 Files selected for processing (13)
  • config.example.yaml (2 hunks)
  • docker/config.example.yaml (1 hunks)
  • proxy/config.go (2 hunks)
  • proxy/config_posix_test.go (1 hunks)
  • proxy/metrics_parser.go (1 hunks)
  • proxy/process.go (6 hunks)
  • proxy/process_test.go (14 hunks)
  • proxy/processgroup.go (2 hunks)
  • proxy/processgroup_test.go (2 hunks)
  • proxy/proxymanager.go (8 hunks)
  • proxy/proxymanager_api.go (2 hunks)
  • ui/src/App.tsx (3 hunks)
  • ui/src/pages/Activity.tsx (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
proxy/process_test.go (1)
Learnt from: mostlygeek
PR: mostlygeek/llama-swap#155
File: proxy/process.go:359-372
Timestamp: 2025-06-05T22:57:53.793Z
Learning: In Go, cmdWaitChan in proxy/process.go is a signal channel (chan struct{}) that follows the pattern where waitForCmd() closes the channel when cmd.Wait() returns, and stopCommand() waits on it. Reading from a closed channel returns immediately, so <-cmdWaitChan won't block indefinitely.
proxy/process.go (1)
Learnt from: mostlygeek
PR: mostlygeek/llama-swap#155
File: proxy/process.go:359-372
Timestamp: 2025-06-05T22:57:53.793Z
Learning: In Go, cmdWaitChan in proxy/process.go is a signal channel (chan struct{}) that follows the pattern where waitForCmd() closes the channel when cmd.Wait() returns, and stopCommand() waits on it. Reading from a closed channel returns immediately, so <-cmdWaitChan won't block indefinitely.
🧬 Code Graph Analysis (4)
proxy/processgroup_test.go (1)
proxy/processgroup.go (1)
  • NewProcessGroup (27-52)
proxy/processgroup.go (2)
proxy/metrics_parser.go (1)
  • MetricsParser (24-32)
proxy/process.go (1)
  • NewProcess (78-104)
proxy/process_test.go (1)
proxy/process.go (1)
  • NewProcess (78-104)
proxy/proxymanager_api.go (1)
proxy/proxymanager.go (1)
  • ProxyManager (27-45)
🔇 Additional comments (9)
proxy/config.go (1)

145-146: LGTM! Configuration fields are well-structured.

The new metrics configuration fields follow the existing patterns and have appropriate default values.

docker/config.example.yaml (1)

3-3: LGTM! Configuration example is consistent.

The addition of metricsMaxInMemory: 1000 is consistent with the default value in the Go configuration and appropriately placed among other global settings.

proxy/processgroup_test.go (1)

36-36: LGTM! Test updates correctly handle the new constructor signature.

The addition of nil for the metricsParser parameter in all NewProcessGroup calls is appropriate and maintains test compatibility with the updated constructor.

Also applies to: 41-41, 48-48, 76-76

ui/src/App.tsx (3)

6-6: LGTM! Import follows existing patterns.

The ActivityPage import is correctly placed and follows the established pattern for other page imports.


26-28: LGTM! Navigation link is properly structured.

The Activity NavLink follows the same pattern as existing navigation items with proper active state handling.


40-40: LGTM! Route is correctly configured.

The Activity route is properly placed and follows the established routing pattern for the application.

proxy/process_test.go (1)

35-35: LGTM! Consistent test updates for the new constructor parameter.

All NewProcess calls correctly pass nil for the metricsParser parameter, which is appropriate for these test cases that don't exercise metrics functionality.

Also applies to: 71-71, 99-99, 124-124, 166-166, 193-193, 266-266, 299-299, 334-334, 352-352, 377-377, 413-413, 461-461, 481-482

proxy/processgroup.go (1)

27-27: LGTM! Correctly propagates metricsParser to Process instances.

The changes properly thread the metricsParser parameter through to each Process created within the group, enabling shared metrics collection across all processes.

Also applies to: 47-47

proxy/process.go (1)

541-543: MetricsParser is already safe for concurrent use

The MetricsParser struct includes a sync.RWMutex, and its addMetrics method (called by ParseLogLine) acquires a write lock around all state mutations, including appending to the slice and writing to the file. No further synchronization is required.

@g2mt marked this pull request as draft on Jul 19, 2025
coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
proxy/proxymanager_test.go (1)

720-751: Good streaming metrics test, but consider timing robustness.

The test correctly verifies streaming request metrics collection, addressing the maintainer's concern about streaming support. However, consider adding a small delay or retry mechanism when retrieving metrics to handle potential timing issues in concurrent test execution.

Consider this improvement for more robust timing:

// Check that metrics were recorded
+time.Sleep(10 * time.Millisecond) // Brief delay to ensure metrics processing
metrics := proxy.metricsMonitor.GetMetrics()
assert.NotEmpty(t, metrics, "metrics should be recorded for streaming request")
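As a hedged alternative to a fixed sleep, testify's Eventually can poll until the metric shows up instead of guessing a delay (this assumes the test already imports time and testify's assert package):

```go
// Poll for the metric instead of sleeping for a fixed duration.
assert.Eventually(t, func() bool {
	return len(proxy.metricsMonitor.GetMetrics()) > 0
}, time.Second, 10*time.Millisecond, "metrics should be recorded for streaming request")
```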
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0813157 and f4288bb.

📒 Files selected for processing (6)
  • misc/simple-responder/simple-responder.go (2 hunks)
  • proxy/metrics_middleware.go (1 hunks)
  • proxy/metrics_monitor.go (1 hunks)
  • proxy/proxymanager.go (4 hunks)
  • proxy/proxymanager_api.go (5 hunks)
  • proxy/proxymanager_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • proxy/metrics_monitor.go
  • proxy/proxymanager_api.go
  • proxy/proxymanager.go
  • proxy/metrics_middleware.go
🔇 Additional comments (3)
misc/simple-responder/simple-responder.go (2)

51-55: LGTM! Fixed usage metrics for testing.

The hardcoded usage field provides consistent dummy token metrics for testing the new metrics collection functionality. The values align with OpenAI API response format.


82-86: Consistent usage metrics added to completions endpoint.

Good consistency with the chat completions endpoint. The fixed token counts will help validate metrics parsing in tests.

proxy/proxymanager_test.go (1)

687-718: Excellent test coverage for non-streaming metrics collection.

The test properly validates that metrics middleware captures token usage for non-streaming requests. Good integration with the simple-responder's fixed usage values (25 input, 10 output tokens).

coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
proxy/metrics_middleware.go (1)

14-44: Critical issue: Missing startTime initialization causes incorrect metrics.

The MetricsRecorder.startTime field is not initialized in the middleware setup, which will result in incorrect duration calculations and tokens-per-second metrics. This was flagged in previous reviews and still needs to be addressed.

Apply this fix to properly initialize the startTime:

 writer := &MetricsResponseWriter{
     ResponseWriter: c.Writer,
     metricsRecorder: &MetricsRecorder{
         metricsMonitor: pm.metricsMonitor,
         modelName:      requestedModel, // will be updated in proxyOAIHandler
         isStreaming:    gjson.GetBytes(bodyBytes, "stream").Bool(),
+        startTime:      time.Now(),
     },
 }
🧹 Nitpick comments (1)
proxy/metrics_middleware.go (1)

46-51: Address past feedback: Consider removing unused modelName update comment.

The comment "// will be updated in proxyOAIHandler" on line 34 suggests the modelName might be updated later, but based on the current implementation, the requestedModel from the request body appears to be the correct value to use. If no further updates are needed, consider removing this potentially confusing comment.

-        modelName:      requestedModel, // will be updated in proxyOAIHandler
+        modelName:      requestedModel,
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4288bb and cd9dc5b.

📒 Files selected for processing (2)
  • misc/simple-responder/simple-responder.go (2 hunks)
  • proxy/metrics_middleware.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: run-tests
🔇 Additional comments (5)
misc/simple-responder/simple-responder.go (2)

38-106: LGTM: Well-implemented streaming support for testing metrics collection.

The implementation correctly handles both streaming and non-streaming responses with proper SSE formatting for streaming mode. The fixed usage statistics (completion_tokens: 10, prompt_tokens: 25, total_tokens: 35) are consistent across all endpoints and provide reliable test data for the metrics middleware validation.

The streaming logic properly:

  • Sets correct SSE headers
  • Sends incremental token messages as SSE events
  • Includes usage statistics in the final message
  • Terminates with [DONE] marker

131-136: Consistent usage statistics addition enhances test coverage.

The addition of usage statistics to the /v1/completions endpoint maintains consistency with the other endpoints and ensures comprehensive testing of metrics collection across different API endpoints.

proxy/metrics_middleware.go (3)

62-84: Excellent refactoring: Addresses code duplication feedback.

The extraction of metrics parsing logic into a shared parseAndRecordMetrics method successfully eliminates the code duplication identified in previous reviews. This improves maintainability and ensures consistent metrics processing across streaming and non-streaming responses.


86-110: Robust streaming response parsing with proper SSE handling.

The streaming response processing correctly handles the following (a rough sketch of the pattern follows this list):

  • Splits response by lines and handles SSE "data:" prefix
  • Validates JSON before parsing to prevent panics
  • Properly terminates on "[DONE]" marker
  • Handles empty lines and malformed data gracefully
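
The bullets above describe a fairly standard SSE-consumption pattern. A self-contained sketch of that pattern, with a hypothetical helper name and gjson assumed as elsewhere in the PR; this is not the middleware's exact code:

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"

	"github.com/tidwall/gjson"
)

// usageFromSSE scans a buffered SSE body for the last chunk carrying a
// "usage" object and returns its token counts.
func usageFromSSE(body []byte) (inputTokens, outputTokens int64, ok bool) {
	scanner := bufio.NewScanner(bytes.NewReader(body))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(line, "data:") {
			continue // skip empty lines and non-data fields
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			break // end-of-stream marker
		}
		if !gjson.ValidBytes([]byte(payload)) {
			continue // ignore malformed chunks instead of panicking
		}
		if usage := gjson.Get(payload, "usage"); usage.Exists() {
			inputTokens = usage.Get("prompt_tokens").Int()
			outputTokens = usage.Get("completion_tokens").Int()
			ok = true
		}
	}
	return inputTokens, outputTokens, ok
}

func main() {
	body := []byte("data: {\"choices\":[{\"delta\":{\"content\":\"hi\"}}]}\n\n" +
		"data: {\"usage\":{\"prompt_tokens\":25,\"completion_tokens\":10}}\n\n" +
		"data: [DONE]\n")
	in, out, ok := usageFromSSE(body)
	fmt.Println(in, out, ok) // 25 10 true
}
```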

130-137: Safe Write method implementation with proper error handling.

The Write method correctly captures response data while preserving the original response flow. Error handling ensures that write failures are properly propagated while maintaining the captured data integrity.
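
The capture-and-forward pattern being described is small enough to sketch. The real MetricsResponseWriter also carries a metrics recorder and other state, so treat the names and fields below as assumptions:

```go
package metricsdemo

import "github.com/gin-gonic/gin"

// capturingWriter forwards bytes to the wrapped gin writer and keeps a
// copy so metrics can be parsed after the response completes.
type capturingWriter struct {
	gin.ResponseWriter        // the original writer being wrapped
	body               []byte // buffered copy of the response
}

func (w *capturingWriter) Write(b []byte) (int, error) {
	n, err := w.ResponseWriter.Write(b) // preserve the original response flow
	w.body = append(w.body, b[:n]...)   // only buffer what was actually written
	return n, err
}
```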

@mostlygeek self-requested a review on Jul 22, 2025

@mostlygeek (Owner) left a comment

It's looking pretty good. Some tiny things left and I think we can merge this!

coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd9dc5b and 05509e6.

📒 Files selected for processing (4)
  • misc/simple-responder/simple-responder.go (2 hunks)
  • proxy/metrics_middleware.go (1 hunks)
  • proxy/proxymanager.go (4 hunks)
  • proxy/proxymanager_test.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • misc/simple-responder/simple-responder.go
  • proxy/proxymanager_test.go
  • proxy/proxymanager.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: run-tests
🔇 Additional comments (2)
proxy/metrics_middleware.go (2)

69-91: Good refactoring to eliminate code duplication.

The extraction of parseAndRecordMetrics as a shared method successfully addresses the previous code duplication concerns. The logic correctly calculates token metrics and duration.
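
For reference, the duration and tokens-per-second figures mentioned here reduce to a small calculation once the request start time and output token count are known; a sketch under assumed names, not the PR's exact formula:

```go
package metricsdemo

import "time"

// throughput derives duration_ms and tokens_per_second from the request
// start time and the number of generated tokens; guards against a zero duration.
func throughput(outputTokens int, start time.Time) (durationMs int64, tokensPerSecond float64) {
	durationMs = time.Since(start).Milliseconds()
	if durationMs > 0 {
		tokensPerSecond = float64(outputTokens) / (float64(durationMs) / 1000.0)
	}
	return durationMs, tokensPerSecond
}
```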


130-152: Clean and efficient response writer implementation.

The MetricsResponseWriter properly captures response bytes while maintaining compatibility with the Gin ResponseWriter interface. The approach of buffering everything and processing after completion aligns well with the maintainer's feedback.

@mostlygeek self-requested a review on Jul 22, 2025

@mostlygeek (Owner) commented:

Excellent work @g2mt. This looks really good now. Thanks for the big contribution!

@mostlygeek merged commit 87dce5f into mostlygeek:main on Jul 22, 2025
3 checks passed
mostlygeek added a commit that referenced this pull request Jul 22, 2025
- use new metrics data instead of log parsing
- auto-start events connection to server, improves responsiveness
- remove unnecessary libraries and code
@mostlygeek (Owner) commented:

Released in v140

