Describe the bug
The tokens/second reported on the Activity page (#195) is incorrect. It is calculated from the start of the request, so prompt-processing time (and the prompt tokens themselves) are folded into the generation speed:
```go
func (rec *MetricsRecorder) parseAndRecordMetrics(jsonData gjson.Result) {
	if !jsonData.Get("usage").Exists() {
		return
	}
	outputTokens := int(jsonData.Get("usage.completion_tokens").Int())
	inputTokens := int(jsonData.Get("usage.prompt_tokens").Int())
	if outputTokens > 0 {
		duration := time.Since(rec.startTime)
		tokensPerSecond := float64(inputTokens+outputTokens) / duration.Seconds()
		metrics := TokenMetrics{
			Timestamp:       time.Now(),
			Model:           rec.realModelName,
			InputTokens:     inputTokens,
			OutputTokens:    outputTokens,
			TokensPerSecond: tokensPerSecond,
			DurationMs:      int(duration.Milliseconds()),
		}
		rec.metricsMonitor.addMetrics(metrics)
	}
}
```
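To see how far off this can be (illustrative numbers, not from a real run): a 2,000-token prompt processed in 4 s followed by 100 tokens generated over the next 5 s gets reported as (2000 + 100) / 9 ≈ 233 tokens/second, while the actual generation speed is 100 / 5 = 20 tokens/second.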
Expected behaviour
- The reported tokens/second accurately matches what the upstream server itself measures
- Better to have no data than totally inaccurate data
Changes
- look for llama.cpp's `timings` JSON in the response and extract the information from there (see the sketch after this list)
- do not try to calculate tokens/second; use `-1` when the data is not available
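A minimal sketch of that change against the snippet above, assuming the upstream server is llama.cpp: the `timings` field names (`predicted_per_second`, `prompt_ms`, `predicted_ms`, `predicted_n`) come from llama.cpp's server responses, and everything else mirrors the existing `MetricsRecorder` code.

```go
func (rec *MetricsRecorder) parseAndRecordMetrics(jsonData gjson.Result) {
	if !jsonData.Get("usage").Exists() {
		return
	}
	outputTokens := int(jsonData.Get("usage.completion_tokens").Int())
	inputTokens := int(jsonData.Get("usage.prompt_tokens").Int())

	// Prefer llama.cpp's own measurement over anything derived from
	// wall-clock time on our side; -1 signals "no data available".
	tokensPerSecond := -1.0
	durationMs := -1
	if timings := jsonData.Get("timings"); timings.Exists() {
		tokensPerSecond = timings.Get("predicted_per_second").Float()
		durationMs = int(timings.Get("prompt_ms").Float() + timings.Get("predicted_ms").Float())
	}

	rec.metricsMonitor.addMetrics(TokenMetrics{
		Timestamp:       time.Now(),
		Model:           rec.realModelName,
		InputTokens:     inputTokens,
		OutputTokens:    outputTokens,
		TokensPerSecond: tokensPerSecond,
		DurationMs:      durationMs,
	})
}
```

With this shape, a non-llama.cpp upstream (or a response without `timings`) simply records `-1`, which satisfies the "better no data than inaccurate data" expectation above.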