
bugfix #14817

Open

aprameyak wants to merge 2 commits into Kong:master from aprameyak:master

Conversation


@aprameyak commented Jan 16, 2026

Summary

Fix the AI Proxy plugin so that the llm_total_tokens_count Prometheus metric respects explicit total_tokens values returned by LLM providers.
Previously, the metric was always recomputed as prompt_tokens + completion_tokens, which underreported token usage for models that use reasoning tokens.

This fix:

  1. Emits llm_total_tokens_count in the driver when response_object.usage.total_tokens exists.
  2. Updates observability logic to prefer the explicit total, falling back to prompt + completion for backward compatibility (sketched below).
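
A minimal sketch of the two changes in Lua (the llm_prompt_tokens_count / llm_completion_tokens_count key names and the surrounding structure are illustrative assumptions, not the exact plugin code):

-- Driver side: emit the provider-supplied total when it exists.
local usage = response_object.usage
if usage and usage.total_tokens then
  ai_plugin_o11y.metrics_set("llm_total_tokens_count", usage.total_tokens)
end

-- Observability side: prefer the explicit total, fall back to prompt + completion.
local total = ai_plugin_o11y.metrics_get("llm_total_tokens_count")
if not total or total == 0 then
  total = (ai_plugin_o11y.metrics_get("llm_prompt_tokens_count") or 0)
        + (ai_plugin_o11y.metrics_get("llm_completion_tokens_count") or 0)
end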

Checklist

  • The Pull Request has tests (or tests are prepared to be added in a follow-up PR if required)
  • Changelog updated or skip-changelog added
  • User-facing docs PR linked (if needed)

Issue reference

Fix #14816


Verification / QA

  • Explicit total_tokens now correctly reported in Prometheus metric
  • Fallback calculation works when total_tokens is missing
  • No recursion occurs in _M.metrics_get
  • Existing tests pass, no linting errors
  • Backward compatible for providers without total_tokens


CLAassistant commented Jan 16, 2026

CLA assistant check
All committers have signed the CLA.


Contributor

@spacewander left a comment


According to the OpenAI doc: https://platform.openai.com/docs/api-reference/chat/object#chat-object-usage-total_tokens

total = prompt + completion

reasoning_tokens is counted as completion_tokens

@git-hulk
Contributor

According to the OpenAI doc: https://platform.openai.com/docs/api-reference/chat/object#chat-object-usage-total_tokens

total = prompt + completion

reasoning_tokens is counted as completion_tokens

This is correct for OpenAI/Anthropic. The Gemini API does count the thoughts in the total, but thoughts and tool-use prompt tokens are NOT included in the candidates (completion) token count.
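
For reference, the two usage shapes being compared look roughly like this (field names come from the public OpenAI and Gemini API docs; the numbers are invented):

-- OpenAI chat completions usage: total is prompt + completion, and reasoning
-- tokens are already included in completion_tokens.
local openai_usage = {
  prompt_tokens = 100,
  completion_tokens = 350,                                 -- includes the 300 reasoning tokens
  completion_tokens_details = { reasoning_tokens = 300 },
  total_tokens = 450,                                      -- 100 + 350
}

-- Gemini generateContent usageMetadata: thoughts are NOT part of
-- candidatesTokenCount, but they ARE part of totalTokenCount.
local gemini_usage_metadata = {
  promptTokenCount = 100,
  candidatesTokenCount = 50,
  thoughtsTokenCount = 300,
  totalTokenCount = 450,                                   -- 100 + 50 + 300
}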

end
if response_object.usage.total_tokens then
  request_analytics_plugin[log_entry_keys.USAGE_CONTAINER][log_entry_keys.TOTAL_TOKENS] = response_object.usage.total_tokens
  ai_plugin_o11y.metrics_set("llm_total_tokens_count", response_object.usage.total_tokens)
Contributor


Let's update normalize-sse-chunk.lua and parse-json-response.lua too.
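
A rough sketch of what mirroring that change in those filters might look like (the surrounding variable names are assumptions; the actual filter code may expose the usage object differently):

-- In each response filter, prefer the provider's explicit total when present.
local usage = response_object and response_object.usage
if usage and usage.total_tokens then
  ai_plugin_o11y.metrics_set("llm_total_tokens_count", usage.total_tokens)
end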



- function _M.metrics_get(key)
+ function _M.metrics_get(key, skip_calculation)
Contributor


We can keep it simple: no need to add skip_calculation, just skip the calculation if the key already exists.
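
A minimal sketch of that simpler approach, assuming the metrics live in a per-request table behind a hypothetical get_request_metrics() accessor (not the module's actual internals):

function _M.metrics_get(key)
  local metrics = get_request_metrics()  -- hypothetical accessor for the per-request store
  local value = metrics[key]
  if value ~= nil then
    -- an explicitly set value (e.g. a provider-supplied total) wins; no recalculation
    return value
  end

  if key == "llm_total_tokens_count" then
    -- backward-compatible fallback for providers that do not send total_tokens
    return (metrics["llm_prompt_tokens_count"] or 0)
         + (metrics["llm_completion_tokens_count"] or 0)
  end

  return 0
end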

@spacewander
Contributor

@git-hulk
Yes, I checked it, and you are right.

@git-hulk
Contributor

@git-hulk Yes, I checked it, and you are right.

@spacewander Thanks for your prompt and kind reply. For this fix, I'm wondering if it would be better to 'correct' the completion token count if the total count exists.

@spacewander
Contributor

@git-hulk Yes, I checked it, and you are right.

@spacewander Thanks for your prompt and kind reply. For this fix, I'm wondering if it would be better to 'correct' the completion token count if the total count exists.

@fffonion
What do you think?

Pro: this behavior follows the OpenAI one: the completion token count is the number of response tokens.
Con: the result doesn't match the usage reported by Gemini. People may think we are doing it wrong when they check the bill.

@git-hulk
Contributor

git-hulk commented Jan 28, 2026

@git-hulk Yes, I checked it, and you are right.

@spacewander Thanks for your prompt and kind reply. For this fix, I'm wondering if it would be better to 'correct' the completion token count if the total count exists.

@fffonion What do you think?

Pro: this behavior follows the OpenAI one: the completion token count is the number of response tokens. Con: the result doesn't match the usage reported by Gemini. People may think we are doing it wrong when they check the bill.

I think this fix is not quite correct. We should correct the candidates' token count by adding the thinking token count to it. Another way would be to add a thoughts token count field to the usage, but I don't think that's a good approach because it's Gemini-specific behavior.

And counting the thoughts/tool_use tokens as candidates (completion) tokens won't cause a billing issue, since reasoning tokens share the same price as candidates tokens, see [1]. As it stands, Kong doesn't count the thoughts/tool_use tokens at all, which might confuse users because the billed usage would be higher than the token counts recorded on the Kong side.

[1] https://cloud.google.com/vertex-ai/generative-ai/pricing
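
For concreteness, a rough illustration of the correction described above, applied where a Gemini response's usageMetadata gets normalized into an OpenAI-style usage table (a sketch under that assumption, not the actual Kong driver code):

-- Fold Gemini's thoughts tokens into the completion count so that
-- total == prompt + completion holds, matching the OpenAI convention.
local meta = response.usageMetadata or {}
local prompt     = meta.promptTokenCount or 0
local completion = (meta.candidatesTokenCount or 0) + (meta.thoughtsTokenCount or 0)
local usage = {
  prompt_tokens     = prompt,
  completion_tokens = completion,
  total_tokens      = meta.totalTokenCount or (prompt + completion),
}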

@fffonion
Contributor

Let's ignore Kong's behaviour for now. If I understand correctly, currently if a user takes the total token count from Gemini and multiplies it by the token price, the number the user gets is not the same as what Google actually charges. And we are trying to fix that behaviour, right?

@git-hulk
Contributor

Let's ignore Kong's behaviour for now. If I understand correctly, currently if a user takes the total token count from Gemini and multiplies it by the token price, the number the user gets is not the same as what Google actually charges. And we are trying to fix that behaviour, right?

@fffonion, the prices of prompt and completion tokens differ, so we cannot simply multiply the total count by a single price. From my side, the main issue is that, for Gemini, the thoughts token count isn't counted in the completion token count. So the cost computed on the Kong side will be lower than the actual billed usage.

@aprameyak I'm not sure if you're suffering the same issue.
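
A tiny worked example of the undercount (the per-token prices are invented purely for illustration):

-- Invented prices: $1 per 1M input tokens, $10 per 1M output tokens.
-- Gemini reports: prompt = 100, candidates = 50, thoughts = 300 (billed at the output rate).
local actual_cost = (100 * 1 + (50 + 300) * 10) / 1e6  -- 0.0036, what Google bills
local kong_cost   = (100 * 1 +  50        * 10) / 1e6  -- 0.0006, thoughts dropped on the Kong side
-- The Kong-side cost/usage ends up lower than the real bill.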

@aprameyak
Author

This PR is intentionally scoped to the original issue (#14816), which is that llm_total_tokens_count ignored the explicit usage.total_tokens value and instead recomputed it as prompt + completion.

Normalizing completion token semantics across providers (e.g. Gemini thoughts/tool-use vs candidates) is a related but separate concern and wasn’t part of the issue being addressed here. I think that’s worth discussing separately if we want to change how completion tokens are defined.

@aprameyak
Author

@fffonion, @spacewander, @git-hulk I wanted to ask if there is anything else I should do for this PR. I'm not clear on whether any further changes are expected at the moment.



Successfully merging this pull request may close these issues.

[AI Proxy] Incorrect llm_total_tokens_count metric for models with reasoning/hidden tokens
