feat: enable streaming usage metrics for OpenAI-compatible providers #4326
Conversation
force-pushed from f56437a to 25515b6
force-pushed from 25515b6 to 3c3eb42
@leseb please take a look. thanks!
force-pushed from 3c3eb42 to 7c05120
mattf left a comment:
looking good.
the param mods are going into both the LiteLLM mixin and the WatsonX adapter, but the WatsonX adapter already uses the LiteLLM mixin. can we avoid this duplication?
force-pushed from ad9d7b8 to c96e92e
Thanks @mattf, it looks much better now with no duplication. Please take a look at this refactor.
force-pushed from 45ea268 to a2f74f5
mattf left a comment:
it looks like the LLM is thrashing on this.
- Bedrock is an OpenAIMixin, and OpenAIMixin already does the include_usage injection, so why does Bedrock duplicate the logic?
- same for Runpod
- same for WatsonX, but with the LiteLLM mixin
force-pushed from 2e60e7f to 6543098
This pull request has merge conflicts that must be resolved before it can be merged. @skamenan7 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
force-pushed from 6543098 to 3987c6e
mattf left a comment:
looking good, just get the tests green
Thank you @mattf. I was about to notify you :)
force-pushed from 3987c6e to e9b5acf
@skamenan7 the integration tests are still red -- please fix before requesting reviews?
force-pushed from 91fd28f to a4319c3
Inject `stream_options={"include_usage": True}` when streaming and
OpenTelemetry telemetry is active. Telemetry always overrides any caller
preference to ensure complete and consistent observability metrics.
Changes:
- Add `get_stream_options_for_telemetry()` utility in openai_compat.py
- Integrate telemetry-driven stream_options injection in OpenAIMixin
(benefits OpenAI, Bedrock, Runpod, vLLM, TGI, and 12+ other providers)
- Integrate telemetry-driven stream_options injection in LiteLLMOpenAIMixin
(benefits WatsonX and other LiteLLM-based providers)
- Add `_litellm_extra_request_params()` hook for provider-specific params
- Remove duplicated stream_options logic from Bedrock, Runpod, WatsonX
- Comprehensive unit tests for injection behavior
Fixes llamastack#3981
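For reference, here is a minimal sketch of the telemetry-driven injection described in this commit. The helper name and its home in openai_compat.py come from the changelist above, and the trace.get_current_span().is_recording() check is described later in the PR description; the exact signature and return shape are assumptions, so the real implementation may differ.

```python
# Minimal sketch, not the actual implementation: the signature and return
# shape are assumed; only the helper name, its location, and the
# is_recording() check are taken from the PR itself.
from typing import Any

from opentelemetry import trace


def get_stream_options_for_telemetry(
    stream: bool,
    stream_options: dict[str, Any] | None,
) -> dict[str, Any] | None:
    """Force include_usage on streaming requests while telemetry is recording."""
    if not stream:
        # Non-streaming responses already carry usage; nothing to inject.
        return stream_options
    if not trace.get_current_span().is_recording():
        # No active OpenTelemetry span: leave the caller's preference alone.
        return stream_options
    # Telemetry is active: override any caller preference (including
    # include_usage=False) so the final chunk reports token usage.
    return {**(stream_options or {}), "include_usage": True}
```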
Added supports_stream_options capability flag (default: True) to prevent injecting the stream_options parameter for providers that don't support it.

Changes:
- OpenAIMixin: Added supports_stream_options attribute (default: True)
- LiteLLMOpenAIMixin: Added supports_stream_options parameter to __init__
- get_stream_options_for_telemetry(): Added supports_stream_options check
- OllamaInferenceAdapter: Set supports_stream_options=False
- VLLMInferenceAdapter: Set supports_stream_options=False

Tests:
- Added 2 tests for providers that don't support stream_options (chat completion + completion for symmetry)
- Verified locally with Ollama/vLLM adapters

This fixes CI failures in Docker integration tests where stream_options was being injected for Ollama and vLLM, which don't support the parameter.
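A rough illustration of how the capability flag can gate the injection. The supports_stream_options attribute and the Ollama/vLLM opt-outs follow the commit message; the class names ending in Sketch and the _stream_options_for_request helper are hypothetical, and in the actual change the flag is checked inside get_stream_options_for_telemetry() rather than at the call site.

```python
# Hypothetical sketch: the Sketch class names and _stream_options_for_request
# are invented for illustration; supports_stream_options and the Ollama/vLLM
# opt-outs follow the commit message. In the real change the flag is checked
# inside get_stream_options_for_telemetry() itself.
from typing import Any

from opentelemetry import trace


class OpenAIMixinSketch:
    # Providers whose OpenAI-compatible endpoints reject stream_options
    # opt out by flipping this to False.
    supports_stream_options: bool = True

    def _stream_options_for_request(
        self, stream: bool, stream_options: dict[str, Any] | None
    ) -> dict[str, Any] | None:
        if not stream or not self.supports_stream_options:
            return stream_options  # never inject for unsupported providers
        if not trace.get_current_span().is_recording():
            return stream_options  # telemetry off: respect the caller
        return {**(stream_options or {}), "include_usage": True}


class OllamaInferenceAdapterSketch(OpenAIMixinSketch):
    supports_stream_options = False  # Ollama rejects the parameter


class VLLMInferenceAdapterSketch(OpenAIMixinSketch):
    supports_stream_options = False  # same for vLLM
```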
force-pushed from a4319c3 to ba48982
My bad, CI is green now. Thanks @ashwinb
mattf left a comment:
for clarity i'd check params.stream and self.supports_stream_options before calling get_stream_options_for_telemetry, but this is fine
Thank you @mattf, I will keep it as is for now since it is fine.
Inject `stream_options={"include_usage": True}` when streaming and OpenTelemetry telemetry is active. Telemetry always overrides any caller preference to ensure complete and consistent observability metrics.

Changes:
- Add conditional stream_options injection to OpenAIMixin (benefits OpenAI, Bedrock, Runpod, Together, Fireworks providers)
- Add conditional stream_options injection to LiteLLMOpenAIMixin (benefits WatsonX and other litellm-based providers)
- Check telemetry status using trace.get_current_span().is_recording()
- Override include_usage=False when telemetry is active to prevent metric gaps
- Unit tests for this functionality

Fixes #3981

Note: this work originated in PR #4200, which I closed after rebasing on the telemetry changes. This PR rebases those commits, incorporates the Bedrock feedback, and carries forward the same scope described there.

Test Plan

OpenAIMixin + telemetry injection tests:
PYTHONPATH=src python -m pytest tests/unit/providers/utils/inference/test_openai_mixin.py

LiteLLM OpenAIMixin tests:
PYTHONPATH=src python -m pytest tests/unit/providers/inference/test_litellm_openai_mixin.py -v

Broader inference provider tests:
PYTHONPATH=src python -m pytest tests/unit/providers/inference/ --ignore=tests/unit/providers/inference/test_inference_client_caching.py -v
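For context on what the injected option means for a streaming caller, here is a hedged end-to-end illustration using the openai Python client against an arbitrary OpenAI-compatible endpoint. The base_url, api_key, and model below are placeholders, not values taken from this PR.

```python
# Illustration only: base_url, api_key, and model are placeholders for any
# OpenAI-compatible endpoint; they are not taken from this PR.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="placeholder")

stream = client.chat.completions.create(
    model="placeholder-model",
    messages=[{"role": "user", "content": "Say hello"}],
    stream=True,
    # This is the option the mixins inject when OpenTelemetry is recording.
    stream_options={"include_usage": True},
)

usage = None
for chunk in stream:
    for choice in chunk.choices:
        print(choice.delta.content or "", end="")
    if chunk.usage is not None:
        # Only the final chunk carries usage when include_usage is set.
        usage = chunk.usage

print()
print(usage)  # prompt_tokens / completion_tokens / total_tokens for telemetry
```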