
Conversation

@skamenan7 (Contributor)

Inject stream_options={"include_usage": True} when streaming and OpenTelemetry telemetry is active. Telemetry always overrides any caller preference to ensure complete and consistent observability metrics.

Changes:

  • Add conditional stream_options injection to OpenAIMixin (benefits OpenAI, Bedrock, Runpod, Together, Fireworks providers)
  • Add conditional stream_options injection to LiteLLMOpenAIMixin (benefits WatsonX and other litellm-based providers)
  • Check telemetry status using trace.get_current_span().is_recording() (see the sketch below)
  • Override include_usage=False when telemetry active to prevent metric gaps
  • Unit tests for this functionality

Fixes #3981

Note: this work originated in PR #4200, which I closed after rebasing on the telemetry changes. This PR rebases those commits, incorporates the Bedrock feedback, and carries forward the same scope described there.
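The sketch below shows the shape of this injection, assuming the standard OpenTelemetry Python API; the helper name follows the `get_stream_options_for_telemetry()` utility mentioned later in this PR, but the exact signature is an assumption rather than the merged code.

```python
# Hedged sketch, not the merged implementation.
from opentelemetry import trace


def get_stream_options_for_telemetry(stream_options: dict | None) -> dict | None:
    """Force include_usage=True while an OpenTelemetry span is recording."""
    if trace.get_current_span().is_recording():
        # Telemetry overrides any caller preference, including an explicit
        # include_usage=False, so usage chunks are never missing from traces.
        return {**(stream_options or {}), "include_usage": True}
    return stream_options
```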

Test Plan

OpenAIMixin + telemetry injection tests

PYTHONPATH=src python -m pytest tests/unit/providers/utils/inference/test_openai_mixin.py

LiteLLM OpenAIMixin tests

PYTHONPATH=src python -m pytest tests/unit/providers/inference/test_litellm_openai_mixin.py -v

Broader inference provider

PYTHONPATH=src python -m pytest tests/unit/providers/inference/ --ignore=tests/unit/providers/inference/test_inference_client_caching.py -v
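For reference, one of the injection tests might look roughly like the following, reusing the helper sketched above; the test name and structure are illustrative, not the actual test code.

```python
# Illustrative test shape only; assumes get_stream_options_for_telemetry from
# the sketch above is importable.
from unittest.mock import MagicMock, patch


def test_include_usage_forced_when_span_is_recording():
    recording_span = MagicMock()
    recording_span.is_recording.return_value = True
    # Pretend an OpenTelemetry span is active so the override path is taken.
    with patch("opentelemetry.trace.get_current_span", return_value=recording_span):
        result = get_stream_options_for_telemetry({"include_usage": False})
    assert result == {"include_usage": True}
```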

@meta-cla meta-cla bot added the CLA Signed label Dec 5, 2025
@skamenan7 skamenan7 changed the title Enable streaming usage metrics for OpenAI-compatible providers feat: enable streaming usage metrics for OpenAI-compatible providers Dec 5, 2025
@skamenan7 skamenan7 force-pushed the feat/3981-enable-streaming-usage-metrics-v2 branch 3 times, most recently from f56437a to 25515b6 on December 9, 2025 13:43
@skamenan7 (Contributor Author)

Hi @mattf, I created this PR from PR #4200, which was approved before the telemetry changes. Please take a look at this one. Thanks!

@skamenan7 skamenan7 marked this pull request as ready for review December 9, 2025 13:43
@skamenan7 skamenan7 force-pushed the feat/3981-enable-streaming-usage-metrics-v2 branch from 25515b6 to 3c3eb42 on December 10, 2025 14:03
@skamenan7 (Contributor Author)

@leseb please take a look. thanks!

@skamenan7 skamenan7 force-pushed the feat/3981-enable-streaming-usage-metrics-v2 branch from 3c3eb42 to 7c05120 on December 10, 2025 14:04
@mattf (Collaborator) left a comment

looking good.

the param mods are going into the litellm mixin and watsonx adapter, but the watsonx adapter uses the litellm mixin. can we avoid this duplication?

@skamenan7 skamenan7 force-pushed the feat/3981-enable-streaming-usage-metrics-v2 branch 2 times, most recently from ad9d7b8 to c96e92e on December 10, 2025 18:54
@skamenan7 (Contributor Author)

Thanks @mattf, it looks much better now with no duplication. Please take a look at this refactor.

@skamenan7 skamenan7 force-pushed the feat/3981-enable-streaming-usage-metrics-v2 branch 3 times, most recently from 45ea268 to a2f74f5 on December 12, 2025 14:04
@mattf (Collaborator) left a comment

it looks like the llm is thrashing on this.

  • bedrock isa openaimixin, openaimixin does the include_usage, why does bedrock duplicate the logic?
  • same for runpod
  • same for watson, but with litellm mixin
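A hedged illustration of the point in the bullets above: because the adapters inherit from the mixins, the mixin's single injection point already applies to them, so re-implementing it in each adapter is redundant. Class and method shapes here are simplified assumptions, not the actual adapter code.

```python
# Simplified sketch of the inheritance argument above; not the real classes.
class OpenAIMixin:
    async def openai_chat_completion(self, params):
        if params.stream:
            # Single, shared injection point for all OpenAI-compatible adapters.
            params.stream_options = get_stream_options_for_telemetry(params.stream_options)
        ...


class BedrockInferenceAdapter(OpenAIMixin):
    # No stream_options logic needed here; the inherited method already handles it.
    pass
```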

@skamenan7 skamenan7 force-pushed the feat/3981-enable-streaming-usage-metrics-v2 branch 3 times, most recently from 2e60e7f to 6543098 on December 15, 2025 13:34

mergify bot commented Dec 15, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @skamenan7 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 15, 2025
@skamenan7 skamenan7 force-pushed the feat/3981-enable-streaming-usage-metrics-v2 branch from 6543098 to 3987c6e on December 15, 2025 13:43
@mergify mergify bot removed the needs-rebase label Dec 15, 2025
@mattf (Collaborator) left a comment

looking good, just get the tests green

@skamenan7 (Contributor Author)

Thank you @mattf. I was about to notify you :)

@skamenan7 skamenan7 force-pushed the feat/3981-enable-streaming-usage-metrics-v2 branch from 3987c6e to e9b5acf on December 15, 2025 20:25
@skamenan7 (Contributor Author)

PTAL @leseb and @cdoern

@ashwinb (Contributor) commented Dec 16, 2025

@skamenan7 the integration tests are still red -- please fix before requesting reviews?

@skamenan7 skamenan7 force-pushed the feat/3981-enable-streaming-usage-metrics-v2 branch 3 times, most recently from 91fd28f to a4319c3 on December 19, 2025 14:51
Inject `stream_options={"include_usage": True}` when streaming and
OpenTelemetry telemetry is active. Telemetry always overrides any caller
preference to ensure complete and consistent observability metrics.

Changes:
- Add `get_stream_options_for_telemetry()` utility in openai_compat.py
- Integrate telemetry-driven stream_options injection in OpenAIMixin
  (benefits OpenAI, Bedrock, Runpod, vLLM, TGI, and 12+ other providers)
- Integrate telemetry-driven stream_options injection in LiteLLMOpenAIMixin
  (benefits WatsonX and other LiteLLM-based providers)
- Add `_litellm_extra_request_params()` hook for provider-specific params
- Remove duplicated stream_options logic from Bedrock, Runpod, WatsonX
- Comprehensive unit tests for injection behavior

Fixes llamastack#3981
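A rough sketch of how the hook mentioned in the commit above could let an adapter such as WatsonX contribute provider-specific request parameters without duplicating the stream_options logic; the method names follow the commit message, but the bodies and signatures are assumptions.

```python
# Hedged sketch of the provider-hook pattern; not the merged code.
class LiteLLMOpenAIMixin:
    def _litellm_extra_request_params(self, params) -> dict:
        # Default: nothing extra. Adapters override this instead of copying
        # the whole request-building (and stream_options) logic.
        return {}

    def _build_request(self, params) -> dict:
        request = {
            "model": params.model,
            "messages": params.messages,
            "stream": params.stream,
            "stream_options": get_stream_options_for_telemetry(params.stream_options),
        }
        request.update(self._litellm_extra_request_params(params))
        return request


class WatsonXInferenceAdapter(LiteLLMOpenAIMixin):
    def _litellm_extra_request_params(self, params) -> dict:
        # Hypothetical provider-specific addition.
        return {"project_id": self.project_id}
```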

Added supports_stream_options capability flag (default: True) to prevent
injecting stream_options parameter for providers that don't support it.

Changes:
- OpenAIMixin: Added supports_stream_options attribute (default: True)
- LiteLLMOpenAIMixin: Added supports_stream_options parameter to __init__
- get_stream_options_for_telemetry(): Added supports_stream_options check
- OllamaInferenceAdapter: Set supports_stream_options=False
- VLLMInferenceAdapter: Set supports_stream_options=False

Tests:
- Added 2 tests for providers that don't support stream_options
  (chat completion + completion for symmetry)
- Verified locally with Ollama/vLLM adapters

This fixes CI failures in Docker integration tests where stream_options
was being injected for Ollama and vLLM, which don't support the parameter.
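Extending the earlier sketch, the capability flag described in this commit might look roughly like this; the exact attribute placement and helper signature are assumptions, not the merged code.

```python
# Hedged sketch of the supports_stream_options guard; not the merged code.
from opentelemetry import trace


def get_stream_options_for_telemetry(
    stream_options: dict | None,
    supports_stream_options: bool = True,
) -> dict | None:
    if not supports_stream_options:
        # Never inject for providers (e.g. Ollama, vLLM) that reject the parameter.
        return stream_options
    if trace.get_current_span().is_recording():
        return {**(stream_options or {}), "include_usage": True}
    return stream_options


class OpenAIMixin:
    supports_stream_options: bool = True  # adapters override as needed


class OllamaInferenceAdapter(OpenAIMixin):
    supports_stream_options = False
```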
@skamenan7 skamenan7 force-pushed the feat/3981-enable-streaming-usage-metrics-v2 branch from a4319c3 to ba48982 on December 19, 2025 14:51
@skamenan7 (Contributor Author)

My bad, CI is green now. Thanks @ashwinb

@skamenan7 skamenan7 requested a review from mattf December 19, 2025 15:03
@mattf (Collaborator) left a comment

for clarity i'd check params.stream and self.supports_stream_options before calling get_stream_options_for_telemetry, but this is fine
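The call-site check suggested above would look roughly like this (a hypothetical shape, not the merged code):

```python
class OpenAIMixin:
    supports_stream_options: bool = True

    async def openai_chat_completion(self, params):
        # Guard at the call site, per the review suggestion: only touch
        # stream_options for streaming requests on providers that support it.
        if params.stream and self.supports_stream_options:
            params.stream_options = get_stream_options_for_telemetry(params.stream_options)
        ...
```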

@skamenan7 (Contributor Author)

Thank you @mattf, I will keep it as is for now, as it is fine.

@ashwinb ashwinb merged commit bd35aa4 into llamastack:main Dec 19, 2025
34 checks passed
r-bit-rry pushed a commit to r-bit-rry/llama-stack that referenced this pull request Dec 23, 2025
feat: enable streaming usage metrics for OpenAI-compatible providers (llamastack#4326)
