feat: enable streaming usage metrics for OpenAI-compatible providers #4326
Conversation
force-pushed from f56437a to 25515b6
force-pushed from 25515b6 to 3c3eb42
@leseb please take a look. thanks!
force-pushed from 3c3eb42 to 7c05120
mattf left a comment:
looking good.
the param mods are going into both the LiteLLM mixin and the WatsonX adapter, but the WatsonX adapter already uses the LiteLLM mixin. can we avoid this duplication?
force-pushed from ad9d7b8 to c96e92e
Thanks @mattf, it looks much better now with no duplication. Please take a look at this refactor.
force-pushed from 45ea268 to a2f74f5
mattf left a comment:
it looks like the LLM is thrashing on this.
- Bedrock is an OpenAIMixin, and OpenAIMixin already does the include_usage injection, so why does Bedrock duplicate the logic?
- same for Runpod
- same for WatsonX, but with the LiteLLM mixin
force-pushed from 2e60e7f to 6543098
This pull request has merge conflicts that must be resolved before it can be merged. @skamenan7 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
force-pushed from 6543098 to 3987c6e
mattf left a comment:
looking good, just get the tests green
Thank you @mattf. I was about to notify you :)
force-pushed from 3987c6e to e9b5acf
@skamenan7 the integration tests are still red -- please fix before requesting reviews?
force-pushed from 91fd28f to a4319c3
Inject `stream_options={"include_usage": True}` when streaming and
OpenTelemetry telemetry is active. Telemetry always overrides any caller
preference to ensure complete and consistent observability metrics.
Changes:
- Add `get_stream_options_for_telemetry()` utility in openai_compat.py
- Integrate telemetry-driven stream_options injection in OpenAIMixin
(benefits OpenAI, Bedrock, Runpod, vLLM, TGI, and 12+ other providers)
- Integrate telemetry-driven stream_options injection in LiteLLMOpenAIMixin
(benefits WatsonX and other LiteLLM-based providers)
- Add `_litellm_extra_request_params()` hook for provider-specific params
- Remove duplicated stream_options logic from Bedrock, Runpod, WatsonX
- Comprehensive unit tests for injection behavior
Fixes llamastack#3981
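For reference, here is a minimal sketch of the telemetry-driven injection described in this commit. The helper name and its home in openai_compat.py come from the changelist above, and the trace.get_current_span().is_recording() check is described later in the PR description; the exact signature and return shape are assumptions, so the real implementation may differ.

```python
# Minimal sketch, not the actual implementation: the signature and return
# shape are assumed; only the helper name, its location, and the
# is_recording() check are taken from the PR itself.
from typing import Any

from opentelemetry import trace


def get_stream_options_for_telemetry(
    stream: bool,
    stream_options: dict[str, Any] | None,
) -> dict[str, Any] | None:
    """Force include_usage on streaming requests while telemetry is recording."""
    if not stream:
        # Non-streaming responses already carry usage; nothing to inject.
        return stream_options
    if not trace.get_current_span().is_recording():
        # No active OpenTelemetry span: leave the caller's preference alone.
        return stream_options
    # Telemetry is active: override any caller preference (including
    # include_usage=False) so the final chunk reports token usage.
    return {**(stream_options or {}), "include_usage": True}
```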
Added supports_stream_options capability flag (default: True) to prevent injecting the stream_options parameter for providers that don't support it.

Changes:
- OpenAIMixin: Added supports_stream_options attribute (default: True)
- LiteLLMOpenAIMixin: Added supports_stream_options parameter to __init__
- get_stream_options_for_telemetry(): Added supports_stream_options check
- OllamaInferenceAdapter: Set supports_stream_options=False
- VLLMInferenceAdapter: Set supports_stream_options=False

Tests:
- Added 2 tests for providers that don't support stream_options (chat completion + completion for symmetry)
- Verified locally with Ollama/vLLM adapters

This fixes CI failures in Docker integration tests where stream_options was being injected for Ollama and vLLM, which don't support the parameter.
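A rough illustration of how the capability flag can gate the injection. The supports_stream_options attribute and the Ollama/vLLM opt-outs follow the commit message; the class names ending in Sketch and the _stream_options_for_request helper are hypothetical, and in the actual change the flag is checked inside get_stream_options_for_telemetry() rather than at the call site.

```python
# Hypothetical sketch: the Sketch class names and _stream_options_for_request
# are invented for illustration; supports_stream_options and the Ollama/vLLM
# opt-outs follow the commit message. In the real change the flag is checked
# inside get_stream_options_for_telemetry() itself.
from typing import Any

from opentelemetry import trace


class OpenAIMixinSketch:
    # Providers whose OpenAI-compatible endpoints reject stream_options
    # opt out by flipping this to False.
    supports_stream_options: bool = True

    def _stream_options_for_request(
        self, stream: bool, stream_options: dict[str, Any] | None
    ) -> dict[str, Any] | None:
        if not stream or not self.supports_stream_options:
            return stream_options  # never inject for unsupported providers
        if not trace.get_current_span().is_recording():
            return stream_options  # telemetry off: respect the caller
        return {**(stream_options or {}), "include_usage": True}


class OllamaInferenceAdapterSketch(OpenAIMixinSketch):
    supports_stream_options = False  # Ollama rejects the parameter


class VLLMInferenceAdapterSketch(OpenAIMixinSketch):
    supports_stream_options = False  # same for vLLM
```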
force-pushed from a4319c3 to ba48982
My bad, CI is green now. Thanks @ashwinb
mattf left a comment:
for clarity i'd check params.stream and self.supports_stream_options before calling get_stream_options_for_telemetry, but this is fine
Thank you @mattf, I will keep it as is for now since it is fine.
Inject `stream_options={"include_usage": True}` when streaming and OpenTelemetry telemetry is active. Telemetry always overrides any caller preference to ensure complete and consistent observability metrics.

Changes:
- Add conditional stream_options injection to OpenAIMixin (benefits OpenAI, Bedrock, Runpod, Together, Fireworks providers)
- Add conditional stream_options injection to LiteLLMOpenAIMixin (benefits WatsonX and other litellm-based providers)
- Check telemetry status using trace.get_current_span().is_recording()
- Override include_usage=False when telemetry is active to prevent metric gaps
- Unit tests for this functionality

Fixes #3981

Note: this work originated in PR #4200, which I closed after rebasing on the telemetry changes. This PR rebases those commits, incorporates the Bedrock feedback, and carries forward the same scope described there.

Test Plan

OpenAIMixin + telemetry injection tests:
PYTHONPATH=src python -m pytest tests/unit/providers/utils/inference/test_openai_mixin.py

LiteLLM OpenAIMixin tests:
PYTHONPATH=src python -m pytest tests/unit/providers/inference/test_litellm_openai_mixin.py -v

Broader inference provider tests:
PYTHONPATH=src python -m pytest tests/unit/providers/inference/ --ignore=tests/unit/providers/inference/test_inference_client_caching.py -v
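For context on what the injected option means for a streaming caller, here is a hedged end-to-end illustration using the openai Python client against an arbitrary OpenAI-compatible endpoint. The base_url, api_key, and model below are placeholders, not values taken from this PR.

```python
# Illustration only: base_url, api_key, and model are placeholders for any
# OpenAI-compatible endpoint; they are not taken from this PR.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="placeholder")

stream = client.chat.completions.create(
    model="placeholder-model",
    messages=[{"role": "user", "content": "Say hello"}],
    stream=True,
    # This is the option the mixins inject when OpenTelemetry is recording.
    stream_options={"include_usage": True},
)

usage = None
for chunk in stream:
    for choice in chunk.choices:
        print(choice.delta.content or "", end="")
    if chunk.usage is not None:
        # Only the final chunk carries usage when include_usage is set.
        usage = chunk.usage

print()
print(usage)  # prompt_tokens / completion_tokens / total_tokens for telemetry
```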