Merged
10 changes: 10 additions & 0 deletions vllm_omni/entrypoints/openai/serving_chat.py
@@ -1324,11 +1324,21 @@ async def chat_completion_full_generator(
prompt_token_ids = None
kv_transfer_params = None

# Build requested modalities set for filtering
requested_modalities = (
set(request.modalities) if hasattr(request, "modalities") and request.modalities else None
)

for omni_outputs in final_outputs:
choices_data = []
if omni_outputs.request_output is not None and not getattr(omni_outputs.request_output, "finished", False):
continue

# Filter outputs based on requested modalities
if requested_modalities is not None and omni_outputs.final_output_type not in requested_modalities:
logger.warning("final output type: %s is not needed by the request", omni_outputs.final_output_type)
continue
Comment on lines +1337 to +1340
P2: Preserve usage when filtering modalities

For non-streaming requests whose modalities exclude text (e.g., ["audio"]), this filter skips the text omni_outputs, which is the only path that populates usage/prompt_token_ids/kv_transfer_params. The response then returns zero usage even though tokens were consumed, while streaming still reports prompt tokens, so clients that depend on usage will see a regression. Consider deriving usage from omni_outputs.request_output even when the text output is filtered.
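A minimal sketch of the suggested fix: capture usage from request_output before applying the modality filter, so that skipping the text output does not also drop the token accounting. All class and field names here are simplified stand-ins for illustration, not the actual vLLM types.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

@dataclass
class RequestOutput:
    # Stand-in for the real vLLM RequestOutput; only the field
    # relevant to usage accounting is modeled here.
    prompt_token_ids: List[int]

@dataclass
class OmniOutput:
    final_output_type: str
    request_output: Optional[RequestOutput] = None

def collect_outputs(
    final_outputs: List[OmniOutput],
    requested_modalities: Optional[Set[str]],
) -> Tuple[List[str], Optional[List[int]]]:
    prompt_token_ids = None
    kept = []
    for omni in final_outputs:
        # Record usage first, regardless of whether this modality
        # survives the filter below.
        if omni.request_output is not None and prompt_token_ids is None:
            prompt_token_ids = omni.request_output.prompt_token_ids
        # Now apply the modality filter; text may be skipped, but its
        # usage has already been captured.
        if (requested_modalities is not None
                and omni.final_output_type not in requested_modalities):
            continue
        kept.append(omni.final_output_type)
    return kept, prompt_token_ids
```

With modalities set to {"audio"}, the text output is still filtered out of the response, but its prompt_token_ids are preserved for the usage block.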



if omni_outputs.final_output_type == "text":
(
choices_data,