RTVIObserver drops upstream-only TranscriptionFrame from Gemini Live (and other realtime LLMs) #3749

@acrdlph

Environment

  • pipecat version: 0.0.102 (and current main at commit c682a44)
  • Python version: 3.13.5
  • Operating System: macOS 15.4 (Darwin 25.2.0)

Issue description

Commit 5f64dae0c ("Filter RTVIObserver to downstream frames only") re-added an early-return guard (if direction != FrameDirection.DOWNSTREAM: return) to RTVIObserver.on_push_frame(). The guard was intended to prevent duplicate RTVI messages for frames that are broadcast in both directions (e.g. UserStartedSpeakingFrame, FunctionCallResultFrame).

However, GeminiLiveLLMService pushes TranscriptionFrame only upstream (FrameDirection.UPSTREAM) — it is never broadcast downstream. As a result, the RTVIObserver silently drops every user transcription, and RTVI clients never receive user-transcription messages.

The same issue likely affects OpenAIRealtimeLLMService and any other realtime/multimodal LLM service that pushes transcription frames upstream only.

Relevant code:

  • RTVIObserver.on_push_frame direction guard: src/pipecat/processors/frameworks/rtvi.py (line ~1226)
  • GeminiLiveLLMService._push_user_transcription pushing upstream: src/pipecat/services/google/gemini_live/llm.py (line ~1615)

# RTVIObserver: drops ALL upstream frames, including TranscriptionFrame
if direction != FrameDirection.DOWNSTREAM:
    return

# GeminiLiveLLMService: pushes TranscriptionFrame upstream only
await self.push_frame(
    TranscriptionFrame(text=text, user_id="", timestamp=time_now_iso8601(), result=result),
    FrameDirection.UPSTREAM,
)
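
The interaction between the two snippets above can be reproduced in isolation. The following is a minimal sketch that does not import pipecat: FrameDirection and TranscriptionFrame are simplified stand-ins, and GuardedObserver is a hypothetical class that only mimics the direction guard, not the real RTVIObserver.

```python
from dataclasses import dataclass
from enum import Enum, auto


class FrameDirection(Enum):  # stand-in for pipecat's FrameDirection
    DOWNSTREAM = auto()
    UPSTREAM = auto()


@dataclass
class TranscriptionFrame:  # stand-in, reduced to the one field used here
    text: str


class GuardedObserver:
    """Hypothetical observer that mimics the direction guard quoted above."""

    def __init__(self):
        self.sent = []  # messages that would have been forwarded to the client

    def on_push_frame(self, frame, direction):
        if direction != FrameDirection.DOWNSTREAM:
            return  # upstream-only frames never get past this line
        if isinstance(frame, TranscriptionFrame):
            self.sent.append(("user-transcription", frame.text))


observer = GuardedObserver()
# Mirrors the realtime service: the transcription travels upstream only.
observer.on_push_frame(TranscriptionFrame("hello"), FrameDirection.UPSTREAM)
print(observer.sent)  # prints []
```

The same frame pushed with FrameDirection.DOWNSTREAM would be recorded, which is why services that broadcast transcriptions in both directions are unaffected.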

Reproduction steps

  1. Create a bot using GeminiLiveLLMService with any pipeline (e.g. the 26b-gemini-live-function-calling.py example)
  2. Connect an RTVI client (e.g. @pipecat-ai/client-react)
  3. Speak to the bot
  4. Observe that the bot responds with audio, but the RTVI client never receives user-transcription messages

Expected behavior

RTVI clients should receive user-transcription messages containing the user's speech transcribed by the Gemini Live API (via input_audio_transcription).

This worked correctly before commit 5f64dae0c when the direction filter was not present in RTVIObserver.

Actual behavior

RTVIObserver.on_push_frame() returns early for all upstream frames. Since TranscriptionFrame from Gemini Live is only pushed upstream, it is never processed, and no user-transcription RTVI message is ever sent to the client.

The Gemini Live service does receive and process transcriptions from the API (visible in logs as [Transcription:user] [...] at DEBUG level), but the RTVIObserver discards them before they can be forwarded.

Logs

# Transcription IS received from Gemini API:
2026-02-14 19:06:28.685 | DEBUG | pipecat.services.google.gemini_live.llm:_handle_msg_input_transcription:1683 - [Transcription:user] [maybe problem solve neighbor yeah.]

# But NO corresponding RTVI user-transcription message is ever sent to the client

Suggested fix

Either:

  1. Broadcast TranscriptionFrame in both directions from GeminiLiveLLMService (and other realtime LLM services), so the downstream copy reaches RTVIObserver — OR
  2. Allowlist TranscriptionFrame and InterimTranscriptionFrame in the RTVIObserver's direction check, so upstream-only transcription frames are still processed
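
Option 2 could look roughly like the sketch below. It is not a patch against rtvi.py: the frame classes are simplified stand-ins, and UPSTREAM_ALLOWLIST and AllowlistObserver are hypothetical names used only to illustrate the shape of the check.

```python
from dataclasses import dataclass
from enum import Enum, auto


class FrameDirection(Enum):  # stand-in for pipecat's FrameDirection
    DOWNSTREAM = auto()
    UPSTREAM = auto()


@dataclass
class TranscriptionFrame:  # stand-in
    text: str


@dataclass
class InterimTranscriptionFrame:  # stand-in
    text: str


@dataclass
class UserStartedSpeakingFrame:  # stand-in for a frame broadcast both ways
    pass


# Hypothetical name: frame types that are processed even when pushed upstream.
UPSTREAM_ALLOWLIST = (TranscriptionFrame, InterimTranscriptionFrame)


class AllowlistObserver:
    """Hypothetical observer: downstream-only, except for allowlisted frames."""

    def __init__(self):
        self.sent = []  # messages that would have been forwarded to the client

    def on_push_frame(self, frame, direction):
        upstream = direction != FrameDirection.DOWNSTREAM
        if upstream and not isinstance(frame, UPSTREAM_ALLOWLIST):
            return  # still drops the upstream copy of broadcast frames
        if isinstance(frame, TranscriptionFrame):
            self.sent.append(("user-transcription", frame.text))
        elif isinstance(frame, InterimTranscriptionFrame):
            self.sent.append(("user-transcription-interim", frame.text))
        elif isinstance(frame, UserStartedSpeakingFrame):
            self.sent.append(("user-started-speaking", None))


fixed = AllowlistObserver()
fixed.on_push_frame(TranscriptionFrame("hello"), FrameDirection.UPSTREAM)
fixed.on_push_frame(UserStartedSpeakingFrame(), FrameDirection.UPSTREAM)
fixed.on_push_frame(UserStartedSpeakingFrame(), FrameDirection.DOWNSTREAM)
print(fixed.sent)
```

One caveat with this approach: if a service ever pushed the same transcription in both directions, the allowlist would let both copies through, so option 1 and option 2 should probably not be combined without some de-duplication.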
