
fix(provider) prevent audio as image url #1106

Merged
alexhoshina merged 1 commit into sipeed:main from afjcjsbx:fix/prevent-audio-as-image-url on Mar 5, 2026


@afjcjsbx (Collaborator) commented Mar 4, 2026

📝 Description

This PR fixes a critical bug where transcribed voice messages caused requests to text-only LLMs (such as Zhipu glm-4.7-flash) to fail with a 404 error.

Previously, after the transcriber processed an audio file, the audio reference remained in the msg.Media array. The openai_compat provider then converted every media entry into an image_url block, regardless of its type. Because text-only models do not accept image_url content, the API rejected the entire request.
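To make the failure mode concrete, here is a minimal Go sketch of the pre-fix behavior, assuming OpenAI-style content parts ({"type": "image_url", "image_url": {"url": ...}}). The types contentPart and imageURL and the function naiveParts are hypothetical stand-ins, not picoclaw's actual code, and the audio data URL is a made-up placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types modeling OpenAI-style chat content parts; the real
// types in pkg/providers/openai_compat may differ.
type imageURL struct {
	URL string `json:"url"`
}

type contentPart struct {
	Type     string    `json:"type"`
	Text     string    `json:"text,omitempty"`
	ImageURL *imageURL `json:"image_url,omitempty"`
}

// naiveParts mimics the pre-fix behavior: every media entry becomes an
// image_url part, whatever its MIME type.
func naiveParts(text string, media []string) []contentPart {
	parts := []contentPart{{Type: "text", Text: text}}
	for _, m := range media {
		parts = append(parts, contentPart{Type: "image_url", ImageURL: &imageURL{URL: m}})
	}
	return parts
}

func main() {
	// A transcribed voice note: the transcript is already in the text,
	// but the (placeholder) audio data URL is still in the media list.
	parts := naiveParts("Hi.", []string{"data:audio/ogg;base64,T2dnUw=="})
	out, err := json.Marshal(parts)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Running it prints a payload whose second part is an image_url pointing at audio data, the kind of request that the text-only endpoint in the log below rejects.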

Changes made:

  • Updated pkg/providers/openai_compat/provider.go (serializeMessages) to filter media by type: it now appends image_url blocks only for actual images (strings.HasPrefix(mediaURL, "data:image/")). Transcribed audio is skipped at the serialization layer, since its content has already been injected into the message as text.
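The filtering rule in the change above can be sketched in Go as follows. As before, contentPart, imageURL and filteredParts are hypothetical names; only the strings.HasPrefix(..., "data:image/") check comes from the PR description, and the data URLs are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical types modeling OpenAI-style chat content parts.
type imageURL struct {
	URL string `json:"url"`
}

type contentPart struct {
	Type     string    `json:"type"`
	Text     string    `json:"text,omitempty"`
	ImageURL *imageURL `json:"image_url,omitempty"`
}

// filteredParts applies the PR's rule: only media whose URL starts with
// "data:image/" becomes an image_url part. Anything else (for example a
// transcribed audio note) is skipped, because its content is already text.
func filteredParts(text string, media []string) []contentPart {
	parts := []contentPart{{Type: "text", Text: text}}
	for _, m := range media {
		if !strings.HasPrefix(m, "data:image/") {
			continue // not an image: do not send it as image_url
		}
		parts = append(parts, contentPart{Type: "image_url", ImageURL: &imageURL{URL: m}})
	}
	return parts
}

func main() {
	media := []string{
		"data:audio/ogg;base64,T2dnUw==",     // placeholder audio
		"data:image/png;base64,iVBORw0KGgo=", // placeholder PNG header
	}
	for _, p := range filteredParts("Hi.", media) {
		fmt.Println(p.Type)
	}
}
```

With one audio and one PNG data URL as input, it prints "text" and then "image_url": the audio entry is dropped. Note that in this sketch a remote https:// image URL would also be skipped, because the check matches only data URLs; whether that matters depends on how the real provider receives media.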

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

📚 Technical Context (Skip for Docs)

  • Reference URL: N/A
  • Reasoning:

🧪 Test Environment

  • Hardware: PC
  • OS: macOS
  • Model/Provider: Zhipu / glm-4.7-flash
  • Channels: Telegram

📸 Evidence (Optional)


Before the fix (Error Log):

2026/03/04 19:03:49 [DEBUG] voice: Sending transcription request to Groq API
2026/03/04 19:03:50 [INFO] voice: Transcription completed successfully {text_length=4, transcription_preview= Hi.}
...
2026/03/04 19:03:50 [ERROR] agent: LLM call failed {iteration=1, error=API request failed:
  Status: 404
  Body:   {"error":{"message":"No endpoints found that support image input","code":404}}, agent_id=main}

@alexhoshina alexhoshina merged commit 464ae18 into sipeed:main Mar 5, 2026
2 checks passed
hyperwd pushed a commit to hyperwd/picoclaw that referenced this pull request Mar 5, 2026
…ge-url

fix(provider) prevent audio as image url
fishtrees pushed a commit to fishtrees/picoclaw that referenced this pull request Mar 12, 2026
…ge-url

fix(provider) prevent audio as image url
