feat(channel): echo voice audio transcription feedback by afjcjsbx · Pull Request #1214 · sipeed/picoclaw

afjcjsbx · 2026-03-07T14:58:46Z

📝 Description

This PR introduces a major UX improvement for voice message handling by echoing the transcription back to the user before the agent starts processing the response.

New Workflow for Voice Messages:

The user sends a voice message.
The agent transcribes the audio in the background.
The agent sends a new message (as a direct reply to the user's voice message) containing the transcribed text. If no voice is detected, it sends a specific alert.
Immediately after, the agent triggers the "Thinking... 💭" placeholder.
Once the LLM finishes, the placeholder is edited and replaced by the final response.

To achieve a natural conversational flow, I had to refactor when the "Thinking... 💭" placeholder is generated. Previously, the placeholder was triggered automatically as soon as the base channel received the message. Because audio transcription takes a few seconds, the transcription echo was being sent after the placeholder had already been created.
Since the placeholder is eventually edited to become the final LLM response, the old logic resulted in an inverted and confusing chat history:
❌ [User Audio] -> [Final Agent Response] -> [Transcript Echo]

By decoupling the placeholder generation from the channel's HandleMessage and moving it into the AgentLoop (triggered explicitly after the transcription and right before the LLM call), the UX is now perfectly sequential:
✅ [User Audio] -> [Transcript Echo] -> [Thinking... / Final Agent Response]
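
As a rough illustration of the intended ordering, the sequence can be sketched with an in-memory chat log. This is a sketch under stated assumptions, not the PR's code: the `chat` type, `handleVoice`, `runVoiceDemo`, and the alert wording are all hypothetical, and the real flow goes through the channel manager and AgentLoop.

```go
package main

import "fmt"

// chat is a hypothetical in-memory stand-in for a channel. It only records
// the order in which messages appear.
type chat struct {
	messages []string
}

// send appends a new message and returns its index as a message ID.
func (c *chat) send(text string) int {
	c.messages = append(c.messages, text)
	return len(c.messages) - 1
}

// edit replaces an existing message in place, as the placeholder edit does.
func (c *chat) edit(id int, text string) {
	c.messages[id] = text
}

// handleVoice follows the workflow in the description: echo the transcript,
// then create the placeholder, then replace it with the LLM response.
// transcribe and llm are injected so the sketch stays self-contained.
func handleVoice(c *chat, transcribe func() string, llm func(string) string) {
	c.send("[User Audio]")
	text := transcribe()
	if text == "" {
		// Assumed wording. The description does not say whether the LLM
		// call is skipped in this case; this sketch simply continues.
		c.send("No voice detected")
	} else {
		c.send("Transcript: " + text)
	}
	ph := c.send("Thinking... 💭")
	c.edit(ph, llm(text))
}

// runVoiceDemo runs one voice message through the flow and returns the log.
func runVoiceDemo() []string {
	c := &chat{}
	handleVoice(c,
		func() string { return "turn on the lights" },
		func(in string) string { return "Done: " + in },
	)
	return c.messages
}

func main() {
	for _, m := range runVoiceDemo() {
		fmt.Println(m)
	}
}
```

Because the placeholder is created only after the echo has been sent, editing it into the final response leaves the transcript above the answer.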

Key Changes:

Configuration: Added voice.echo_transcription toggle in config.json.
Bus System: Extended bus.OutboundMessage with ReplyToMessageID, SkipPlaceholder, and TriggerPlaceholder to give the agent more granular control over message delivery.
Channel Manager:
- Disabled automatic placeholder creation upon message receipt in channels/base.go.
- Updated channels/manager.go to intercept TriggerPlaceholder and spawn the "Thinking" message on demand.
- Updated preSend to respect SkipPlaceholder, ensuring the transcription feedback creates a new message rather than consuming the placeholder.
Agent Loop: Extracted the transcription feedback logic into a clean sendTranscriptionFeedback method and moved the placeholder trigger right before the LLM iteration.
Telegram Channel: Implemented native support for ReplyToMessageID using telego.ReplyParameters.
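
To make the three new flags concrete, here is a minimal sketch of how a manager might interpret them. Only the field names ReplyToMessageID, SkipPlaceholder, and TriggerPlaceholder come from the description; the field types, the other fields, and the `manager`, `dispatch`, and `runFlow` names are assumptions for illustration.

```go
package main

import "fmt"

// OutboundMessage carries the three fields named in the description.
// Field types and the remaining fields are assumptions.
type OutboundMessage struct {
	ChatID             string
	Content            string
	ReplyToMessageID   string // send as a reply to this message, if set
	SkipPlaceholder    bool   // never consume a pending placeholder
	TriggerPlaceholder bool   // create the "Thinking" message now
}

// manager is a toy stand-in for the channel manager.
type manager struct {
	pending map[string]bool // chatID -> a placeholder is waiting to be edited
	log     []string        // what the channel would have been asked to do
}

// dispatch decides between creating a placeholder, editing it, or sending
// a brand-new message.
func (m *manager) dispatch(msg OutboundMessage) {
	switch {
	case msg.TriggerPlaceholder:
		m.pending[msg.ChatID] = true
		m.log = append(m.log, "placeholder")
	case !msg.SkipPlaceholder && m.pending[msg.ChatID]:
		delete(m.pending, msg.ChatID)
		m.log = append(m.log, "edit:"+msg.Content)
	default:
		m.log = append(m.log, "send:"+msg.Content)
	}
}

// runFlow dispatches an echo while a placeholder is already pending, to
// show that SkipPlaceholder keeps the echo from consuming it.
func runFlow() []string {
	m := &manager{pending: map[string]bool{}}
	m.dispatch(OutboundMessage{ChatID: "c1", TriggerPlaceholder: true})
	m.dispatch(OutboundMessage{
		ChatID:           "c1",
		Content:          "Transcript: hello",
		ReplyToMessageID: "42",
		SkipPlaceholder:  true,
	})
	m.dispatch(OutboundMessage{ChatID: "c1", Content: "Hi there"})
	return m.log
}

func main() {
	for _, entry := range runFlow() {
		fmt.Println(entry)
	}
}
```

In this sketch the echo is logged as a new message and the final response as an edit of the placeholder, which is the behavior the preSend change is meant to guarantee.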

🗣️ Type of Change

🐞 Bug fix (non-breaking change which fixes an issue)
✨ New feature (non-breaking change which adds functionality)
📖 Documentation update
⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

🤖 Fully AI-generated (100% AI, 0% Human)
🛠️ Mostly AI-generated (AI draft, Human verified/modified)
👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

📚 Technical Context (Skip for Docs)

Reference URL:
Reasoning: Previously, there was a chronological race condition where the "Thinking..." placeholder was spawned immediately, and the transcription text either overwrote it or appeared out of order. By decoupling the placeholder generation from the message receipt phase and moving it to the LLM pre-execution phase, we achieve a perfectly sequential and intuitive UX for voice interactions.

🧪 Test Environment

Hardware: MacBook M1
OS: macOS
Model/Provider: All
Channels: All Telegram

📸 Evidence (Optional)

Example of the new interaction flow:

Telegram:

Discord:

Slack:

☑️ Checklist

My code/docs follow the style of this project.
I have performed a self-review of my own changes.
I have updated the documentation accordingly.

alexhoshina · 2026-03-07T15:02:30Z

Please run make lint.
If convenient, could you also test the impact on other channels?

afjcjsbx · 2026-03-07T15:26:16Z

I tested Telegram and Discord (see the Evidence section). I am not able to test the other channels because I do not have access to them.

alexhoshina · 2026-03-07T16:05:56Z

Thanks, it's evening here now. I'll try to test the channels I can test tomorrow during the day.
The main branch just had a merge, so you might need to resolve the conflicts. Thank you very much.

afjcjsbx · 2026-03-07T16:27:46Z

no problem, thanks!

afjcjsbx · 2026-03-07T18:16:47Z

I also tested with Slack.

pkg/channels/slack/slack.go

alexhoshina · 2026-03-08T08:20:00Z

I took a careful look at the Placeholder changes in this PR; perhaps we could achieve the same result with a lighter approach.
Delay the Placeholder only when the message contains voice audio, and keep the original behavior unchanged for all other messages. For example:

// base.go HandleMessage
hasAudio := hasAudioMedia(media)
if !hasAudio {
    if pc, ok := c.owner.(PlaceholderCapable); ok {
        if phID, err := pc.SendPlaceholder(ctx, chatID); err == nil && phID != "" {
            c.placeholderRecorder.RecordPlaceholder(c.name, chatID, phID)
        }
    }
}

Then, once AgentLoop.transcribeAudioInMessage has completed, send the Placeholder manually.
This preserves the semantics of OutboundMessage and keeps the change small, so its impact is easier to control.
What do you think?
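
The review comment names hasAudioMedia but does not show its body, so the following is only a guess at what such a helper could look like. It assumes media is a list of file names and detects audio by extension; the real media type and detection logic in picoclaw may differ, and `placeholderOnReceipt` is a hypothetical wrapper added for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// hasAudioMedia is a hypothetical implementation. It assumes media holds
// file names and treats a few common audio extensions as voice messages.
func hasAudioMedia(media []string) bool {
	for _, m := range media {
		switch strings.ToLower(filepath.Ext(m)) {
		case ".ogg", ".oga", ".mp3", ".wav", ".m4a":
			return true
		}
	}
	return false
}

// placeholderOnReceipt reports whether the placeholder should be sent as
// soon as the message arrives. For audio it is deferred until after the
// transcription, as suggested in the comment above.
func placeholderOnReceipt(media []string) bool {
	return !hasAudioMedia(media)
}

func main() {
	fmt.Println(placeholderOnReceipt(nil))                   // text-only message
	fmt.Println(placeholderOnReceipt([]string{"photo.jpg"})) // image attachment
	fmt.Println(placeholderOnReceipt([]string{"voice.OGG"})) // voice message
}
```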

afjcjsbx · 2026-03-08T16:34:51Z

Yes, delaying the placeholder is a very good idea; I hadn't thought of it. I will try to implement it and see how it performs. Thank you.

afjcjsbx · 2026-03-08T17:25:49Z

OK, I restored the previous logic; the placeholder is now delayed only for voice messages. Please take a look when you can 🙏

alexhoshina · 2026-03-09T04:15:03Z

pkg/agent/loop.go

+	sendCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
+	defer cancel()
+
+	err := ch.Send(sendCtx, bus.OutboundMessage{


Perhaps it would be better to find a way to call Manager.sendWithRetry()?
Directly using ch.Send() lacks rate limiting, message splitting, and other features.

However, sendWithRetry() is not exposed outside Manager, which makes it awkward to call.

Since sendWithRetry() is a private method of Manager, we can't call it directly from loop.go. The natural solution is to expose a public SendMessage() on the Manager that routes through the worker pipeline (rate limiting, message splitting, retry).

However, the worker pipeline is async (messages are enqueued and processed by a background goroutine), which means the transcription feedback and the "Thinking…" placeholder can arrive out of order: the placeholder may be sent before the transcription feedback has actually been delivered.

The original code used a direct ch.Send() precisely to guarantee ordering: the feedback is sent synchronously, and only then does the flow continue to SendPlaceholder().

Two possible approaches:

1. Make SendMessage() synchronous: call sendWithRetry() inline in the caller's goroutine (using the worker's rate limiter) instead of enqueuing. This preserves ordering while gaining retry/splitting.

2. Keep SendMessage() async but add a signaling mechanism (e.g. a done channel) so the caller can wait until the message has actually been sent before proceeding to the placeholder.

What do you think?

I think both are fine 😄 I might prefer option 1

implemented, PTAL 🙏
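
For readers following along, option 1 from the discussion above could look roughly like the sketch below. It is not the merged implementation: the `sender` interface, the `Manager` fields, the crude minimum-gap rate limit, `flakyChannel`, and `runDemo` are all assumptions, and message splitting is omitted.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// sender is a hypothetical minimal channel interface.
type sender interface {
	Send(ctx context.Context, text string) error
}

// Manager is a toy manager. It is not safe for concurrent use; a real one
// would need locking or a per-channel limiter.
type Manager struct {
	ch         sender
	minGap     time.Duration // crude rate limit: minimum spacing between sends
	lastSend   time.Time
	maxRetries int
}

// SendMessage runs in the caller's goroutine and returns only after the
// message was delivered or all retries failed, so callers keep ordering.
func (m *Manager) SendMessage(ctx context.Context, text string) error {
	var err error
	for attempt := 0; attempt <= m.maxRetries; attempt++ {
		if wait := m.minGap - time.Since(m.lastSend); wait > 0 {
			select {
			case <-time.After(wait):
			case <-ctx.Done():
				return ctx.Err()
			}
		}
		m.lastSend = time.Now()
		if err = m.ch.Send(ctx, text); err == nil {
			return nil
		}
	}
	return err
}

// flakyChannel fails a fixed number of times before delivering.
type flakyChannel struct {
	failures  int
	delivered []string
}

func (f *flakyChannel) Send(_ context.Context, text string) error {
	if f.failures > 0 {
		f.failures--
		return errors.New("temporary failure")
	}
	f.delivered = append(f.delivered, text)
	return nil
}

// runDemo sends the transcript echo, then a second message standing in for
// the placeholder, and returns the delivery order.
func runDemo() ([]string, error) {
	ch := &flakyChannel{failures: 1}
	m := &Manager{ch: ch, minGap: 5 * time.Millisecond, maxRetries: 2}
	ctx := context.Background()
	if err := m.SendMessage(ctx, "Transcript: hello"); err != nil {
		return nil, err
	}
	if err := m.SendMessage(ctx, "Thinking... 💭"); err != nil {
		return nil, err
	}
	return ch.delivered, nil
}

func main() {
	delivered, err := runDemo()
	fmt.Println(delivered, err)
}
```

Because the call blocks until delivery, the first send is retried to completion before the second one starts, which is the ordering guarantee the direct ch.Send() call provided.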

alexhoshina · 2026-03-11T04:09:30Z

Sorry! I was a bit late with the approval🥲

afjcjsbx · 2026-03-11T07:45:02Z

no problem, thank you!

…anscription feat(channel): echo voice audio transcription feedback

feat(channel): echo voice audio transcription

0c117a0

afjcjsbx requested review from alexhoshina, imguoguo and mengzhuo and removed request for mengzhuo March 7, 2026 14:58

afjcjsbx added 2 commits March 7, 2026 16:18

discord reply message on transcript echo

48d8c87

fix lint

68bdf66

sipeed-bot bot added type: enhancement New feature or request domain: channel domain: agent go Pull requests that update go code labels Mar 7, 2026

afjcjsbx added 2 commits March 7, 2026 16:40

unit test placeholder logic

a0591f0

fix lint

73243c9

afjcjsbx added 3 commits March 7, 2026 18:47

slack reply message with audio transcription

2effc2b

Merge remote-tracking branch 'origin/main' into feat/echo-voice-audio…

424c40e

…-transcription # Conflicts: # pkg/channels/telegram/telegram.go

resolve conflicts

5b1f11a

github-actions bot mentioned this pull request Mar 8, 2026

🦞 OpenClaw 生态日报 2026-03-08 duanyytop/agents-radar#96

Open

alexhoshina requested changes Mar 8, 2026

pkg/channels/slack/slack.go (outdated)

afjcjsbx added 4 commits March 8, 2026 17:41

fixed double message on slack thread

3b5d049

telegram reply only on first message

f219ca1

fix empty strings on failed transcription

f87ab99

Removed the old heavy logic

536e26a

afjcjsbx requested a review from alexhoshina March 8, 2026 17:25

This was referenced Mar 9, 2026

🦞 OpenClaw 生态日报 2026-03-09 duanyytop/agents-radar#107

Open

🦞 OpenClaw Ecosystem Digest 2026-03-09 duanyytop/agents-radar#108

Open

🦞 Bản tin hàng ngày hệ sinh thái OpenClaw 2026-03-09 compasify/agents-radar#15

Open

alexhoshina reviewed Mar 9, 2026

afjcjsbx requested a review from alexhoshina March 9, 2026 09:48

afjcjsbx added 3 commits March 9, 2026 11:38

sync sendmessage function

f89c967

Merge remote-tracking branch 'origin/main' into feat/echo-voice-audio…

87d458f

…-transcription # Conflicts: # pkg/channels/telegram/telegram.go # pkg/config/config.go # pkg/config/defaults.go

resolve conflicts

08cc09e

afjcjsbx requested a review from huaaudio March 10, 2026 23:17

alexhoshina approved these changes Mar 11, 2026

afjcjsbx merged commit 30584f0 into sipeed:main Mar 11, 2026
4 checks passed

fishtrees pushed a commit to fishtrees/picoclaw that referenced this pull request Mar 12, 2026

Merge pull request sipeed#1214 from afjcjsbx/feat/echo-voice-audio-tr…

36be1a6

…anscription feat(channel): echo voice audio transcription feedback

dj-oyu pushed a commit to dj-oyu/picoclaw that referenced this pull request Mar 14, 2026

Merge pull request sipeed#1214 from afjcjsbx/feat/echo-voice-audio-tr…

a3a0949

…anscription feat(channel): echo voice audio transcription feedback

dj-oyu pushed a commit to dj-oyu/picoclaw that referenced this pull request Mar 16, 2026

Merge pull request sipeed#1214 from afjcjsbx/feat/echo-voice-audio-tr…

655700f

…anscription feat(channel): echo voice audio transcription feedback
