Skip to content

feat: add message chunking in Telegram Send method#935

Merged
alexhoshina merged 31 commits intosipeed:mainfrom
putueddy:feat/telegram-chunking
Mar 7, 2026
Merged

feat: add message chunking in Telegram Send method#935
alexhoshina merged 31 commits intosipeed:mainfrom
putueddy:feat/telegram-chunking

Conversation

@putueddy
Copy link
Contributor

@putueddy putueddy commented Mar 1, 2026

Summary

  • Split HTML content into 4000-character chunks in the Telegram Send() method before sending to the Telegram API
  • Fixes cases where markdown-to-HTML conversion (markdownToTelegramHTML) produces output exceeding Telegram's 4096-character message limit, causing send failures
  • Uses the existing channels.SplitMessage() utility which preserves code block integrity across chunk boundaries
  • The 4000 limit (vs 4096) leaves headroom for any additional HTML tag overhead

Context

The channel manager already splits raw markdown at the MaxMessageLength boundary (4096), but the subsequent HTML conversion in Send() can expand the content beyond that limit due to HTML tags like <b>, <pre><code>, <a href="...">, etc. This change adds a safety net at the HTML level.

Test plan

  • Send a long message (>4000 chars) through Telegram and verify it arrives as multiple messages
  • Verify code blocks spanning chunk boundaries are properly handled
  • Confirm short messages (<4000 chars) still send as a single message
  • Test HTML fallback to plain text still works per chunk

🤖 Generated with Claude Code

putueddy and others added 3 commits March 1, 2026 08:44
1. migration.go: Set ModelName to userModel when provider matches so
   GetModelConfig(userModel) can find the entry. Previously the migration
   created entries with the provider name as ModelName (e.g. "moonshot")
   but lookup used the model name (e.g. "k2p5"), causing "model not found".

2. openai_compat/provider.go: Preserve reasoning_content in conversation
   history. Thinking models (e.g. Kimi K2, DeepSeek-R1) return
   reasoning_content which must be echoed back. Without it, APIs return
   400: "thinking is enabled but reasoning_content is missing".

3. shell.go: Fix deny pattern regex for format/mkfs/diskpart to use
   (?:^|\s) instead of \b to avoid matching --format flags.
   Fix path extraction regex to use submatch to avoid matching flags
   like -rf as paths.

4. loop.go: Adjust forceCompression mid-point to avoid splitting
   tool-call/result message pairs, which causes API errors.
- Add "kimi", "kimi-code", "moonshot" provider cases in factory.go
  with default API base https://api.kimi.com/coding/v1
- Add Kimi Code API User-Agent header (KimiCLI/0.77) for api.kimi.com
- Add "opencode" provider with default API base https://opencode.ai/zen/v1
- Add "opencode" to recognized HTTP-compatible protocols in factory_provider
- Add Opencode field to ProvidersConfig, IsEmpty, HasProvidersConfig
- Add opencode migration entry in ConvertProvidersToModelList
- Update moonshot fallback API base from api.moonshot.cn to api.kimi.com
Split HTML content into 4000-char chunks before sending to handle cases
where markdown-to-HTML conversion causes messages to exceed Telegram's
4096-character limit. Uses the existing SplitMessage utility which
preserves code block integrity across chunk boundaries.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copilot AI review requested due to automatic review settings March 1, 2026 01:56
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a second-stage safety split for Telegram outbound messages by chunking the post-conversion HTML before calling the Telegram API, to avoid failures when Markdown→HTML expansion pushes content beyond Telegram’s 4096-character limit.

Changes:

  • Split markdownToTelegramHTML output into 4000-character chunks before sending.
  • Apply the existing HTML→plain-text fallback behavior per chunk.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

putueddy and others added 3 commits March 1, 2026 09:08
- Deny regex: expand left boundary to match shell separators (;, &&, ||)
  to prevent bypass via chained commands like ";format c:"
- Path regex: add "." to initial char class to catch hidden dirs (/.ssh),
  add "=" to left boundary to catch flag-attached paths (--file=/etc/passwd)
- Add test: ModelName must match user model for GetModelConfig lookup
- Add test: stripSystemParts preserves reasoning_content in wire format
- Add test: forceCompression avoids orphaning tool result messages
- Add test: deny pattern blocks disk-wiping commands with shell separators
  while allowing legitimate --format flags

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Allow APIBase-only config for opencode provider selection (like VLLM)
- Keep moonshot provider on moonshot.cn/v1 default, only use kimi.com/coding/v1 for kimi/kimi-code
- Use url.Parse hostname match for Kimi User-Agent check instead of strings.Contains
- Add opencode to DefaultAPIBase test cases in factory_provider_test.go
- Add opencode migration tests (full config + APIBase-only) in migration_test.go
- Update AllProviders test count to include opencode (18 -> 19)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Add early return for empty content to avoid silent no-op
- Split raw markdown before HTML conversion so SplitMessage's
  code-fence-aware logic works correctly and HTML tags/entities
  are never broken by mid-tag splitting

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@afjcjsbx
Copy link
Collaborator

afjcjsbx commented Mar 2, 2026

sounds like a good optimization, could you add some unit tests to validate its correctness?

Cover empty content early return, single-message send,
multi-chunk splitting for long messages, HTML-to-plain-text
fallback per chunk, and error propagation.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copilot AI review requested due to automatic review settings March 2, 2026 14:55
@putueddy
Copy link
Contributor Author

putueddy commented Mar 2, 2026

Added unit tests for the Send() method in pkg/channels/telegram/telegram_test.go. The tests cover:

  1. Empty content — returns early with no API calls
  2. Short message — sends as a single SendMessage call
  3. Long message (>4000 chars) — splits into multiple SendMessage calls
  4. HTML fallback — falls back to plain text when HTML parse fails
  5. Both sends fail — returns ErrTemporary
  6. Multi-chunk error — stops early if a chunk fails both attempts
  7. Not running / invalid chat ID — proper error returns

Thanks for the suggestion!

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

putueddy and others added 6 commits March 2, 2026 22:08
Load .env files from the config directory before reading config.json,
enabling secrets and API keys to be stored outside version control.
Supports fresh installs (no config.json) by applying env vars and
provider overrides to the default config.

Adds loadProviderEnvOverrides() for PICOCLAW_PROVIDERS_<NAME>_API_KEY
and _API_BASE environment variables across all standard providers.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add Exa (https://exa.ai) as a new web search provider option, slotting
into the priority chain between Perplexity and Brave. Configurable via
config.json or PICOCLAW_TOOLS_WEB_EXA_* environment variables.

Results are capped to the requested count for consistency with other
search providers.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
When markdownToTelegramHTML expands a chunk beyond 4096 chars (e.g.
**a** → <b>a</b>), re-split the markdown with a proportionally smaller
maxLen so each resulting HTML chunk fits within Telegram's limit.

Extract sendHTMLChunk helper to avoid duplicating the HTML-send +
plain-text-fallback logic.

Add test case for markdown-short-but-HTML-long scenario to verify
the re-splitting behavior.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Add explicit empty-results handling ("No results for: <query>")
- Add "Results for: <query> (via Exa)" header and align per-result
  format with Brave/Tavily/DuckDuckGo/Perplexity
- Add tests: provider priority (Perplexity > Exa > Brave), proxy
  propagation, successful search with header/attribution, empty
  results, and max-results capping

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Add migrateChannelConfigs() and ValidateModelList() to the fresh-
  install path (no config.json) so legacy env vars are migrated and
  model list is validated consistently with the normal loading path
- Use os.LookupEnv instead of os.Getenv in loadProviderEnvOverrides
  so explicitly empty env vars (e.g. PICOCLAW_PROVIDERS_X_API_BASE=)
  can clear values from config.json
- Guard .env loading with sync.Once to avoid repeated disk I/O and
  noisy log messages when LoadConfig is called from polling handlers
- Add tests: .env file loading, missing config.json with env vars,
  malformed .env non-fatal behavior, and LookupEnv empty-override

Note: go.mod tcell/v2 and tview are correctly listed as direct deps
(they are imported by the launcher TUI); upstream go.mod was stale.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@afjcjsbx
Copy link
Collaborator

afjcjsbx commented Mar 2, 2026

fix lint please 🙏

Break function signatures and assert calls that exceed the 120-char
golines limit onto multiple lines.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copilot AI review requested due to automatic review settings March 3, 2026 02:11
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

putueddy and others added 2 commits March 3, 2026 09:18
- Add "kimi-code" to the moonshot provider's providerNames in
  ConvertProvidersToModelList so configs using
  agents.defaults.provider: "kimi-code" migrate correctly.
- Add TestProviderChat_KimiCodeUserAgent verifying that
  User-Agent: KimiCLI/0.77 is set when apiBase hostname is
  api.kimi.com and not set for other hosts.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Reduce markdown input from 700 to 600 repeats (3600 runes) so it stays
under the 4000-rune chunk threshold. This ensures the test actually
exercises the HTML-expansion re-splitting logic rather than being split
at the markdown level first.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@sipeed-bot sipeed-bot bot added type: bug Something isn't working domain: channel labels Mar 3, 2026
@afjcjsbx afjcjsbx requested a review from alexhoshina March 3, 2026 21:31
Copy link
Collaborator

@alexhoshina alexhoshina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but there's a small detail that might need to be changed, or is there a specific reason?

The Manager splits at MaxMessageLength before calling Send(), and
Telegram's Send() was re-splitting at 4000 internally. Aligning the
channel-level limit to 4000 avoids that redundant second split while
preserving the safety margin for markdown-to-HTML expansion.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copilot AI review requested due to automatic review settings March 4, 2026 13:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

putueddy added a commit to putueddy/picoclaw that referenced this pull request Mar 4, 2026
- Add test for markdown-short-but-HTML-long message splitting in
  Telegram (input stays under 4096 runes to exercise HTML re-split)
- Harden disk-wiping deny regex to catch shell separators (&&, ||, &)
  without requiring spaces
- Fix Exa snippet truncation to use rune slicing instead of byte
  slicing, preventing broken UTF-8

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copilot AI review requested due to automatic review settings March 4, 2026 14:02
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings March 4, 2026 14:25
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

When HTML parsing fails, the fallback was re-sending the same HTML
string with ParseMode cleared, showing raw HTML tags to users.
Now pass the original markdown chunk so the fallback displays
readable plain text instead.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@putueddy putueddy force-pushed the feat/telegram-chunking branch from 2010308 to 3de4cb8 Compare March 4, 2026 15:15
After re-splitting an oversized chunk, sub-chunks were sent without
verifying their HTML also fits under 4096 chars. Non-uniform HTML
expansion (e.g. a sub-chunk dense with bold/links) could still exceed
the limit. Use a queue that pushes sub-chunks back for re-validation
instead of sending them blindly.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copilot AI review requested due to automatic review settings March 4, 2026 15:19
Accept upstream versions for all non-Telegram files to keep PR
scope focused on Telegram message chunking only.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

WithMaxMessageLength(4000) already ensures msg.Content ≤ 4000 chars
before reaching Send(), making the SplitMessage call redundant.
The HTML expansion safety net (re-split when >4096 after conversion)
is still preserved.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@alexhoshina alexhoshina merged commit 826f92c into sipeed:main Mar 7, 2026
4 checks passed
dj-oyu pushed a commit to dj-oyu/picoclaw that referenced this pull request Mar 8, 2026
feat: add message chunking in Telegram Send method
@Orgmar
Copy link
Contributor

Orgmar commented Mar 9, 2026

@putueddy Smart move using SplitMessage to preserve code block integrity across chunks. Leaving that headroom between 4000 and 4096 for HTML tag overhead shows good attention to edge cases.

By the way, we have a PicoClaw Dev Group on Discord for contributors to chat and collaborate. If you'd like to join, drop an email to [email protected] with the subject [Join PicoClaw Dev Group] putueddy and we'll send you the invite link!

fishtrees pushed a commit to fishtrees/picoclaw that referenced this pull request Mar 12, 2026
feat: add message chunking in Telegram Send method
dj-oyu pushed a commit to dj-oyu/picoclaw that referenced this pull request Mar 14, 2026
feat: add message chunking in Telegram Send method
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain: channel type: bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants