feat: add message chunking in Telegram Send method #935
alexhoshina merged 31 commits into sipeed:main from
Conversation
1. migration.go: Set ModelName to userModel when provider matches so GetModelConfig(userModel) can find the entry. Previously the migration created entries with the provider name as ModelName (e.g. "moonshot") but lookup used the model name (e.g. "k2p5"), causing "model not found".
2. openai_compat/provider.go: Preserve reasoning_content in conversation history. Thinking models (e.g. Kimi K2, DeepSeek-R1) return reasoning_content which must be echoed back. Without it, APIs return 400: "thinking is enabled but reasoning_content is missing".
3. shell.go: Fix deny pattern regex for format/mkfs/diskpart to use (?:^|\s) instead of \b to avoid matching --format flags. Fix path extraction regex to use submatch to avoid matching flags like -rf as paths.
4. loop.go: Adjust forceCompression mid-point to avoid splitting tool-call/result message pairs, which causes API errors.
- Add "kimi", "kimi-code", "moonshot" provider cases in factory.go with default API base https://api.kimi.com/coding/v1
- Add Kimi Code API User-Agent header (KimiCLI/0.77) for api.kimi.com
- Add "opencode" provider with default API base https://opencode.ai/zen/v1
- Add "opencode" to recognized HTTP-compatible protocols in factory_provider
- Add Opencode field to ProvidersConfig, IsEmpty, HasProvidersConfig
- Add opencode migration entry in ConvertProvidersToModelList
- Update moonshot fallback API base from api.moonshot.cn to api.kimi.com
Split HTML content into 4000-char chunks before sending to handle cases where markdown-to-HTML conversion causes messages to exceed Telegram's 4096-character limit. Uses the existing SplitMessage utility which preserves code block integrity across chunk boundaries. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Pull request overview
Adds a second-stage safety split for Telegram outbound messages by chunking the post-conversion HTML before calling the Telegram API, to avoid failures when Markdown→HTML expansion pushes content beyond Telegram’s 4096-character limit.
Changes:
- Split markdownToTelegramHTML output into 4000-character chunks before sending.
- Apply the existing HTML→plain-text fallback behavior per chunk.
- Deny regex: expand left boundary to match shell separators (;, &&, ||) to prevent bypass via chained commands like ";format c:"
- Path regex: add "." to initial char class to catch hidden dirs (/.ssh), add "=" to left boundary to catch flag-attached paths (--file=/etc/passwd)
- Add test: ModelName must match user model for GetModelConfig lookup
- Add test: stripSystemParts preserves reasoning_content in wire format
- Add test: forceCompression avoids orphaning tool result messages
- Add test: deny pattern blocks disk-wiping commands with shell separators while allowing legitimate --format flags
Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Allow APIBase-only config for opencode provider selection (like VLLM)
- Keep moonshot provider on moonshot.cn/v1 default, only use kimi.com/coding/v1 for kimi/kimi-code
- Use url.Parse hostname match for Kimi User-Agent check instead of strings.Contains
- Add opencode to DefaultAPIBase test cases in factory_provider_test.go
- Add opencode migration tests (full config + APIBase-only) in migration_test.go
- Update AllProviders test count to include opencode (18 -> 19)
Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Add early return for empty content to avoid silent no-op
- Split raw markdown before HTML conversion so SplitMessage's code-fence-aware logic works correctly and HTML tags/entities are never broken by mid-tag splitting
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Sounds like a good optimization, could you add some unit tests to validate its correctness?
Cover empty content early return, single-message send, multi-chunk splitting for long messages, HTML-to-plain-text fallback per chunk, and error propagation. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Added unit tests for the chunking logic.
Thanks for the suggestion!
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Load .env files from the config directory before reading config.json, enabling secrets and API keys to be stored outside version control. Supports fresh installs (no config.json) by applying env vars and provider overrides to the default config. Adds loadProviderEnvOverrides() for PICOCLAW_PROVIDERS_<NAME>_API_KEY and _API_BASE environment variables across all standard providers. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add Exa (https://exa.ai) as a new web search provider option, slotting into the priority chain between Perplexity and Brave. Configurable via config.json or PICOCLAW_TOOLS_WEB_EXA_* environment variables. Results are capped to the requested count for consistency with other search providers. Co-Authored-By: Claude Opus 4.6 <[email protected]>
When markdownToTelegramHTML expands a chunk beyond 4096 chars (e.g. **a** → <b>a</b>), re-split the markdown with a proportionally smaller maxLen so each resulting HTML chunk fits within Telegram's limit. Extract sendHTMLChunk helper to avoid duplicating the HTML-send + plain-text-fallback logic. Add test case for markdown-short-but-HTML-long scenario to verify the re-splitting behavior. Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Add explicit empty-results handling ("No results for: <query>")
- Add "Results for: <query> (via Exa)" header and align per-result
format with Brave/Tavily/DuckDuckGo/Perplexity
- Add tests: provider priority (Perplexity > Exa > Brave), proxy
propagation, successful search with header/attribution, empty
results, and max-results capping
Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Add migrateChannelConfigs() and ValidateModelList() to the fresh-install path (no config.json) so legacy env vars are migrated and the model list is validated consistently with the normal loading path
- Use os.LookupEnv instead of os.Getenv in loadProviderEnvOverrides so explicitly empty env vars (e.g. PICOCLAW_PROVIDERS_X_API_BASE=) can clear values from config.json
- Guard .env loading with sync.Once to avoid repeated disk I/O and noisy log messages when LoadConfig is called from polling handlers
- Add tests: .env file loading, missing config.json with env vars, malformed .env non-fatal behavior, and LookupEnv empty-override
Note: go.mod tcell/v2 and tview are correctly listed as direct deps (they are imported by the launcher TUI); upstream go.mod was stale.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
fix lint please 🙏
Break function signatures and assert calls that exceed the 120-char golines limit onto multiple lines. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
- Add "kimi-code" to the moonshot provider's providerNames in ConvertProvidersToModelList so configs using agents.defaults.provider: "kimi-code" migrate correctly.
- Add TestProviderChat_KimiCodeUserAgent verifying that User-Agent: KimiCLI/0.77 is set when apiBase hostname is api.kimi.com and not set for other hosts.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Reduce markdown input from 700 to 600 repeats (3600 runes) so it stays under the 4000-rune chunk threshold. This ensures the test actually exercises the HTML-expansion re-splitting logic rather than being split at the markdown level first. Co-Authored-By: Claude Opus 4.6 <[email protected]>
alexhoshina left a comment
LGTM, but there's a small detail that might need to be changed, or is there a specific reason?
The Manager splits at MaxMessageLength before calling Send(), and Telegram's Send() was re-splitting at 4000 internally. Aligning the channel-level limit to 4000 avoids that redundant second split while preserving the safety margin for markdown-to-HTML expansion. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Pull request overview
Copilot reviewed 19 out of 20 changed files in this pull request and generated 4 comments.
- Add test for markdown-short-but-HTML-long message splitting in Telegram (input stays under 4096 runes to exercise HTML re-split)
- Harden disk-wiping deny regex to catch shell separators (&&, ||, &) without requiring spaces
- Fix Exa snippet truncation to use rune slicing instead of byte slicing, preventing broken UTF-8
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Pull request overview
Copilot reviewed 19 out of 20 changed files in this pull request and generated 6 comments.
Pull request overview
Copilot reviewed 19 out of 20 changed files in this pull request and generated 2 comments.
When HTML parsing fails, the fallback was re-sending the same HTML string with ParseMode cleared, showing raw HTML tags to users. Now pass the original markdown chunk so the fallback displays readable plain text instead. Co-Authored-By: Claude Opus 4.6 <[email protected]>
After re-splitting an oversized chunk, sub-chunks were sent without verifying their HTML also fits under 4096 chars. Non-uniform HTML expansion (e.g. a sub-chunk dense with bold/links) could still exceed the limit. Use a queue that pushes sub-chunks back for re-validation instead of sending them blindly. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Accept upstream versions for all non-Telegram files to keep PR scope focused on Telegram message chunking only. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
WithMaxMessageLength(4000) already ensures msg.Content ≤ 4000 chars before reaching Send(), making the SplitMessage call redundant. The HTML expansion safety net (re-split when >4096 after conversion) is still preserved. Co-Authored-By: Claude Opus 4.6 <[email protected]>
@putueddy Smart move using SplitMessage to preserve code block integrity across chunks. Leaving that headroom between 4000 and 4096 for HTML tag overhead shows good attention to edge cases. By the way, we have a PicoClaw Dev Group on Discord for contributors to chat and collaborate. If you'd like to join, drop an email to
Summary
- Split HTML content into chunks in the Send() method before sending to the Telegram API
- Handles cases where markdown-to-HTML conversion (markdownToTelegramHTML) produces output exceeding Telegram's 4096-character message limit, causing send failures
- Uses the existing channels.SplitMessage() utility which preserves code block integrity across chunk boundaries

Context
The channel manager already splits raw markdown at the MaxMessageLength boundary (4096), but the subsequent HTML conversion in Send() can expand the content beyond that limit due to HTML tags like <b>, <pre><code>, <a href="...">, etc. This change adds a safety net at the HTML level.

Test plan
🤖 Generated with Claude Code