
Add chunking on Telegram Msgs #476

Closed
harshbansal7 wants to merge 3 commits into sipeed:main from harshbansal7:telegram_chunking


@harshbansal7
Collaborator

@harshbansal7 harshbansal7 commented Feb 19, 2026

📝 Description

  • Implemented safe Telegram message chunking so that LLM responses exceeding Telegram’s 4096-character limit are split into smaller chunks while preserving Markdown/code-block structure.
  • Centralized and optimized the chunking logic in utils/string.go (SplitMessage + SplitMessageIter), and updated both DiscordChannel and TelegramChannel to use this shared, more memory-efficient implementation.
  • Improved memory behavior in the Telegram send path by using an iterator-based splitter that avoids unnecessary allocations and redundant rune counting, which matters when running under tight (<10MB) RAM constraints.

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

#449
#244
#212

📚 Technical Context (Skip for Docs)

  • Reference URL: https://core.telegram.org/bots/api#sendmessage (Telegram message length limitations)
  • Reasoning:
    • Telegram hard-limits messages to 4096 characters, and HTML formatting can inflate effective length.
    • Previous logic tried HTML then plain text but did not split or truncate safely, causing delivery failures for long LLM outputs.
    • Introduced a shared, code-block-aware splitter in utils and an iterator-based variant (SplitMessageIter) so channels can process chunks one-by-one without allocating an entire slice of chunks, which is more suitable for low-memory deployments.

🧪 Test Environment

Yet to be tested

☑️ Checklist

  • My code/docs follow the style of this project.
  • I have performed a self-review of my own changes (including lints on touched files).
  • I have updated the documentation accordingly.

Copilot AI review requested due to automatic review settings February 19, 2026 12:10
Contributor

Copilot AI left a comment


Pull request overview

Adds a shared, code-fence-aware message chunking utility and applies it to Telegram (and Discord) so long LLM responses don’t fail platform message-length limits.

Changes:

  • Introduces utils.SplitMessage / utils.SplitMessageIter for reusable message chunking with basic code-block preservation.
  • Updates Telegram send path to stream chunks (iterator callback) and adds additional Telegram length safeguards.
  • Refactors Discord to use the shared chunking utility and removes the duplicated chunking implementation.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| pkg/utils/string.go | Adds shared chunking helpers (SplitMessage, SplitMessageIter) and supporting code-fence/whitespace split utilities. |
| pkg/channels/telegram.go | Sends Telegram messages in chunks (with placeholder edit for the first chunk) to avoid 4096-char delivery failures. |
| pkg/channels/discord.go | Replaces local chunking logic with the shared utils.SplitMessage. |
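The Telegram send flow summarized above (edit the placeholder with the first chunk, send every other chunk as a new message, fall back to sending if the edit fails) can be sketched as follows. The `messenger` interface, `sendChunks`, and `fakeMessenger` are hypothetical names for illustration and do not reflect the actual client API in `telegram.go`; the fake records calls so the fallback can be exercised without a network.

```go
package main

import (
	"errors"
	"fmt"
)

// messenger is a hypothetical abstraction over the bot client.
type messenger interface {
	Edit(messageID int, text string) error
	Send(text string) error
}

// sendChunks delivers chunks in order. If a placeholder exists, the first
// chunk replaces it via Edit; when that edit fails, the chunk is sent as a new
// message instead. All remaining chunks are sent as new messages.
func sendChunks(m messenger, placeholderID int, hasPlaceholder bool, chunks []string) error {
	for _, chunk := range chunks {
		if hasPlaceholder {
			hasPlaceholder = false // the placeholder is reused at most once
			editErr := m.Edit(placeholderID, chunk)
			if editErr == nil {
				continue
			}
			fmt.Println("edit failed, sending new message instead:", editErr)
		}
		if err := m.Send(chunk); err != nil {
			return err
		}
	}
	return nil
}

// fakeMessenger records calls instead of talking to a real API.
type fakeMessenger struct {
	failEdit bool
	edited   []string
	sent     []string
}

func (f *fakeMessenger) Edit(_ int, text string) error {
	if f.failEdit {
		return errors.New("edit rejected")
	}
	f.edited = append(f.edited, text)
	return nil
}

func (f *fakeMessenger) Send(text string) error {
	f.sent = append(f.sent, text)
	return nil
}

func main() {
	withPlaceholder := &fakeMessenger{}
	_ = sendChunks(withPlaceholder, 42, true, []string{"part 1", "part 2"})
	fmt.Println(withPlaceholder.edited, withPlaceholder.sent) // [part 1] [part 2]

	editFails := &fakeMessenger{failEdit: true}
	_ = sendChunks(editFails, 42, true, []string{"part 1", "part 2"})
	fmt.Println(editFails.edited, editFails.sent) // [] [part 1 part 2]
}
```

Clearing `hasPlaceholder` before attempting the edit keeps later chunks from ever targeting the placeholder, whether or not the edit succeeded.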


Comment on lines +196 to +203
```go
}

logger.WarnCF("telegram", "Failed to edit placeholder message, sending new message instead", map[string]interface{}{
	"error": err.Error(),
})

// If edit fails, fall back to sending a new message for this and subsequent chunks
hasPlaceholder = false
```

Copilot AI Feb 19, 2026


`err` is referenced outside the scope where it’s declared: `if _, err := c.bot.EditMessageText(...); ...` defines `err` only inside the `if`, but `logger.WarnCF(..., {"error": err.Error()})` uses it after the block. This will not compile. Capture the error into a variable (e.g., `editErr := ...`) before logging, or restructure the `if` to keep the error in scope.

Suggested change
```diff
-}
-logger.WarnCF("telegram", "Failed to edit placeholder message, sending new message instead", map[string]interface{}{
-	"error": err.Error(),
-})
-// If edit fails, fall back to sending a new message for this and subsequent chunks
-hasPlaceholder = false
+} else {
+	logger.WarnCF("telegram", "Failed to edit placeholder message, sending new message instead", map[string]interface{}{
+		"error": err.Error(),
+	})
+	// If edit fails, fall back to sending a new message for this and subsequent chunks
+	hasPlaceholder = false
+}
```

Copilot uses AI. Check for mistakes.
Comment on lines +107 to +121
```go
count := 0
lastOpenIdx := -1

for i := 0; i < len(text); i++ {
	if i+2 < len(text) && text[i] == '`' && text[i+1] == '`' && text[i+2] == '`' {
		if count == 0 {
			lastOpenIdx = i
		}
		count++
		i += 2
	}
}

// If odd number of ``` markers, last one is unclosed
if count%2 == 1 {
```

Copilot AI Feb 19, 2026


`findLastUnclosedCodeBlock` doesn’t reliably return the last unclosed opening fence. With multiple code blocks, `lastOpenIdx` is only set on the very first fence (`if count == 0`), so if a later code block is the one left unclosed, this function returns the wrong index. Consider tracking an `inCodeBlock` boolean (toggle on each ```), and updating `lastOpenIdx` on each opening fence so the final unclosed opening is returned correctly.

Suggested change
```diff
-count := 0
-lastOpenIdx := -1
-for i := 0; i < len(text); i++ {
-	if i+2 < len(text) && text[i] == '`' && text[i+1] == '`' && text[i+2] == '`' {
-		if count == 0 {
-			lastOpenIdx = i
-		}
-		count++
-		i += 2
-	}
-}
-// If odd number of ``` markers, last one is unclosed
-if count%2 == 1 {
+lastOpenIdx := -1
+inCodeBlock := false
+for i := 0; i < len(text); i++ {
+	if i+2 < len(text) && text[i] == '`' && text[i+1] == '`' && text[i+2] == '`' {
+		if !inCodeBlock {
+			// Opening fence
+			lastOpenIdx = i
+			inCodeBlock = true
+		} else {
+			// Closing fence
+			inCodeBlock = false
+		}
+		i += 2
+	}
+}
+if inCodeBlock {
```
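The toggle-based suggestion can be turned into a self-contained function to check the multi-block case. This is a sketch: like the original, it treats any run of three backticks as a fence and does not handle longer fences (such as four backticks) or backticks inside inline code.

```go
package main

import "fmt"

// findLastUnclosedCodeBlock returns the byte index of the ``` fence that is
// still open at the end of text, or -1 if every fence is closed. It toggles
// inCodeBlock on each fence and records every opening fence, so the last
// opening fence wins.
func findLastUnclosedCodeBlock(text string) int {
	lastOpenIdx := -1
	inCodeBlock := false
	for i := 0; i+2 < len(text); i++ {
		if text[i] == '`' && text[i+1] == '`' && text[i+2] == '`' {
			if !inCodeBlock {
				lastOpenIdx = i // opening fence
			}
			inCodeBlock = !inCodeBlock
			i += 2 // skip the rest of this fence
		}
	}
	if inCodeBlock {
		return lastOpenIdx
	}
	return -1
}

func main() {
	// The first block is closed; the second is left open.
	fmt.Println(findLastUnclosedCodeBlock("```a```\n```b")) // 8
	fmt.Println(findLastUnclosedCodeBlock("```a```"))       // -1
}
```

For an input whose first block is closed and whose second block is left open, the count-based version returns the index of the first fence (0), whereas this version returns the index of the second (8).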

Comment on lines +38 to +57
```go
for len(content) > 0 {
	if len(content) <= limit {
		if content != "" {
			if err := cb(content); err != nil {
				return err
			}
		}
		break
	}

	msgEnd := limit

	// Find natural split point within the limit
	msgEnd = findLastNewline(content[:limit], 200)
	if msgEnd <= 0 {
		msgEnd = findLastSpace(content[:limit], 100)
	}
	if msgEnd <= 0 {
		msgEnd = limit
	}
```

Copilot AI Feb 19, 2026


`SplitMessageIter` slices strings using byte indices (`content[:limit]`, `content[:msgEnd]`). If `msgEnd` falls in the middle of a multi-byte UTF-8 rune (e.g., when no newline/space is found and `msgEnd = limit`), the resulting chunk will contain invalid UTF-8. This can corrupt output and may break downstream processing (e.g., markdown conversion). Adjust `msgEnd` to a rune boundary (e.g., backtrack until `utf8.ValidString(content[:msgEnd])` or use `utf8.RuneStart`) before slicing.
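One way to apply the `utf8.RuneStart` suggestion is a small helper that backs a cut point up to the nearest rune boundary before slicing. `runeSafeEnd` is a hypothetical name and assumes `s` is valid UTF-8 and `end >= 0`. If it returns 0 (the limit is smaller than the first rune), a caller must still advance by at least one whole rune to avoid looping forever.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeSafeEnd (hypothetical helper) returns the largest index <= end at which
// s can be cut without splitting a multi-byte UTF-8 sequence. It assumes s is
// valid UTF-8 and end >= 0; a result of 0 means not even one rune fits.
func runeSafeEnd(s string, end int) int {
	if end >= len(s) {
		return len(s)
	}
	// s[end] would start the next chunk; back up while it is a continuation byte.
	for end > 0 && !utf8.RuneStart(s[end]) {
		end--
	}
	return end
}

func main() {
	s := "héllo"             // 'é' occupies bytes 1 and 2
	cut := runeSafeEnd(s, 2) // a naive s[:2] would split 'é'
	fmt.Println(cut, utf8.ValidString(s[:cut])) // 1 true
	fmt.Println(utf8.ValidString(s[:2]))        // false
}
```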

Comment on lines +37 to +99
```go
content = strings.TrimSpace(content)
for len(content) > 0 {
	if len(content) <= limit {
		if content != "" {
			if err := cb(content); err != nil {
				return err
			}
		}
		break
	}

	msgEnd := limit

	// Find natural split point within the limit
	msgEnd = findLastNewline(content[:limit], 200)
	if msgEnd <= 0 {
		msgEnd = findLastSpace(content[:limit], 100)
	}
	if msgEnd <= 0 {
		msgEnd = limit
	}

	// Check if this would end with an incomplete code block
	candidate := content[:msgEnd]
	unclosedIdx := findLastUnclosedCodeBlock(candidate)

	if unclosedIdx >= 0 {
		// Message would end with incomplete code block
		// Try to extend to include the closing ``` (with some buffer)
		extendedLimit := limit + 500 // Allow buffer for code blocks
		if len(content) > extendedLimit {
			closingIdx := findNextClosingCodeBlock(content, msgEnd)
			if closingIdx > 0 && closingIdx <= extendedLimit {
				// Extend to include the closing ```
				msgEnd = closingIdx
			} else {
				// Can't find closing, split before the code block
				msgEnd = findLastNewline(content[:unclosedIdx], 200)
				if msgEnd <= 0 {
					msgEnd = findLastSpace(content[:unclosedIdx], 100)
				}
				if msgEnd <= 0 {
					msgEnd = unclosedIdx
				}
			}
		} else {
			// Remaining content fits within extended limit
			msgEnd = len(content)
		}
	}

	if msgEnd <= 0 {
		msgEnd = limit
	}

	chunk := strings.TrimSpace(content[:msgEnd])
	if chunk != "" {
		if err := cb(chunk); err != nil {
			return err
		}
	}
	content = strings.TrimSpace(content[msgEnd:])
}
```

Copilot AI Feb 19, 2026


`SplitMessageIter` uses `strings.TrimSpace` on the full content and on every produced chunk. This changes the message content (removes leading/trailing whitespace/newlines), which can alter formatting (notably indentation in code blocks / markdown lists) and makes the splitter non-lossless. To preserve original content, avoid trimming the emitted chunks; if you need to prevent empty/whitespace-only chunks, consider trimming only for the emptiness check or trimming only the boundary newline without removing indentation.

Comment on lines +34 to +41
```go
// SplitMessageIter splits content into chunks and calls cb for each chunk.
// This avoids allocating a slice to hold all chunks and is more memory-efficient for very large messages.
func SplitMessageIter(content string, limit int, cb func(chunk string) error) error {
	content = strings.TrimSpace(content)
	for len(content) > 0 {
		if len(content) <= limit {
			if content != "" {
				if err := cb(content); err != nil {
```

Copilot AI Feb 19, 2026


`SplitMessageIter` will panic when `limit <= 0` (due to `content[:limit]`) and will also panic if `cb` is nil. Since this is a shared utility, it should defensively validate inputs (e.g., return an error when `limit <= 0` or `cb == nil`).


@nikolasdehor nikolasdehor left a comment


Good approach — the SplitMessageIter refactor is clean and the conservative 3000-char limit with 4096 hard truncation is sensible.

Bug (blocking): In `telegram.go`, the placeholder edit fallback has a nil pointer panic. The `err` from `c.bot.EditMessageText(ctx, editMsg)` is scoped to the `if` block (`:=`). The `err` referenced in the `WarnCF` call is the outer `err` from `strconv.ParseInt`, which is `nil` at that point:

```go
// err here is scoped to the if-block
if _, err := c.bot.EditMessageText(ctx, editMsg); err == nil {
	chunkIndex++
	return nil
}
// This references the OUTER err (from ParseInt) = nil → PANIC
logger.WarnCF("telegram", "Failed to edit...", map[string]interface{}{
	"error": err.Error(), // nil pointer dereference
})
```

Fix:
```go
if _, editErr := c.bot.EditMessageText(ctx, editMsg); editErr == nil {
	chunkIndex++
	return nil
} else {
	logger.WarnCF("telegram", "Failed to edit...", map[string]interface{}{
		"error": editErr.Error(),
	})
}
```
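Whether the original code fails to compile (as the Copilot comment says) or panics at runtime (as described here) depends on whether an outer `err` is in scope at that point. The sketch below reproduces the shadowing case with hypothetical stand-ins (`editMessage` for `c.bot.EditMessageText`, a successful `strconv.ParseInt` for the earlier parse), recovers the panic so it can be observed, and shows the `editErr` variant alongside it.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// editMessage is a hypothetical stand-in for c.bot.EditMessageText that always fails.
func editMessage() (int, error) { return 0, errors.New("message is not modified") }

// buggy mirrors the reported pattern: the err declared in the if statement
// shadows the outer err, so after the if, err is the (nil) ParseInt error.
func buggy() (result string) {
	defer func() {
		if r := recover(); r != nil {
			result = fmt.Sprint("panic: ", r) // observe the panic instead of crashing
		}
	}()
	_, err := strconv.ParseInt("12345", 10, 64) // succeeds, so err == nil
	if _, err := editMessage(); err == nil {
		return "edited"
	}
	return "fallback: " + err.Error() // calling Error() on a nil interface panics
}

// fixed uses a distinct name so the edit error is the one that gets logged.
func fixed() string {
	if _, editErr := editMessage(); editErr != nil {
		return "fallback: " + editErr.Error()
	}
	return "edited"
}

func main() {
	fmt.Println(buggy()) // panic: runtime error: invalid memory address or nil pointer dereference
	fmt.Println(fixed()) // fallback: message is not modified
}
```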

Also: the PR acknowledges this is untested — please test before merge, at minimum with a message exceeding 4096 chars.

@huaaudio
Collaborator

huaaudio commented Feb 19, 2026

Hi @harshbansal7! Thanks for the PR. Can you fix the `err` reference bug mentioned above?

> `err` is referenced outside the scope where it’s declared: `if _, err := c.bot.EditMessageText(...); ...` defines `err` only inside the `if`, but `logger.WarnCF(..., {"error": err.Error()})` uses it after the block. This will not compile. Capture the error into a variable (e.g., `editErr := ...`) before logging, or restructure the `if` to keep the error in scope.

I'd suggest we just put it in the `else` branch. It would also be good to test this, e.g. have PicoClaw send you a long output with multiple code blocks via Telegram.

@nikolasdehor

This addresses the same Telegram chunking issue as #251 and #527. #527 implements universal chunking across all channels (not just Telegram), which supersedes both this and #251. Suggesting this be closed in favor of #527.

@xiaket
Collaborator

xiaket commented Feb 23, 2026

Hi @harshbansal7 ! Thanks for opening this PR! To move forward, we will need you to rebase latest main, address the issues raised by others, and add some tests. Thank you!

@alexhoshina alexhoshina added this to the Refactor Channel milestone Feb 26, 2026
@alexhoshina
Collaborator

Included in the refactor branch

@harshbansal7
Collaborator Author

> Included in the refactor branch

Cool!
