Conversation
There was a problem hiding this comment.
Pull request overview
Adds a shared, code-fence-aware message chunking utility and applies it to Telegram (and Discord) so long LLM responses don’t fail platform message-length limits.
Changes:
- Introduces
utils.SplitMessage/utils.SplitMessageIterfor reusable message chunking with basic code-block preservation. - Updates Telegram send path to stream chunks (iterator callback) and adds additional Telegram length safeguards.
- Refactors Discord to use the shared chunking utility and removes the duplicated chunking implementation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pkg/utils/string.go | Adds shared chunking helpers (SplitMessage, SplitMessageIter) and supporting code-fence/whitespace split utilities. |
| pkg/channels/telegram.go | Sends Telegram messages in chunks (with placeholder edit for the first chunk) to avoid 4096-char delivery failures. |
| pkg/channels/discord.go | Replaces local chunking logic with the shared utils.SplitMessage. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| logger.WarnCF("telegram", "Failed to edit placeholder message, sending new message instead", map[string]interface{}{ | ||
| "error": err.Error(), | ||
| }) | ||
|
|
||
| // If edit fails, fall back to sending a new message for this and subsequent chunks | ||
| hasPlaceholder = false |
There was a problem hiding this comment.
err is referenced outside the scope where it’s declared: if _, err := c.bot.EditMessageText(...); ... defines err only inside the if, but logger.WarnCF(..., {"error": err.Error()}) uses it after the block. This will not compile. Capture the error into a variable (e.g., editErr := ...) before logging, or restructure the if to keep the error in scope.
| } | |
| logger.WarnCF("telegram", "Failed to edit placeholder message, sending new message instead", map[string]interface{}{ | |
| "error": err.Error(), | |
| }) | |
| // If edit fails, fall back to sending a new message for this and subsequent chunks | |
| hasPlaceholder = false | |
| } else { | |
| logger.WarnCF("telegram", "Failed to edit placeholder message, sending new message instead", map[string]interface{}{ | |
| "error": err.Error(), | |
| }) | |
| // If edit fails, fall back to sending a new message for this and subsequent chunks | |
| hasPlaceholder = false | |
| } |
pkg/utils/string.go
Outdated
| count := 0 | ||
| lastOpenIdx := -1 | ||
|
|
||
| for i := 0; i < len(text); i++ { | ||
| if i+2 < len(text) && text[i] == '`' && text[i+1] == '`' && text[i+2] == '`' { | ||
| if count == 0 { | ||
| lastOpenIdx = i | ||
| } | ||
| count++ | ||
| i += 2 | ||
| } | ||
| } | ||
|
|
||
| // If odd number of ``` markers, last one is unclosed | ||
| if count%2 == 1 { |
There was a problem hiding this comment.
findLastUnclosedCodeBlock doesn’t reliably return the last unclosed opening fence. With multiple code blocks, lastOpenIdx is only set on the very first fence (if count == 0), so if a later code block is the one left unclosed, this function returns the wrong index. Consider tracking an inCodeBlock boolean (toggle on each ```), and updating lastOpenIdx on each opening fence so the final unclosed opening is returned correctly.
| count := 0 | |
| lastOpenIdx := -1 | |
| for i := 0; i < len(text); i++ { | |
| if i+2 < len(text) && text[i] == '`' && text[i+1] == '`' && text[i+2] == '`' { | |
| if count == 0 { | |
| lastOpenIdx = i | |
| } | |
| count++ | |
| i += 2 | |
| } | |
| } | |
| // If odd number of ``` markers, last one is unclosed | |
| if count%2 == 1 { | |
| lastOpenIdx := -1 | |
| inCodeBlock := false | |
| for i := 0; i < len(text); i++ { | |
| if i+2 < len(text) && text[i] == '`' && text[i+1] == '`' && text[i+2] == '`' { | |
| if !inCodeBlock { | |
| // Opening fence | |
| lastOpenIdx = i | |
| inCodeBlock = true | |
| } else { | |
| // Closing fence | |
| inCodeBlock = false | |
| } | |
| i += 2 | |
| } | |
| } | |
| if inCodeBlock { |
pkg/utils/string.go
Outdated
| for len(content) > 0 { | ||
| if len(content) <= limit { | ||
| if content != "" { | ||
| if err := cb(content); err != nil { | ||
| return err | ||
| } | ||
| } | ||
| break | ||
| } | ||
|
|
||
| msgEnd := limit | ||
|
|
||
| // Find natural split point within the limit | ||
| msgEnd = findLastNewline(content[:limit], 200) | ||
| if msgEnd <= 0 { | ||
| msgEnd = findLastSpace(content[:limit], 100) | ||
| } | ||
| if msgEnd <= 0 { | ||
| msgEnd = limit | ||
| } |
There was a problem hiding this comment.
SplitMessageIter slices strings using byte indices (content[:limit], content[:msgEnd]). If msgEnd falls in the middle of a multi-byte UTF-8 rune (e.g., when no newline/space is found and msgEnd = limit), the resulting chunk will contain invalid UTF-8. This can corrupt output and may break downstream processing (e.g., markdown conversion). Adjust msgEnd to a rune boundary (e.g., backtrack until utf8.ValidString(content[:msgEnd]) or use utf8.RuneStart) before slicing.
pkg/utils/string.go
Outdated
| content = strings.TrimSpace(content) | ||
| for len(content) > 0 { | ||
| if len(content) <= limit { | ||
| if content != "" { | ||
| if err := cb(content); err != nil { | ||
| return err | ||
| } | ||
| } | ||
| break | ||
| } | ||
|
|
||
| msgEnd := limit | ||
|
|
||
| // Find natural split point within the limit | ||
| msgEnd = findLastNewline(content[:limit], 200) | ||
| if msgEnd <= 0 { | ||
| msgEnd = findLastSpace(content[:limit], 100) | ||
| } | ||
| if msgEnd <= 0 { | ||
| msgEnd = limit | ||
| } | ||
|
|
||
| // Check if this would end with an incomplete code block | ||
| candidate := content[:msgEnd] | ||
| unclosedIdx := findLastUnclosedCodeBlock(candidate) | ||
|
|
||
| if unclosedIdx >= 0 { | ||
| // Message would end with incomplete code block | ||
| // Try to extend to include the closing ``` (with some buffer) | ||
| extendedLimit := limit + 500 // Allow buffer for code blocks | ||
| if len(content) > extendedLimit { | ||
| closingIdx := findNextClosingCodeBlock(content, msgEnd) | ||
| if closingIdx > 0 && closingIdx <= extendedLimit { | ||
| // Extend to include the closing ``` | ||
| msgEnd = closingIdx | ||
| } else { | ||
| // Can't find closing, split before the code block | ||
| msgEnd = findLastNewline(content[:unclosedIdx], 200) | ||
| if msgEnd <= 0 { | ||
| msgEnd = findLastSpace(content[:unclosedIdx], 100) | ||
| } | ||
| if msgEnd <= 0 { | ||
| msgEnd = unclosedIdx | ||
| } | ||
| } | ||
| } else { | ||
| // Remaining content fits within extended limit | ||
| msgEnd = len(content) | ||
| } | ||
| } | ||
|
|
||
| if msgEnd <= 0 { | ||
| msgEnd = limit | ||
| } | ||
|
|
||
| chunk := strings.TrimSpace(content[:msgEnd]) | ||
| if chunk != "" { | ||
| if err := cb(chunk); err != nil { | ||
| return err | ||
| } | ||
| } | ||
| content = strings.TrimSpace(content[msgEnd:]) | ||
| } |
There was a problem hiding this comment.
SplitMessageIter uses strings.TrimSpace on the full content and on every produced chunk. This changes the message content (removes leading/trailing whitespace/newlines), which can alter formatting (notably indentation in code blocks / markdown lists) and makes the splitter non-lossless. To preserve original content, avoid trimming the emitted chunks; if you need to prevent empty/whitespace-only chunks, consider trimming only for the emptiness check or trimming only the boundary newline without removing indentation.
pkg/utils/string.go
Outdated
| // SplitMessageIter splits content into chunks and calls cb for each chunk. | ||
| // This avoids allocating a slice to hold all chunks and is more memory-efficient for very large messages. | ||
| func SplitMessageIter(content string, limit int, cb func(chunk string) error) error { | ||
| content = strings.TrimSpace(content) | ||
| for len(content) > 0 { | ||
| if len(content) <= limit { | ||
| if content != "" { | ||
| if err := cb(content); err != nil { |
There was a problem hiding this comment.
SplitMessageIter will panic when limit <= 0 (due to content[:limit]) and will also panic if cb is nil. Since this is a shared utility, it should defensively validate inputs (e.g., return an error when limit <= 0 or cb == nil).
nikolasdehor
left a comment
There was a problem hiding this comment.
Good approach — the SplitMessageIter refactor is clean and the conservative 3000-char limit with 4096 hard truncation is sensible.
Bug (blocking): In telegram.go, the placeholder edit fallback has a nil pointer panic. The err from c.bot.EditMessageText(ctx, editMsg) is scoped to the if block (:=). The err referenced in the WarnCF call is the outer err from strconv.ParseInt, which is nil at that point:
```go
// err here is scoped to the if-block
if _, err := c.bot.EditMessageText(ctx, editMsg); err == nil {
chunkIndex++
return nil
}
// This references the OUTER err (from ParseInt) = nil → PANIC
logger.WarnCF("telegram", "Failed to edit...", map[string]interface{}{
"error": err.Error(), // nil pointer dereference
})
```
Fix:
```go
if _, editErr := c.bot.EditMessageText(ctx, editMsg); editErr == nil {
chunkIndex++
return nil
} else {
logger.WarnCF("telegram", "Failed to edit...", map[string]interface{}{
"error": editErr.Error(),
})
}
```
Also: the PR acknowledges this is untested — please test before merge, at minimum with a message exceeding 4096 chars.
|
Hi @harshbansal7 ! Thanks for the PR, can you fix the err reference bug as mentioned above?
I'd suggest we just put it in |
|
Hi @harshbansal7 ! Thanks for opening this PR! To move forward, we will need you to rebase latest main, address the issues raised by others, and add some tests. Thank you! |
|
Included in the refactor branch |
Cool! |
📝 Description
utils/string.go(SplitMessage+SplitMessageIter), and updated bothDiscordChannelandTelegramChannelto use this shared, more memory-efficient implementation.🗣️ Type of Change
🤖 AI Code Generation
🔗 Related Issue
#449
#244
#212
📚 Technical Context (Skip for Docs)
https://core.telegram.org/bots/api#sendmessage(Telegram message length limitations)utilsand an iterator-based variant (SplitMessageIter) so channels can process chunks one-by-one without allocating an entire slice of chunks, which is more suitable for low-memory deployments.🧪 Test Environment
Yet to be tested
☑️ Checklist