Skip to content

fix(telegram): strip HTML tags from plain fallback#404

Merged
penso merged 5 commits intomainfrom
issue-214
Mar 13, 2026
Merged

fix(telegram): strip HTML tags from plain fallback#404
penso merged 5 commits intomainfrom
issue-214

Conversation

@penso
Copy link
Copy Markdown
Collaborator

@penso penso commented Mar 11, 2026

Summary

  • strip Telegram HTML tags and decode entities before the plain-text resend/edit fallback path
  • add regression coverage that forces an HTML send failure and verifies the fallback request body is clean plain text
  • fix sandbox::tests::test_create_sandbox_off so it explicitly tests mode = off instead of accidentally exercising a real Apple Container backend on macOS
  • harden the onboarding Playwright helper so it keeps polling while the OpenClaw import step finishes async loading before advancing to the LLM step

Fixes #214.

Validation

Completed

  • cargo +nightly-2025-11-30 fmt --all -- --check
  • cargo +nightly-2025-11-30 test -p moltis-telegram outbound::tests -- --nocapture
  • cargo +nightly-2025-11-30 clippy -p moltis-telegram --all-features --tests -- -D warnings
  • cargo +nightly-2025-11-30 test -p moltis-tools sandbox::tests::test_create_sandbox_off -- --nocapture
  • cargo +nightly-2025-11-30 clippy -p moltis-tools --tests -- -D warnings
  • npx @biomejs/biome check --write e2e/specs/onboarding.spec.js
  • npx playwright test e2e/specs/onboarding.spec.js --project=onboarding --grep "llm provider api key form includes key source hint"

Remaining

  • cargo +nightly-2025-11-30 clippy -Z unstable-options --workspace --all-features --all-targets --timings -- -D warnings
    On this macOS host, the run fails in llama-cpp-sys-2 because CUDA / nvcc is unavailable.
  • cargo +nightly-2025-11-30 clippy -Z unstable-options --workspace --all-targets --timings -- -D warnings
    I started the documented macOS fallback, but interrupted it after it exceeded the repo's 5 minute command budget.

Manual QA

  1. Configure a Telegram bot account.
  2. Trigger a send path that retries from HTML to plain text, for example by sending malformed or unsupported HTML through the Telegram outbound path.
  3. Confirm the fallback message shows readable plain text, not raw <b>, <i>, <code> tags.
  4. Visit /onboarding in an environment with OpenClaw detected and confirm the wizard can still advance from Import to the LLM step.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR fixes a bug where Telegram's plain-text fallback path was forwarding raw HTML markup (e.g. <b>, &amp;) to users instead of clean readable text. It introduces telegram_html_to_plain_text, a custom single-pass HTML stripper/entity decoder, and applies it at both send and edit fallback sites. Regression coverage is added via a mock-server integration test.

Changes:

  • telegram_html_to_plain_text strips all HTML tags, emits a \n for block-level/line-break tags (<br>, <p>, <div>, etc.), and decodes the five standard named entities plus numeric (&#NNN; and &#xHHH;) references.
  • send_chunk_with_fallback and edit_chunk_with_fallback now call the helper before constructing plain-text retry requests.
  • New unit tests cover basic tag stripping + entity decoding and numeric entity decoding.
  • A new #[tokio::test] integration test spins up a local axum mock of the Telegram API, forces the HTML path to fail, and asserts that the fallback body contains clean plain text.

Notable observations:

  • The decode_numeric_html_entity helper only matches the &#x (lowercase x) prefix; &#X…; (uppercase X, which is valid per HTML5) would fall through un-decoded. Low practical risk given Telegram always emits lowercase, but worth aligning.
  • The integration test relies on a hard-coded 50 ms sleep to wait for the mock server to start, which is a fragile pattern that can cause intermittent CI failures on slow hosts.
  • The !closed break path in the tag-stripping loop is correct but non-obvious — a short comment would aid future maintainers.

Confidence Score: 4/5

  • This PR is safe to merge; the core fix is correct and well-tested with only minor style concerns.
  • The logic change is small and targeted — it inserts a sanitization step in the two fallback paths and does not touch the happy path. The hand-rolled HTML stripper is simple enough to reason about completely, unit tests and an integration test cover the key scenarios, and no security-sensitive code is modified. The only issues are style-level: a fragile sleep in the integration test and a case-sensitivity gap in hex entity decoding, neither of which affects production correctness for Telegram's actual output.
  • No files require special attention beyond the minor style concerns flagged in crates/telegram/src/outbound.rs.

Important Files Changed

Filename Overview
crates/telegram/src/outbound.rs Adds telegram_html_to_plain_text helper that strips Telegram HTML tags and decodes HTML entities; applies it to both send_chunk_with_fallback and edit_chunk_with_fallback before plain-text retries; adds unit tests and an integration test with a mock axum server. Minor style concerns: fragile 50 ms sleep in the integration test, case-sensitive &#x hex prefix, and a non-obvious break on unclosed tags.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant TelegramOutbound
    participant TelegramAPI

    Caller->>TelegramOutbound: send_html(account_id, to, html_chunk)
    TelegramOutbound->>TelegramAPI: sendMessage parse_mode=HTML
    alt HTML send succeeds
        TelegramAPI-->>TelegramOutbound: 200 OK message_id
        TelegramOutbound-->>Caller: Ok(message_id)
    else HTML send fails
        TelegramAPI-->>TelegramOutbound: 400 Bad Request
        Note over TelegramOutbound: telegram_html_to_plain_text strips tags and decodes entities
        TelegramOutbound->>TelegramAPI: sendMessage plain text no parse_mode
        alt Plain send succeeds
            TelegramAPI-->>TelegramOutbound: 200 OK message_id
            TelegramOutbound-->>Caller: Ok(message_id)
        else Plain send also fails
            TelegramAPI-->>TelegramOutbound: error
            TelegramOutbound-->>Caller: Err(ChannelError)
        end
    end
Loading

Last reviewed commit: f722cbd

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f722cbd87d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq bot commented Mar 11, 2026

Merging this PR will not alter performance

✅ 39 untouched benchmarks
⏩ 5 skipped benchmarks1


Comparing issue-214 (d105304) with main (fa66941)

Open in CodSpeed

Footnotes

  1. 5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 11, 2026

Codecov Report

❌ Patch coverage is 90.72581% with 23 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
crates/telegram/src/outbound.rs 90.72% 23 Missing ⚠️

📢 Thoughts on this report? Let us know!

penso added 3 commits March 12, 2026 12:20
- resolve the onboarding E2E conflict with main\n- address PR feedback on Telegram HTML fallback parsing
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fd90e26d15

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@penso penso merged commit 49cde53 into main Mar 13, 2026
34 of 35 checks passed
@penso penso deleted the issue-214 branch March 13, 2026 00:14
penso added a commit that referenced this pull request Mar 23, 2026
* fix(telegram): strip html on plain fallback

* test(tools): fix sandbox off coverage

* test(web): wait for onboarding import step

* fix(telegram): preserve fallback whitespace
jmikedupont2 pushed a commit to meta-introspector/moltis that referenced this pull request Mar 23, 2026
* fix(telegram): strip html on plain fallback

* test(tools): fix sandbox off coverage

* test(web): wait for onboarding import step

* fix(telegram): preserve fallback whitespace
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Telegram delivery falls back to plain text, showing raw HTML tags

1 participant