fix(extensions): fix lifecycle bugs + comprehensive E2E tests#1070
fix(extensions): fix lifecycle bugs + comprehensive E2E tests#1070henrypark133 merged 12 commits intostagingfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the stability, security, and testability of the extension management system. It addresses several critical bugs in the extension lifecycle, particularly around authentication and cleanup, which previously led to stale states and blocked operations. A major refactoring of the authentication and configuration APIs provides a more unified and robust mechanism for managing extension secrets and activation. Complementing these code changes, a comprehensive suite of new end-to-end tests has been added, ensuring the reliability of the extension system and providing a solid foundation for future development. Additionally, HTTP request security has been bolstered with advanced SSRF protections, and the web UI's HTML sanitization has been upgraded to a more secure library. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request provides several important bug fixes for the extension lifecycle, addressing issues like missing auth guards, stale caches, and incomplete cleanup on removal. The changes are well-implemented and logical. The addition of a comprehensive suite of 53 new E2E tests is a fantastic improvement, ensuring these lifecycle flows are robust and preventing future regressions. The test infrastructure enhancements are also well done.
There was a problem hiding this comment.
Pull request overview
This PR hardens extension lifecycle behavior (WASM + OAuth + MCP) based on issues found during end-to-end testing, and adds new E2E scenarios + supporting test infrastructure to prevent regressions.
Changes:
- Unifies extension secret configuration via a new
ExtensionManager::configure()/configure_token()flow and tightens activation/auth lifecycle behavior (cleanup on remove, auth gating, clearer error signaling). - Improves WASM and MCP runtime behaviors (WASM capabilities description/schema overrides; MCP JSON-RPC notification handling and initialization idempotency guard).
- Adds comprehensive E2E tests and test harness support (OAuth exchange mock endpoint, WASM lifecycle scenarios, tool execution, pairing).
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/e2e/scenarios/test_wasm_lifecycle.py | New WASM tool lifecycle E2E coverage (registry→install→setup→activate→remove→reinstall + UI check). |
| tests/e2e/scenarios/test_tool_execution.py | New E2E tests for tool-calling through the chat UI using the mock LLM. |
| tests/e2e/scenarios/test_pairing.py | New E2E tests for pairing API auth/error handling. |
| tests/e2e/scenarios/test_extension_oauth.py | New E2E tests for OAuth round-trip using gateway callback mode + mock exchange endpoint. |
| tests/e2e/mock_llm.py | Enhances mock LLM with tool-calling + OAuth exchange endpoint + additional routing compatibility. |
| tests/e2e/helpers.py | Adds authenticated API helper functions for E2E scenarios. |
| tests/e2e/conftest.py | E2E infra: WASM dirs, worktree symlink support for build artifacts, OAuth env wiring. |
| src/tools/wasm/runtime.rs | Clarifies tool description/schema extraction as fallback-only (prefer capabilities sidecar). |
| src/tools/wasm/loader.rs | Loads tool description + parameters schema from capabilities.json with validation + warnings. |
| src/tools/wasm/capabilities_schema.rs | Extends capabilities schema with description + parameters and adds unit tests. |
| src/tools/mcp/unix_transport.rs | Avoids waiting for responses to JSON-RPC notifications (id-less requests). |
| src/tools/mcp/stdio_transport.rs | Same notification handling fix for stdio transport. |
| src/tools/mcp/client.rs | Adds local initialization guard, always initializes before list/call, and updates tests accordingly. |
| src/tools/builtin/http.rs | Refactors SSRF/DNS handling to single-resolution + DNS pinning; adds redirect handling constraints and tests. |
| src/tools/builtin/extension_tools.rs | Updates extension tool auth calls to the new manager API signature. |
| src/setup/prompts.rs | Improves terminal prompt robustness on Windows (drain queued events; Press-only key handling). |
| src/extensions/mod.rs | Introduces ConfigureResult and a distinct ValidationFailed error variant. |
| src/extensions/manager.rs | Major lifecycle updates: auth becomes read-only, new configure/configure_token entrypoints, remove cleanup, activation guard, token validation via validation_endpoint. |
| src/cli/mcp.rs | Uses MCP client factory for transport dispatch in mcp test. |
| src/channels/web/ws.rs | Switches WS auth flow to configure_token() and handles validation errors distinctly. |
| src/channels/web/static/index.html | Adds DOMPurify dependency for safer markdown rendering. |
| src/channels/web/static/app.js | Replaces regex sanitizer with DOMPurify-based sanitizer. |
| src/channels/web/server.rs | Sends SSE notification on expired OAuth flows; switches setup submit to configure() and auth checks to new API. |
| src/channels/web/handlers/chat.rs | Updates chat auth token handler to use configure_token() and surface validation failures. |
| src/agent/thread_ops.rs | Updates auth-mode token handling to configure_token() and aligns status signaling. |
| channels-src/telegram/telegram.capabilities.json | Adds validation_endpoint for Telegram token validation. |
Comments suppressed due to low confidence (1)
src/extensions/manager.rs:3593
- Token validation uses
validate_fetch_url()(which only validates the URL string) but then performs the request with a plainreqwest::Client. This reintroduces a DNS-rebinding window where an attacker-controlled domain could resolve to a public IP during validation and then to a private IP at request time. Consider reusing the same DNS-pinning approach as the HTTP tool (single resolution + SSRF check +resolve_to_addrs) or making the existing safe-fetch helper reusable here.
let url = endpoint_template.replace(&format!("{{{}}}", secret_def.name), &encoded);
// SSRF defense: block private IPs, localhost, cloud metadata endpoints
crate::tools::builtin::skill_tools::validate_fetch_url(&url)
.map_err(|e| ExtensionError::Other(format!("SSRF blocked: {}", e)))?;
let resp = reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(10))
.build()
.map_err(|e| ExtensionError::Other(e.to_string()))?
.get(&url)
.send()
.await
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Refactors the extension lifecycle to eliminate the divergence between chat and gateway paths that caused Telegram setup via chat to fail (missing webhook secret auto-generation, no token validation). Key changes: - Rename save_setup_secrets() → configure(): single entrypoint for providing secrets to any extension (WasmChannel, WasmTool, MCP). Validates, stores, auto-generates, and activates. - Add configure_token(): convenience wrapper for single-token callers (chat auth card, WebSocket, agent auth mode). - Refactor auth() to pure status check: remove token parameter, delete token-storing branches from auth_mcp/auth_wasm_tool, rename auth_wasm_channel → auth_wasm_channel_status. - Add ConfigureResult/MissingSecret types for structured responses. - Replace hardcoded Telegram token validation with generic validation_endpoint from capabilities.json. - Update all callers (9 files) to use the new interface. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Replace brittle msg.contains("Invalid token") checks with a proper
ExtensionError::ValidationFailed variant. configure() now returns
this variant for token validation failures, and callers match on it
directly instead of parsing error message strings.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…election, WS auth 1. SSRF: call validate_fetch_url() before validation_endpoint HTTP request 2. Transport errors map to ExtensionError::Other (not ValidationFailed) 3. configure_token() picks first *missing* secret, not first non-optional 4. WebSocket error path re-emits AuthRequired on ValidationFailed Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- test_configure_token_picks_first_missing_secret: verifies multi-secret channels can be configured one secret at a time (commit ce106f4) - test_auth_is_read_only_for_wasm_channel: verifies auth() has no side effects and doesn't store secrets (commit 47f8eb6) - test_validation_failed_is_distinct_error_variant: verifies the typed error variant can be pattern-matched (commit a318161) Co-Authored-By: Claude Opus 4.6 <[email protected]>
…onsolidation - Fix configure() fallthrough bug: dispatch activation by ExtensionKind instead of unconditionally calling activate_wasm_channel() for all non-WasmTool types (MCP servers and channel relays now use their correct activation methods) - Remove dead MissingSecret struct and missing_secrets field (never populated, flagged by reviewer) - Consolidate capabilities file parsing in configure(): parse once and reuse for allowed names, validation_endpoint, and auto-generation - Fix auth() doc comment: note MCP OAuth side effects - Fix stale save_setup_secrets reference in server.rs comment - Add regression test for activation dispatch bug Co-Authored-By: Claude Opus 4.6 <[email protected]>
Bug fixes in src/extensions/manager.rs: - Add auth guard to activate_wasm_tool() blocking activation when secrets are missing (NeedsSetup), matching activate_wasm_channel() behavior - Evict WasmToolRuntime module cache on remove() so reinstall uses fresh binary - Clear activation_errors on remove() for both WasmTool and WasmChannel - Clean up in-progress OAuth flows on remove() (abort TCP listener, purge pending flow entries) Bug fix in src/channels/web/server.rs: - Broadcast AuthCompleted SSE event on expired OAuth callback so web UI doesn't stay stuck showing "auth required" E2E test coverage: - test_wasm_lifecycle.py: 35 tests covering install/configure/activate/ remove/reinstall lifecycle with regression tests for bugs 1 and 3 - test_extension_oauth.py: 9 tests covering OAuth round-trip flow - test_tool_execution.py: 5 tests for tool invocation via chat - test_pairing.py: 4 tests for pairing request lifecycle - Enhanced conftest.py, helpers.py, mock_llm.py for OAuth mock support [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <[email protected]>
109b3b1 to
efcdd54
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tests/e2e/conftest.py
Outdated
| # Temp directory for WASM tools (pre-populated from dev builds) | ||
| _WASM_TOOLS_TMPDIR = tempfile.TemporaryDirectory(prefix="ironclaw-e2e-wasm-tools-") | ||
| _WASM_CHANNELS_TMPDIR = tempfile.TemporaryDirectory(prefix="ironclaw-e2e-wasm-channels-") | ||
|
|
||
|
|
There was a problem hiding this comment.
The comment says the WASM tools temp dir is "pre-populated from dev builds", but the fixtures intentionally keep WASM_TOOLS_DIR empty at boot and rely on the install pipeline to write files there. Please update this comment to reflect the actual behavior so future changes don’t misinterpret the intent.
| # Temp directory for WASM tools (pre-populated from dev builds) | |
| _WASM_TOOLS_TMPDIR = tempfile.TemporaryDirectory(prefix="ironclaw-e2e-wasm-tools-") | |
| _WASM_CHANNELS_TMPDIR = tempfile.TemporaryDirectory(prefix="ironclaw-e2e-wasm-channels-") | |
| # Temp directory for WASM tools; starts empty and is populated by the install | |
| # pipeline during tests (fixtures do not pre-populate from dev builds) | |
| _WASM_TOOLS_TMPDIR = tempfile.TemporaryDirectory(prefix="ironclaw-e2e-wasm-tools-") | |
| _WASM_CHANNELS_TMPDIR = tempfile.TemporaryDirectory(prefix="ironclaw-e2e-wasm-channels-") |
src/channels/web/server.rs
Outdated
| let ext_mgr = Arc::new(ExtensionManager::new( | ||
| mcp_sm, | ||
| Arc::new(crate::tools::mcp::process::McpProcessManager::new()), | ||
| secrets.clone(), | ||
| tool_registry, | ||
| None, | ||
| None, | ||
| std::path::PathBuf::from("/tmp/wasm_tools"), | ||
| std::path::PathBuf::from("/tmp/wasm_channels"), | ||
| None, |
There was a problem hiding this comment.
This test hard-codes /tmp/wasm_tools and /tmp/wasm_channels for the ExtensionManager directories. That can be non-portable (e.g., Windows) and can also create cross-test interference if multiple test processes run concurrently. Prefer using a tempfile::tempdir() and passing those paths into ExtensionManager::new.
src/channels/web/server.rs
Outdated
| created_at: std::time::Instant::now() | ||
| .checked_sub(std::time::Duration::from_secs(600)) | ||
| .expect("System uptime is too low to run expired flow test"), | ||
| }; |
There was a problem hiding this comment.
created_at is set using Instant::now().checked_sub(Duration::from_secs(600)).expect(...), which will panic on systems with very low monotonic uptime (e.g., freshly-booted environments). To avoid a potential CI-only flake, set created_at relative to oauth_defaults::OAUTH_FLOW_EXPIRY (plus a small delta) and handle the None case by skipping the test or choosing a smaller subtraction that cannot underflow in practice.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…#1070) * feat(extensions): unify auth and configure into single entrypoint Refactors the extension lifecycle to eliminate the divergence between chat and gateway paths that caused Telegram setup via chat to fail (missing webhook secret auto-generation, no token validation). Key changes: - Rename save_setup_secrets() → configure(): single entrypoint for providing secrets to any extension (WasmChannel, WasmTool, MCP). Validates, stores, auto-generates, and activates. - Add configure_token(): convenience wrapper for single-token callers (chat auth card, WebSocket, agent auth mode). - Refactor auth() to pure status check: remove token parameter, delete token-storing branches from auth_mcp/auth_wasm_tool, rename auth_wasm_channel → auth_wasm_channel_status. - Add ConfigureResult/MissingSecret types for structured responses. - Replace hardcoded Telegram token validation with generic validation_endpoint from capabilities.json. - Update all callers (9 files) to use the new interface. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix: use ValidationFailed error variant instead of string matching Replace brittle msg.contains("Invalid token") checks with a proper ExtensionError::ValidationFailed variant. configure() now returns this variant for token validation failures, and callers match on it directly instead of parsing error message strings. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix: address review — SSRF protection, error typing, missing-secret selection, WS auth 1. SSRF: call validate_fetch_url() before validation_endpoint HTTP request 2. Transport errors map to ExtensionError::Other (not ValidationFailed) 3. configure_token() picks first *missing* secret, not first non-optional 4. WebSocket error path re-emits AuthRequired on ValidationFailed Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * test: add regression tests for extension lifecycle refactoring - test_configure_token_picks_first_missing_secret: verifies multi-secret channels can be configured one secret at a time (commit ce106f4) - test_auth_is_read_only_for_wasm_channel: verifies auth() has no side effects and doesn't store secrets (commit 47f8eb6) - test_validation_failed_is_distinct_error_variant: verifies the typed error variant can be pattern-matched (commit a318161) Co-Authored-By: Claude Opus 4.6 <[email protected]> * fix: address review comments — activation dispatch, dead code, caps consolidation - Fix configure() fallthrough bug: dispatch activation by ExtensionKind instead of unconditionally calling activate_wasm_channel() for all non-WasmTool types (MCP servers and channel relays now use their correct activation methods) - Remove dead MissingSecret struct and missing_secrets field (never populated, flagged by reviewer) - Consolidate capabilities file parsing in configure(): parse once and reuse for allowed names, validation_endpoint, and auto-generation - Fix auth() doc comment: note MCP OAuth side effects - Fix stale save_setup_secrets reference in server.rs comment - Add regression test for activation dispatch bug Co-Authored-By: Claude Opus 4.6 <[email protected]> * fix(extensions): fix 5 extension lifecycle bugs found during E2E testing Bug fixes in src/extensions/manager.rs: - Add auth guard to activate_wasm_tool() blocking activation when secrets are missing (NeedsSetup), matching activate_wasm_channel() behavior - Evict WasmToolRuntime module cache on remove() so reinstall uses fresh binary - Clear activation_errors on remove() for both WasmTool and WasmChannel - Clean up in-progress OAuth flows on remove() (abort TCP listener, purge pending flow entries) Bug fix in src/channels/web/server.rs: - Broadcast AuthCompleted SSE event on expired OAuth callback so web UI doesn't stay stuck showing "auth required" E2E test coverage: - test_wasm_lifecycle.py: 35 tests covering install/configure/activate/ remove/reinstall lifecycle with regression tests for bugs 1 and 3 - test_extension_oauth.py: 9 tests covering OAuth round-trip flow - test_tool_execution.py: 5 tests for tool invocation via chat - test_pairing.py: 4 tests for pairing request lifecycle - Enhanced conftest.py, helpers.py, mock_llm.py for OAuth mock support [skip-regression-check] Co-Authored-By: Claude Opus 4.6 <[email protected]> * fix(web): unify extension auth UX and add lifecycle regressions * test: fix pending oauth flow fixtures after rebase * test(e2e): fix playwright route ordering for extensions reloads * test: address e2e review follow-ups * test: address remaining PR review comments --------- Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
Summary
What Changed
Backend lifecycle fixes
activation_errors, pending OAuth state, and pending auth listeners when removing WASM tools.activation_errorsand delete channel artifacts when removing WASM channels.auth_completedfailure event when an OAuth callback arrives after the flow has expired.Web UX improvements
auth_urlthrough the same auth experience.auth_completedarrives.Regression coverage
CX Decisions Captured In This Branch
removepreserves saved secrets; reinstall can reuse credentials instead of forcing users to recreate tokens or re-enter shared keys.Test Plan
cargo test --lib remove_wasm_cargo test --lib oauth_callback_expired_flowuv run --project tests/e2e python -m pytest tests/e2e/scenarios/test_extensions.py -quv run --project tests/e2e python -m pytest tests/e2e -q(120 passed, 1 skipped)