MCP connection failures should not make the CLI exit automatically; behavior should be consistent with Codex/Claude Code #769
Description
What feature would you like to see?
1. Problem description
Currently, when Kimi Code CLI starts, it exits immediately with a fatal error if any configured MCP (Model Context Protocol) server fails to connect.
Because of this all-or-nothing handling, a single failing server keeps users out of the interactive interface, even when they only need the built-in tools or the MCP tools that are working normally.
Error message
MCPRuntimeError: Failed to connect MCP servers: {'server-name': <error>}
2. Behavior comparison
Current Behavior
- Startup check: `kimi` attempts to connect to every MCP server configured in `config.toml` at startup.
- Immediate crash: If any server fails to connect, the process terminates immediately.
- User blocked: Users must manually fix the network problem, start the relevant service, or edit the configuration file to remove the failing MCP server before they can use the CLI.
Expected Behavior
- Fault-tolerant startup: The CLI should complete startup even if some MCP servers fail to connect.
- Status notification: Show a warning in the startup log that tells the user which servers failed to connect.
- Graceful degradation: Let the user keep using:
  - Built-in tools (such as `file`, `shell`, `grep`).
  - Tools from the MCP servers that connected successfully.
- (Optional) Dynamic retry: Support retrying failed connections at runtime via a command (such as `/mcp reload`).
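The expected behavior above (record failures, keep the working servers, retry on demand) can be sketched roughly as follows. This is only an illustration: `McpRegistry`, `connect_all`, and `retry_failed` are hypothetical names, not part of Kimi Code CLI, and each server is modeled as a plain callable that raises when the connection fails.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class McpRegistry:
    """Tracks which MCP servers connected and which failed (hypothetical helper)."""

    # Maps a server name to a zero-argument callable that raises on failure.
    connectors: dict[str, Callable[[], None]]
    connected: set[str] = field(default_factory=set)
    failed: dict[str, str] = field(default_factory=dict)

    def connect_all(self) -> None:
        """Try every configured server; record failures instead of raising."""
        for name in self.connectors:
            self._try_connect(name)

    def retry_failed(self) -> list[str]:
        """Retry only the failed servers; return the names that recovered."""
        recovered = []
        for name in list(self.failed):
            if self._try_connect(name):
                recovered.append(name)
        return recovered

    def _try_connect(self, name: str) -> bool:
        try:
            self.connectors[name]()
        except Exception as exc:  # one bad server must not abort the others
            self.failed[name] = str(exc)
            return False
        self.failed.pop(name, None)
        self.connected.add(name)
        return True
```

A command such as `/mcp reload` could then call `retry_failed()` and report which servers came back, without restarting the CLI.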
3. Use Cases
- Temporary network instability: A remote HTTP MCP server is briefly unreachable because of network jitter, which does not affect local development.
- Dependent service not ready: A `stdio` MCP server that relies on a local process (such as a database connector) has not been started yet.
- Multi-environment configuration: The user has configured the full set of MCP servers, but the current task needs only some of them, so not all of them should be required to be online.
- Graceful degradation: This matches how modern tools are designed; for example, a plugin that fails to load should not prevent an IDE from opening.
4. Comparison with similar tools
| Tool | Behavior when MCP fails | User experience impact |
|---|---|---|
| Codex CLI | | ✅ Low (only the related features are unavailable) |
| Claude Code | | ✅ Low (only the related features are unavailable) |
| Kimi Code CLI | ❌ Exits the process immediately | ⛔ High (the tool is completely unusable) |
5. Proposed solutions
Option 1: Non-blocking MCP loading (recommended)
- Change the default behavior to "Warn and Continue".
- Print a warning after the startup banner: `⚠️ Connection failed for some MCP servers: [server1, server2].`
- Keep the CLI interactive interface running and disable only the tools from the failed servers.
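One possible shape for "Warn and Continue" is to connect to all servers concurrently and collect failures as values instead of letting the first one abort startup. The sketch below uses `asyncio.gather(..., return_exceptions=True)` from the standard library; `connect_tolerantly` and `format_warning` are hypothetical names, and each connector is assumed to be an async callable that raises on failure.

```python
import asyncio
from typing import Awaitable, Callable


async def connect_tolerantly(
    connectors: dict[str, Callable[[], Awaitable[None]]],
) -> tuple[list[str], dict[str, BaseException]]:
    """Connect to all servers concurrently and collect failures instead of raising."""
    names = list(connectors)
    results = await asyncio.gather(
        *(connectors[name]() for name in names),
        return_exceptions=True,  # failures come back as values, not raised
    )
    connected: list[str] = []
    failed: dict[str, BaseException] = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            failed[name] = result
        else:
            connected.append(name)
    return connected, failed


def format_warning(failed: dict[str, BaseException]) -> str:
    """Build the one-line warning shown after the startup banner."""
    return f"⚠️ Connection failed for some MCP servers: [{', '.join(failed)}]."
```

With this shape, the tools of the servers in `connected` can be registered as usual, while the entries in `failed` feed only the warning.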
Option 2: New configuration option
Add a fault-tolerance switch to ~/.kimi/config.toml:
    [mcp]
    # Defaults to false for a better user experience: startup continues even if some servers fail to load
    fail_fast = false
Option 3: Interactive prompts
When a connection failure is detected, pause startup and ask the user:
MCP server "xyz" connection failed. Skip this server and continue? [Y/n]
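A minimal sketch of such a prompt, assuming a plain terminal prompt is acceptable at startup. `ask_skip_server` is a hypothetical helper; the `ask` parameter defaults to the built-in `input` and exists only so the function can be exercised without a terminal.

```python
from typing import Callable


def ask_skip_server(name: str, ask: Callable[[str], str] = input) -> bool:
    """Return True to skip the failed server and continue, False to abort startup."""
    prompt = f'MCP server "{name}" connection failed. Skip this server and continue? [Y/n] '
    while True:
        answer = ask(prompt).strip().lower()
        if answer in ("", "y", "yes"):  # Enter alone accepts the default (Y)
            return True
        if answer in ("n", "no"):
            return False
        # Any other input is ambiguous, so ask again.
```

A prompt like this would block non-interactive runs (scripts, CI), so it would probably need to be skipped when standard input is not a terminal.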
6. Technical background and code location
Code analysis shows that the core logic behind the problem is at line 302 of kimi_cli/soul/toolset.py:

    # kimi_cli/soul/toolset.py
    if failed_servers:
        _toast_mcp("mcp connection failed")
        # The exception raised here aborts startup
        raise MCPRuntimeError(f"Failed to connect MCP servers: {failed_servers}")

In addition, `_reload_loop()` in the entry file kimi_cli/cli/__init__.py currently catches only Reload-related exceptions, so MCPRuntimeError propagates upward and terminates the entire Python process.
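One way the raise could be made conditional is sketched below. This is not the actual Kimi Code CLI code: `MCPRuntimeError` is redefined here as a stand-in so the example runs on its own, and `report_failed_servers` and its `fail_fast` parameter are hypothetical.

```python
import logging

logger = logging.getLogger("mcp")


class MCPRuntimeError(RuntimeError):
    """Stand-in for the exception named in the issue; the real class lives in kimi_cli."""


def report_failed_servers(
    failed_servers: dict[str, Exception], fail_fast: bool = False
) -> None:
    """Raise only when fail_fast is set; otherwise log a warning and return."""
    if not failed_servers:
        return
    if fail_fast:
        raise MCPRuntimeError(f"Failed to connect MCP servers: {failed_servers}")
    logger.warning(
        "Failed to connect MCP servers, continuing without them: %s",
        sorted(failed_servers),
    )
```

Under this sketch the caller no longer needs to catch anything in the default case, so the limited exception handling in `_reload_loop()` stops mattering for connection failures.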
7. Environment
- Kimi Code CLI version: kimi, version 1.3
- Operating System: macOS
- Installation method: uv
8. Additional notes
A fail-fast strategy makes sense for syntax errors in the configuration file. For runtime conditions such as network connectivity or the state of a third-party service, however, it creates an unnecessary barrier. Users should be able to choose to keep working without some tools, rather than being forced into an all-or-nothing outcome.
Additional information
No response