# Add MCP server endpoints #1453
New documentation file (`@@ -0,0 +1,202 @@`):

# MCP protocol support

`mistralrs-server` can serve **MCP (Model Context Protocol)** traffic next to the regular OpenAI-compatible HTTP interface!

MCP is an open, tool-based protocol that lets clients interact with models through structured *tool calls* instead of free-form HTTP routes.

Under the hood, the server uses [`rust-mcp-sdk`](https://crates.io/crates/rust-mcp-sdk) and exposes tools based on the supported modalities of the loaded model.

Exposed tools:

| Tool | Minimum `input` -> `output` modalities | Description |
| -- | -- | -- |
| `chat` | `Text` -> `Text` | Wraps the OpenAI `/v1/chat/completions` endpoint. |

---

## ToC
- [MCP protocol support](#mcp-protocol-support)
  - [ToC](#toc)
  - [Running](#running)
  - [Check if it's working](#check-if-its-working)
  - [Example clients](#example-clients)
    - [Rust](#rust)
    - [Python](#python)
    - [HTTP](#http)
  - [Limitations](#limitations)

---

## Running

Start the normal HTTP server and add the `--mcp-port` flag to spin up an MCP server on a separate port:

```bash
./target/release/mistralrs-server \
  --port 1234 \
  --mcp-port 4321 \
  plain -m mistralai/Mistral-7B-Instruct-v0.3
```

Here `--port 1234` serves the OpenAI-compatible HTTP API and `--mcp-port 4321` serves the MCP protocol endpoint (Streamable HTTP).

## Check if it's working

Run this `curl` command to check the available tools:

```bash
curl -X POST http://localhost:4321/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/list",
    "params": {}
  }'
```
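
A healthy server answers with a JSON-RPC result listing the exposed tools. A sketch of the expected shape (the exact `description` and `inputSchema` contents depend on the loaded model and server version):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "tools": [
      {
        "name": "chat",
        "description": "...",
        "inputSchema": { "type": "object" }
      }
    ]
  }
}
```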

## Example clients

### Rust

```rust
use anyhow::Result;
use rust_mcp_sdk::{
    mcp_client::client_runtime,
    schema::{
        CallToolRequestParams, ClientCapabilities, CreateMessageRequest,
        Implementation, InitializeRequestParams, Message, LATEST_PROTOCOL_VERSION,
    },
    ClientSseTransport, ClientSseTransportOptions,
};

struct Handler;
#[async_trait::async_trait]
impl rust_mcp_sdk::mcp_client::ClientHandler for Handler {}

#[tokio::main]
async fn main() -> Result<()> {
    let transport = ClientSseTransport::new(
        "http://localhost:4321/mcp",
        ClientSseTransportOptions::default(),
    )?;

    let details = InitializeRequestParams {
        capabilities: ClientCapabilities::default(),
        client_info: Implementation { name: "mcp-client".into(), version: "0.1".into() },
        protocol_version: LATEST_PROTOCOL_VERSION.into(),
    };

    let client = client_runtime::create_client(details, transport, Handler);
    client.clone().start().await?;

    let req = CreateMessageRequest {
        model: "mistralai/Mistral-7B-Instruct-v0.3".into(),
        messages: vec![Message::user("Explain Rust ownership.")],
        ..Default::default()
    };

    let result = client
        .call_tool(CallToolRequestParams::new("chat", req.into()))
        .await?;

    println!("{}", result.content[0].as_text_content()?.text);
    client.shut_down().await?;
    Ok(())
}
```
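
To compile this client you need the SDK plus an async runtime. A minimal sketch of the client-side `Cargo.toml` dependencies, assuming the `rust-mcp-sdk` version discussed in the review thread below (adjust versions and feature flags to match your setup):

```toml
[dependencies]
anyhow = "1"
async-trait = "0.1"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
# Assumption: same rust-mcp-sdk version as the server side of this PR.
rust-mcp-sdk = "0.4.2"
```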

### Python

```py
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

SERVER_URL = "http://localhost:4321/mcp"


async def main() -> None:
    async with streamablehttp_client(SERVER_URL) as (read, write, _):
        async with ClientSession(read, write) as session:
            # --- INITIALIZE ---
            init_result = await session.initialize()
            print("Server info:", init_result.serverInfo)

            # --- LIST TOOLS ---
            tools = await session.list_tools()
            print("Available tools:", [t.name for t in tools.tools])

            # --- CALL TOOL ---
            resp = await session.call_tool(
                "chat",
                arguments={
                    "messages": [
                        {"role": "user", "content": "Hello MCP 👋"},
                        {"role": "assistant", "content": "Hi there!"}
                    ],
                    "maxTokens": 50,
                    "temperature": 0.7,
                },
            )
            # resp.content is a list[CallToolResultContentItem]; extract text parts
            text = "\n".join(c.text for c in resp.content if c.type == "text")
            print("Model replied:", text)


if __name__ == "__main__":
    asyncio.run(main())
```
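
The client above uses the official MCP Python SDK; a recent release is needed for the Streamable HTTP client:

```bash
pip install mcp
```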

### HTTP

**Call a tool:**
```bash
curl -X POST http://localhost:4321/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "chat",
      "arguments": {
        "messages": [
          { "role": "system", "content": "You are a helpful assistant." },
          { "role": "user", "content": "Hello, what's the time?" }
        ],
        "maxTokens": 50,
        "temperature": 0.7
      }
    }
  }'
```
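
A successful call returns the model output as MCP content items, roughly like this (the reply text is illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "result": {
    "content": [
      { "type": "text", "text": "I don't have access to a real-time clock." }
    ],
    "isError": false
  }
}
```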

**Initialize:**
```bash
curl -X POST http://localhost:4321/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {}
  }'
```
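
The server replies with its implementation info and the negotiated protocol version, schematically (the `serverInfo` values here are placeholders, not confirmed output):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2025-03-26",
    "capabilities": { "tools": {} },
    "serverInfo": { "name": "...", "version": "..." }
  }
}
```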

**List tools:**
```bash
curl -X POST http://localhost:4321/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/list",
    "params": {}
  }'
```

## Limitations

- Streaming requests are not implemented.
- No authentication layer is provided; run the MCP port behind a reverse proxy if you need auth.

Contributions to extend MCP coverage (streaming, more tools, auth hooks) are welcome!

---

Changes to the server's `main.rs`:

```diff
@@ -1,6 +1,8 @@
 use anyhow::Result;
 use clap::Parser;
 use mistralrs_core::{initialize_logging, ModelSelected, TokenSource};
+use rust_mcp_sdk::schema::LATEST_PROTOCOL_VERSION;
+use tokio::join;
 use tracing::info;

 use mistralrs_server_core::{
```

On the new `rust_mcp_sdk` import:

> **matthewhaynesonline (Collaborator):** @EricLBuehler for my own curiosity, since there is a feature flag for version

> **EricLBuehler (Owner, Author):** @matthewhaynesonline we will need to keep the version of

```diff
@@ -10,6 +12,7 @@ use mistralrs_server_core::{

 mod interactive_mode;
 use interactive_mode::interactive_mode;
+mod mcp_server;

 #[derive(Parser)]
 #[command(version, about, long_about = None)]
```

```diff
@@ -24,7 +27,7 @@ struct Args {

     /// Port to serve on.
     #[arg(short, long)]
-    port: Option<String>,
+    port: Option<u16>,

     /// Log all responses and requests to this file
     #[clap(long, short)]
```

```diff
@@ -134,6 +137,10 @@ struct Args {
     /// Enable thinking for interactive mode and models that support it.
     #[arg(long = "enable-thinking")]
     enable_thinking: bool,
+
+    /// Port to serve MCP protocol on
+    #[arg(long)]
+    mcp_port: Option<u16>,
 }

 fn parse_token_source(s: &str) -> Result<TokenSource, String> {
```

```diff
@@ -185,27 +192,54 @@ async fn main() -> Result<()> {
         return Ok(());
     }

     // Needs to be after the .build call as that is where the daemon waits.
-    let setting_server = if !args.interactive_mode {
-        let port = args.port.expect("Interactive mode was not specified, so expected port to be specified. Perhaps you forgot `-i` or `--port`?");
-        let ip = args.serve_ip.unwrap_or_else(|| "0.0.0.0".to_string());
-
-        // Create listener early to validate address before model loading
-        let listener = tokio::net::TcpListener::bind(format!("{ip}:{port}")).await?;
-        Some((listener, ip, port))
-    } else {
-        None
-    };
-
-    let app = MistralRsServerRouterBuilder::new()
-        .with_mistralrs(mistralrs)
-        .build()
-        .await?;
-
-    if let Some((listener, ip, port)) = setting_server {
-        info!("Serving on http://{ip}:{}.", port);
-        axum::serve(listener, app).await?;
-    }
+    if !args.interactive_mode && args.port.is_none() && args.mcp_port.is_none() {
+        anyhow::bail!("Interactive mode was not specified, so expected port to be specified. Perhaps you forgot `-i` or `--port` or `--mcp-port`?")
+    }
+
+    let mcp_port = if let Some(port) = args.mcp_port {
+        let host = args
+            .serve_ip
+            .clone()
+            .unwrap_or_else(|| "0.0.0.0".to_string());
+        info!("MCP server listening on http://{host}:{port}/mcp.");
+        info!("MCP protocol version is {}.", LATEST_PROTOCOL_VERSION);
+        let mcp_server = mcp_server::create_http_mcp_server(mistralrs.clone(), host, port);
+
+        tokio::spawn(async move {
+            if let Err(e) = mcp_server.await {
+                eprintln!("MCP server error: {e}");
+            }
+        })
+    } else {
+        tokio::spawn(async {})
+    };
+
+    let oai_port = if let Some(port) = args.port {
+        let ip = args
+            .serve_ip
+            .clone()
+            .unwrap_or_else(|| "0.0.0.0".to_string());
+
+        // Create listener early to validate address before model loading
+        let listener = tokio::net::TcpListener::bind(format!("{ip}:{port}")).await?;
+
+        let app = MistralRsServerRouterBuilder::new()
+            .with_mistralrs(mistralrs)
+            .build()
+            .await?;
+
+        info!("OpenAI-compatible server listening on http://{ip}:{port}.");
+
+        tokio::spawn(async move {
+            if let Err(e) = axum::serve(listener, app).await {
+                eprintln!("OpenAI server error: {e}");
+            }
+        })
+    } else {
+        tokio::spawn(async {})
+    };
+
+    let (_, _) = join!(oai_port, mcp_port);

     Ok(())
 }
```
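
With both ports enabled, startup should produce log lines like the following, taken from the `info!` calls in this diff (host, ports, and the protocol version depend on your flags and the compiled schema feature):

```
MCP server listening on http://0.0.0.0:4321/mcp.
MCP protocol version is 2025-03-26.
OpenAI-compatible server listening on http://0.0.0.0:1234.
```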

---

**Review comment: Consider documenting the purpose of the date-based feature flag.**

The feature flag `"2025_03_26"` appears to be date-based, which could make future maintenance challenging. Consider adding a comment explaining what this feature represents, or check whether a more semantic feature name is available.

Web query result: the `"2025_03_26"` feature flag in `rust-mcp-sdk` version 0.4.2 likely corresponds to a specific version of the Model Context Protocol (MCP) schema. In the Rust MCP ecosystem, feature flags are often used to select different versions of the MCP schema, allowing developers to work with various schema versions as needed. For instance, the `rust-mcp-schema` crate provides multiple schema versions, including `2024_11_05` and `draft`; to use a specific schema version, you enable the corresponding feature in your `Cargo.toml` file. While the `2025_03_26` version isn't explicitly listed among the available versions, it's possible that this feature flag is intended to select that specific schema version.

To use the `2025_03_26` schema version, you would typically add the following to your `Cargo.toml`:
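
A minimal sketch, assuming the crate exposes a `2025_03_26` feature (verify the feature name against the crate before relying on it):

```toml
[dependencies]
# Assumption: the "2025_03_26" feature selects the MCP schema dated 2025-03-26.
rust-mcp-sdk = { version = "0.4.2", features = ["2025_03_26"] }
```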

This configuration instructs Cargo to compile the `rust-mcp-schema` crate with the `2025_03_26` feature enabled, thereby selecting that specific schema version. Note that the availability of the `2025_03_26` schema version depends on its inclusion in the `rust-mcp-schema` crate; if it isn't listed among the available features, it may not be supported or released yet. In that case, consult the official MCP documentation or the maintainers of the `rust-mcp-schema` crate. For more details on the crate and its available schema versions, see its GitHub repository.

**Document the MCP schema version feature flag.** The feature flag `"2025_03_26"` in `rust-mcp-sdk = { version = "0.4.2", … }` selects the Model Context Protocol schema v2025-03-26. To improve maintainability, verify that the `rust-mcp-schema` crate exposes a `2025_03_26` feature; if it doesn't, coordinate with its maintainers or choose an available schema version.