LLM Benchmark MCP Server

MCP server that gives AI agents access to LLM benchmark data, pricing comparisons, and model recommendations.

Features

compare_models: Side-by-side benchmark comparison of LLMs (MMLU, HumanEval, MATH, GPQA, ARC-Challenge, HellaSwag)
get_model_details: Detailed information about a specific model, including its strengths and weaknesses
recommend_model: Recommends the best-fitting model for a given task and budget
list_top_models: Top models ranked by category (coding, math, reasoning, chat)
get_pricing: Pricing comparison using live data from the OpenRouter API
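
To make the idea behind recommend_model concrete, here is a minimal sketch of one way a "best model for a task within a budget" selection could work. It is not the server's actual implementation: the function name, the record layout, and every model name, score, and price below are made-up placeholders for illustration only.

```python
from typing import Optional

# Hypothetical placeholder records. The names, scores, and prices are invented
# for this sketch and are NOT real benchmark or pricing data.
PLACEHOLDER_MODELS = [
    {"name": "model-a", "scores": {"coding": 90.0, "math": 70.0}, "usd_per_mtok": 10.0},
    {"name": "model-b", "scores": {"coding": 80.0, "math": 85.0}, "usd_per_mtok": 1.0},
    {"name": "model-c", "scores": {"coding": 60.0, "math": 60.0}, "usd_per_mtok": 0.2},
]


def recommend(models: list, category: str, max_usd_per_mtok: float) -> Optional[str]:
    """Return the name of the highest-scoring model in `category` within budget."""
    # Keep only models that have a score for the category and fit the budget.
    affordable = [
        m for m in models
        if category in m["scores"] and m["usd_per_mtok"] <= max_usd_per_mtok
    ]
    if not affordable:
        return None  # nothing fits the budget
    best = max(affordable, key=lambda m: m["scores"][category])
    return best["name"]


if __name__ == "__main__":
    print(recommend(PLACEHOLDER_MODELS, "coding", 2.0))
```

With a budget of 2.0 the sketch skips the most expensive entry and picks the best remaining score. A real implementation would likely weigh more than one signal.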

Supported Models

GPT-4o, GPT-4o-mini, GPT-4 Turbo, o1, o3-mini, Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus, Gemini 2.0 Flash, Gemini 2.0 Pro, Gemini 1.5 Pro, Llama 3.1 (8B/70B/405B), Llama 3.3 70B, Mistral Large, Mistral Small, Mixtral 8x22B, DeepSeek V3, DeepSeek R1, Qwen 2.5 72B

Installation

pip install llm-benchmark-mcp-server

Usage with Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "llm-benchmark": {
      "command": "benchmark-server"
    }
  }
}

Or via uvx (no install needed):

{
  "mcpServers": {
    "llm-benchmark": {
      "command": "uvx",
      "args": ["llm-benchmark-mcp-server"]
    }
  }
}
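
If claude_desktop_config.json already lists other servers, hand-editing risks overwriting them. The following is a small sketch that merges one of the entries above into an existing config while leaving other servers intact. The helper names add_mcp_server and update_config_file are hypothetical, and the location of the config file depends on your platform, so the path is left for you to supply.

```python
import json
from pathlib import Path
from typing import Optional


def add_mcp_server(config: dict, name: str, command: str,
                   args: Optional[list] = None) -> dict:
    """Return a copy of `config` with one server entry added under "mcpServers"."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))  # keep existing servers
    entry = {"command": command}
    if args:
        entry["args"] = list(args)
    servers[name] = entry
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path, name: str, command: str,
                       args: Optional[list] = None) -> None:
    """Read the config at `path` (or start empty), add the entry, write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = add_mcp_server(config, name, command, args)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Prints the uvx variant shown above, merged into an empty config.
    merged = add_mcp_server({}, "llm-benchmark", "uvx", ["llm-benchmark-mcp-server"])
    print(json.dumps(merged, indent=2))
```

Call update_config_file with the path to your own config file. Restarting Claude Desktop afterwards is usually needed for the change to take effect.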

Example Queries

"Compare GPT-4o vs Claude 3.5 Sonnet vs Gemini 2.0 Pro"
"Which model is best for coding on a low budget?"
"Show me the top 10 models for math"
"What does GPT-4o cost compared to Claude?"
"Give me details about DeepSeek R1"

Data Sources

Benchmarks: Hardcoded from official papers and public leaderboards (MMLU, HumanEval, MATH, GPQA, ARC-Challenge, HellaSwag)
Pricing: Live data from OpenRouter API
Arena Rankings: Chatbot Arena Leaderboard (when available)
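
As an illustration of the pricing source, the sketch below converts an OpenRouter-style model listing into prices per million tokens. It assumes the response shape of OpenRouter's public model listing as commonly documented: a data list whose entries carry an id and a pricing object with per-token prompt and completion prices given as strings. How this server actually fetches and processes the data is not shown here. The function name price_table is hypothetical, and the sample payload uses an invented model id and invented prices.

```python
from decimal import Decimal

# Invented sample in the assumed response shape. Not real pricing data.
SAMPLE_PAYLOAD = {
    "data": [
        {"id": "example/model-x", "pricing": {"prompt": "0.000002", "completion": "0.000008"}},
        {"id": "example/model-y", "pricing": {}},  # incomplete entry, skipped
    ]
}


def price_table(payload: dict) -> dict:
    """Map model id -> (prompt, completion) price in USD per million tokens."""
    table = {}
    for model in payload.get("data", []):
        pricing = model.get("pricing") or {}
        try:
            # Decimal avoids float rounding when scaling tiny per-token prices.
            prompt = Decimal(pricing["prompt"]) * 1_000_000
            completion = Decimal(pricing["completion"]) * 1_000_000
            table[model["id"]] = (prompt, completion)
        except (KeyError, TypeError, ValueError, ArithmeticError):
            continue  # skip entries with missing or malformed fields
    return table


if __name__ == "__main__":
    for model_id, (prompt, completion) in price_table(SAMPLE_PAYLOAD).items():
        print(f"{model_id}: ${prompt:.2f} in / ${completion:.2f} out per 1M tokens")
```

Keeping the parsing separate from the HTTP request makes it easy to check against a saved response before pointing it at live data.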

More MCP Servers by AiAgentKarl

Category	Servers
🔗 Blockchain	Solana
🌍 Data	Weather · Germany · Agriculture · Space · Aviation · EU Companies
🔒 Security	Cybersecurity · Policy Gateway · Audit Trail
🤖 Agent Infra	Memory · Directory · Hub · Reputation
🔬 Research	Academic · LLM Benchmark · Legal

→ Full catalog (40+ servers)

License

MIT
