Accenture/mcp-bench
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
Overview
Accenture/mcp-bench is a Python evaluation framework for benchmarking tool-using LLM agents on complex real-world tasks via MCP servers.
Ecosystem
Python · No license
Signal Breakdown
Stars: 457
Freshness: last updated 5 months ago
Issue Health: 23%
Contributors: 1
Dependents: 0
Forks: 54
Description: Good
License: None
From the README
# MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

## Overview

MCP-Bench is a comprehensive evaluation framework designed to assess Large Language Models' (LLMs') capabilities in tool-use scenarios through the Model Context Protocol (MCP). The benchmark provides an end-to-end pipeline for evaluating how effectively different LLMs can discover, select, and use tools to solve real-world tasks.

## News

* [2025-09] MCP-Bench is accepted to the NeurIPS 2025 Workshop on Scaling Environments for Agents.

## Leaderboard

| Rank | Model | Overall Score |
|------|-------|---------------|
| 1 | gpt-5 | 0.749 |
| 2 | o3 | 0.715 |
| 3 | gpt-oss-120b | 0.692 |
| 4 | gemini-2.5-pro | 0.690 |
| 5 | claude-sonnet-4 | 0.681 |
| 6 | qwen3-235b-a22b-2507 | 0.678 |
| 7 | glm-4.5 | 0.668 |
| 8 | gpt-oss-20b | 0.654 |
| 9 | kimi-k2 | 0.629 |
| 10 | qwen3-30b-a3b-instruct-2507 | 0.627 |
| 11 | gemini-2.5-flash-lite | 0.598 |
| 12 | gpt-4o | 0.595 |
| 13 | gemma-3… | |

Read full README on GitHub →
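To illustrate the kind of end-to-end evaluation the overview describes, here is a minimal sketch of scoring a tool-use trajectory. This is not MCP-Bench's actual metric; the names (`ToolCall`, `tool_use_score`) and the equal weighting of tool coverage and call success are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    """One tool invocation recorded from an agent run (hypothetical schema)."""
    name: str
    ok: bool  # whether the call returned without error

def tool_use_score(required: set[str], trajectory: list[ToolCall]) -> float:
    """Hypothetical composite score: coverage of the required tools,
    averaged with the fraction of calls that succeeded."""
    if not trajectory:
        return 0.0
    called = {call.name for call in trajectory}
    coverage = len(required & called) / len(required) if required else 1.0
    success = sum(call.ok for call in trajectory) / len(trajectory)
    return round(0.5 * coverage + 0.5 * success, 3)

# Example: the task needed two tools; the agent called one of them
# successfully and one unrelated tool successfully.
calls = [ToolCall("search_flights", True), ToolCall("get_weather", True)]
print(tool_use_score({"search_flights", "book_hotel"}, calls))  # 0.75
```

A real benchmark pipeline would add task-level judging of the final answer on top of such trajectory statistics, but the same shape applies: record every tool call, then aggregate per-task scores into the overall leaderboard number.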