Accenture/mcp-bench
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
Overview
Accenture/mcp-bench is a Python evaluation framework for benchmarking tool-using LLM agents on complex real-world tasks via MCP servers.
Ecosystem
Python · No license
Signal Breakdown
Stars: 457
Freshness: last updated 5 months ago
Issue Health: 23%
Contributors: 1
Dependents: 0
Forks: 54
Description: Good
License: None
From the README
# MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

## Overview

MCP-Bench is a comprehensive evaluation framework designed to assess Large Language Models' (LLMs') capabilities in tool-use scenarios through the Model Context Protocol (MCP). The benchmark provides an end-to-end pipeline for evaluating how effectively different LLMs can discover, select, and use tools to solve real-world tasks.

## News

* [2025-09] MCP-Bench is accepted to the NeurIPS 2025 Workshop on Scaling Environments for Agents.

## Leaderboard

| Rank | Model | Overall Score |
|------|-------|---------------|
| 1 | gpt-5 | 0.749 |
| 2 | o3 | 0.715 |
| 3 | gpt-oss-120b | 0.692 |
| 4 | gemini-2.5-pro | 0.690 |
| 5 | claude-sonnet-4 | 0.681 |
| 6 | qwen3-235b-a22b-2507 | 0.678 |
| 7 | glm-4.5 | 0.668 |
| 8 | gpt-oss-20b | 0.654 |
| 9 | kimi-k2 | 0.629 |
| 10 | qwen3-30b-a3b-instruct-2507 | 0.627 |
| 11 | gemini-2.5-flash-lite | 0.598 |
| 12 | gpt-4o | 0.595 |
| 13 | gemma-3… | |

Read full README on GitHub →
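To illustrate the kind of end-to-end evaluation the overview describes, here is a minimal sketch of scoring a tool-use trajectory. This is not MCP-Bench's actual metric; the names (`ToolCall`, `tool_use_score`) and the equal weighting of tool coverage and call success are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    """One tool invocation recorded from an agent run (hypothetical schema)."""
    name: str
    ok: bool  # whether the call returned without error

def tool_use_score(required: set[str], trajectory: list[ToolCall]) -> float:
    """Hypothetical composite score: coverage of the required tools,
    averaged with the fraction of calls that succeeded."""
    if not trajectory:
        return 0.0
    called = {call.name for call in trajectory}
    coverage = len(required & called) / len(required) if required else 1.0
    success = sum(call.ok for call in trajectory) / len(trajectory)
    return round(0.5 * coverage + 0.5 * success, 3)

# Example: the task needed two tools; the agent called one of them
# successfully and one unrelated tool successfully.
calls = [ToolCall("search_flights", True), ToolCall("get_weather", True)]
print(tool_use_score({"search_flights", "book_hotel"}, calls))  # 0.75
```

A real benchmark pipeline would add task-level judging of the final answer on top of such trajectory statistics, but the same shape applies: record every tool call, then aggregate per-task scores into the overall leaderboard number.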