The reputation layer for AI skills, tools & agents

modelscope/MCPBench

Score: 22.4 · Rank #10062

The evaluation benchmark on MCP servers

Overview

modelscope/MCPBench is a Python MCP server licensed under Apache-2.0, described as "The evaluation benchmark on MCP servers." Topics: benchmark, database, mcp, mcp-server, websearch.

Ranked #10062 out of 25632 indexed tools.

Ecosystem

Python · Apache-2.0
benchmark, database, mcp, mcp-server, websearch

Signal Breakdown

Stars: 241
Freshness: 6mo ago
Issue Health: 14%
Contributors: 3
Dependents: 0
Forks: 15
Description: Brief
License: Apache-2.0

How to Improve

Description (low impact)

Expand your description to 150+ characters for better discoverability

Freshness (high impact)

Last commit was 194 days ago — a recent commit would boost your freshness score

Issue Health (high impact)

You have 6 open vs 1 closed issues — triaging stale issues improves health
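The reported 14% is consistent with a simple resolved-issue ratio: 1 closed out of 7 total issues ≈ 14%. A minimal sketch of that assumed formula (AgentRank's actual scoring is not published here, so treat this as an illustration only):

```python
def issue_health(open_issues: int, closed_issues: int) -> float:
    """Fraction of issues resolved.

    Assumed formula for illustration; not AgentRank's documented metric.
    """
    total = open_issues + closed_issues
    if total == 0:
        return 1.0  # no issues filed: nothing unresolved, treat as healthy
    return closed_issues / total

# MCPBench's current counts: 6 open, 1 closed
print(round(issue_health(6, 1) * 100))  # -> 14
```

Under this reading, closing even a couple of stale issues moves the score quickly: closing 3 of the 6 open issues would lift the ratio to 4/7 ≈ 57%.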

Badge

Embed the AgentRank score badge for modelscope/MCPBench (Markdown and HTML variants):
[![AgentRank](https://agentrank-ai.com/api/badge/tool/modelscope--MCPBench)](https://agentrank-ai.com/tool/modelscope--MCPBench)
<a href="https://agentrank-ai.com/tool/modelscope--MCPBench"><img src="https://agentrank-ai.com/api/badge/tool/modelscope--MCPBench" alt="AgentRank"></a>

Matched Queries

"mcp server""mcp-server"

From the README

<h1 align="center">
	🦊 MCPBench: A Benchmark for Evaluating MCP Servers
</h1>

<div align="center">

[![Documentation][docs-image]][docs-url]
[![Package License][package-license-image]][package-license-url]

</div>

<div align="center">
<h4 align="center">

[中文](https://github.com/modelscope/MCPBench/blob/main/README_zh.md) |
[English](https://github.com/modelscope/MCPBench/blob/main/README.md)

</h4>
</div>

MCPBench is an evaluation framework for MCP Servers. It supports the evaluation of three types of servers: Web Search, Database Query and GAIA, and is compatible with both local and remote MCP Servers. The framework primarily evaluates different MCP Servers (such as Brave Search, DuckDuckGo, etc.) in terms of task completion accuracy, latency, and token consumption under the same LLM and Agent configurations. Here is the [evaluation report](https://arxiv.org/abs/2504.11094).
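The three axes MCPBench reports (task accuracy, latency, token consumption) can be illustrated with a generic harness sketch. This is not MCPBench's actual API: `call_server`, the task format, and the token counts are hypothetical placeholders standing in for a real MCP server call under a fixed LLM/Agent configuration.

```python
import time
from dataclasses import dataclass


@dataclass
class Result:
    correct: bool
    latency_s: float
    tokens: int


def evaluate(call_server, tasks):
    """Run each task through a server call and aggregate the three metrics.

    `call_server(question)` is a hypothetical stand-in returning
    (answer, tokens_used); a real harness would wire this to a local
    or remote MCP server.
    """
    results = []
    for task in tasks:
        start = time.perf_counter()
        answer, tokens = call_server(task["question"])
        elapsed = time.perf_counter() - start
        results.append(Result(answer == task["expected"], elapsed, tokens))
    n = len(results)
    return {
        "accuracy": sum(r.correct for r in results) / n,
        "avg_latency_s": sum(r.latency_s for r in results) / n,
        "avg_tokens": sum(r.tokens for r in results) / n,
    }


# Toy usage with a mock server: one right answer, one wrong
tasks = [
    {"question": "2+2", "expected": "4"},
    {"question": "capital of France", "expected": "Paris"},
]
mock = lambda q: ("4" if q == "2+2" else "Lyon", 120)
metrics = evaluate(mock, tasks)
print(metrics["accuracy"])  # -> 0.5 with this mock
```

Holding the LLM and Agent fixed, as MCPBench does, means differences in these aggregates can be attributed to the MCP server under test rather than the model.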

> The implementation refers to [LangProBe: a Language Programs Benchmark](https://arxiv.org/abs/2502.2031
Read full README on GitHub →