Building AI Agents with MCP: A Practical Guide
An AI agent is only as good as the tools it can use. MCP standardizes how agents connect to those tools — so you write the agent once and plug in capabilities at will. This guide walks through the full stack: how to think about the agent/MCP relationship, which servers to pick for common tasks, how to compose multi-server workflows, and how to keep everything running in production.
The mental model: agents and tools
An AI agent is a loop: observe the task, reason about what to do next, call a tool, incorporate the result, reason again. The LLM handles the reasoning. MCP servers handle the tools.
Before MCP, every agent framework invented its own tool format. LangChain tools looked different from AutoGen tools which looked different from custom Claude function-calling schemas. Integrations had to be rewritten for every framework. A developer who built a Jira integration for LangChain couldn't easily use it in Cursor or Claude Desktop.
MCP breaks that coupling. You build a server once. Any MCP-compatible client — Claude, Cursor, Cline, VS Code Copilot, your custom Python agent — can use it. The protocol handles tool discovery (the client asks "what can you do?"), schema communication (the server describes each tool's parameters), and execution (the client calls a tool with arguments, gets a result).
The practical consequence: the best tool for a job is the highest-quality MCP server for that capability, not the one that happens to have a library wrapper for your framework. That's what the AgentRank index helps you find.
Choosing your agent framework
The right framework depends on what you're building. Here's how the major options compare on MCP support:
| Framework | Language | MCP Support | Best For | Trade-offs |
|---|---|---|---|---|
| Claude (Anthropic API) | Any | Native | Direct API use, custom agent loops | Most flexible; requires building your own orchestration layer |
| LangChain | Python, TypeScript | Via langchain-mcp-adapters | Complex multi-step agents, existing LangChain codebases | Large framework with some overhead; great ecosystem |
| LlamaIndex | Python | Native MCP tool loading | RAG + agentic workflows, knowledge-heavy agents | Excellent for retrieval-augmented agents; heavier for pure action agents |
| Claude Code / Cursor / Windsurf | N/A (UI client) | Native (config-based) | Developer workflows, coding agents | No coding required; limited to the client's built-in agent capabilities |
| AG2 / AutoGen | Python | Via MCP adapter | Multi-agent workflows, agent-to-agent communication | Best for orchestrating multiple agents; more complex setup |
If you're just starting, pick the simplest path to working code: Claude Desktop or Cursor for zero-code agent setups, or the Anthropic Python SDK for programmatic control. Don't add LangChain or LlamaIndex unless you have a specific reason — the added abstraction has a learning cost.
The Anthropic SDK: minimal setup, full control
For developers who want to build MCP-powered agents without a framework, the Anthropic Python SDK handles tool call parsing and result injection natively. Your server connects via stdio; the SDK manages the protocol loop. This is the approach to understand first — it makes the other frameworks less magical.
Selecting the right MCP servers
Each server adds tools to your agent's context window. Too many tools degrade reasoning quality — the model has to weigh more options at each step. In practice, 5–15 tools total per agent session is the sweet spot: enough capability without overwhelming the context.
Use the AgentRank score as a minimum bar. A score below 60 means the server is likely abandoned or unmaintained. Don't add it to a production agent — you'll be debugging broken tools under deadline pressure.
Here are the top-scored servers for common agent tasks:
| Task | Server | Score | Notes |
|---|---|---|---|
| Read and write files | modelcontextprotocol/servers (filesystem) | 88.1 | Official reference implementation — safe, sandboxed file access |
| GitHub operations | github/github-mcp-server | 94.2 | Official GitHub MCP — repos, issues, PRs, code search |
| Web search | tavily-ai/tavily-mcp | 83.7 | AI-optimized search with structured results — best for agent use |
| Database queries | mongodb-js/mongodb-mcp-server | 88.44 | Official MongoDB server — natural language to Atlas queries |
| Persistent memory | modelcontextprotocol/servers (memory) | 88.1 | Knowledge graph memory — agents remember across sessions |
| Browser automation | microsoft/playwright-mcp | 97.44 | Highest-scored server in the index — reliable web interaction |
| Team communication | korotovsky/slack-mcp-server | 83.01 | Send messages, read channels — no special Slack permissions required |
Your first MCP-powered agent
Let's build a minimal agent that uses MCP tools via the Anthropic Python SDK. This example queries GitHub for a repo's recent issues and summarizes them.
Step 1: Install dependencies
```shell
pip install anthropic mcp
```

Step 2: Start the MCP server
We'll use the official GitHub MCP server. Set up your token first:
```shell
export GITHUB_PERSONAL_ACCESS_TOKEN=your_token_here
npx @modelcontextprotocol/server-github
```

Step 3: Build the agent loop
```python
import anthropic
import subprocess

# Launch the MCP server as a subprocess
server_process = subprocess.Popen(
    ["npx", "@modelcontextprotocol/server-github"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

# In production, use the MCP client library to manage this properly.
# For illustration: use the Anthropic SDK's tool_use message format.
client = anthropic.Anthropic()

# The SDK handles tool definitions and result injection.
# See Anthropic docs for the full tool_use pattern.
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=[{
        "name": "list_issues",
        "description": "List recent open issues for a GitHub repository",
        "input_schema": {
            "type": "object",
            "properties": {
                "owner": {"type": "string"},
                "repo": {"type": "string"},
                "limit": {"type": "integer", "default": 10},
            },
            "required": ["owner", "repo"],
        },
    }],
    messages=[{
        "role": "user",
        "content": "Summarize the 5 most recent open issues in the jlowin/fastmcp repo",
    }],
)
```

For a production setup, use the official MCP Python SDK to manage server connections — it handles process lifecycle, protocol handshake, and tool listing automatically. The raw subprocess approach above illustrates the concept; the SDK is what you deploy.
Step 4: Handle the tool call loop
Claude returns a tool_use response block when it wants to call a tool.
Your agent loop routes that call to the MCP server, gets the result, and sends it back
as a tool_result message. The SDK automates this exchange when you use its
session management:
```python
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def run_agent():
    server_params = StdioServerParameters(
        command="npx",
        args=["@modelcontextprotocol/server-github"],
        env={"GITHUB_PERSONAL_ACCESS_TOKEN": "your_token"},
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tool_list = await session.list_tools()
            # Convert MCP tool schemas to the Anthropic tools format
            claude_tools = [
                {"name": t.name, "description": t.description,
                 "input_schema": t.inputSchema}
                for t in tool_list.tools
            ]
            # Pass claude_tools to Claude, handle tool_use blocks,
            # and call session.call_tool() for each one.
```

Multi-server composition
Most useful agents need more than one capability. An agent that helps with software development might need: GitHub (code access), a filesystem server (local edits), a search server (documentation lookup), and a database server (query production data).
Connecting multiple servers
In Claude Desktop and most IDE clients, you add multiple servers to the configuration JSON:
```json
// ~/.config/claude/claude_desktop_config.json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "..." }
    },
    "filesystem": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-filesystem", "/Users/yourname/projects"]
    },
    "search": {
      "command": "npx",
      "args": ["tavily-mcp"],
      "env": { "TAVILY_API_KEY": "..." }
    }
  }
}
```
In programmatic agents (Python/TypeScript), open parallel ClientSession
instances — one per server — and merge their tool lists into a single array before
passing to the LLM.
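The merge step itself is simple once each session has returned its tool list. A minimal sketch, with plain dicts standing in for real list_tools() results (the helper and sample data are hypothetical, not part of the MCP SDK):

```python
def merge_tool_lists(server_tools):
    """Merge per-server MCP tool lists into one array for the LLM.

    `server_tools` maps server name -> list of tool dicts in the shape
    each ClientSession's list_tools() call returns (plain dicts here).
    """
    merged = []
    for server, tools in server_tools.items():
        for tool in tools:
            merged.append({
                "name": tool["name"],
                "description": tool["description"],
                "input_schema": tool["input_schema"],
            })
    return merged

# Hypothetical tool lists from two servers
tools = merge_tool_lists({
    "github": [{"name": "list_issues", "description": "List issues", "input_schema": {}}],
    "search": [{"name": "search", "description": "Web search", "input_schema": {}}],
})
print([t["name"] for t in tools])  # ['list_issues', 'search']
```

In a real agent, open the sessions with an `AsyncExitStack` so all server processes are torn down together when the task ends.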
Tool namespacing to prevent collisions
When loading tools from multiple servers, namespace tool names by server to prevent
collisions. A GitHub server and a GitLab server both might define list_issues.
The MCP spec doesn't mandate namespacing — handle it in your agent layer:
```python
def namespace_tools(server_name, tools):
    return [
        {**tool, "name": f"{server_name}_{tool['name']}"}
        for tool in tools
    ]
```

Keeping tool count manageable
Each tool consumes tokens in the model's context window for its description and schema. A server with 30 tools adds significant overhead. For complex workflows:
- Load only the servers relevant to the current task (don't preload everything)
- For servers with many tools, use a router agent that decides which sub-agent (and thus which server) to invoke
- Monitor which tools actually get called — unused tools are pure overhead
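The router idea can be sketched as a lookup that decides which servers to load before the session starts. The routing table and keywords below are hypothetical placeholders:

```python
# Hypothetical routing table: task category -> MCP servers to load
ROUTES = {
    "code": ["github", "filesystem"],
    "research": ["tavily"],
    "data": ["mongodb"],
}

KEYWORDS = {
    "bug": "code", "pull request": "code", "repo": "code",
    "look up": "research", "docs": "research",
    "query": "data", "records": "data",
}

def route_servers(task: str) -> list[str]:
    """Pick the minimal server set for this task instead of preloading all."""
    task_lower = task.lower()
    selected = []
    for keyword, category in KEYWORDS.items():
        if keyword in task_lower:
            for server in ROUTES[category]:
                if server not in selected:
                    selected.append(server)
    return selected or ROUTES["research"]  # fallback: search only

print(route_servers("Triage the new bug reports in the repo"))  # ['github', 'filesystem']
```

A production router would typically make a cheap LLM classification call rather than matching keywords, but the shape is the same: choose servers first, then load only their tools.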
Designing for agent legibility
If you're building your own MCP server for your agent to use, the most important thing is not functionality — it's description quality. The agent reads tool descriptions to decide when and how to call each tool. A vague description leads to wrong calls; a precise description leads to correct ones.
What good tool descriptions look like
Poor: "search_documents" — searches documents

Good: "search_documents" — Full-text search across the product knowledge base. Returns up to 10 matching documents with title, excerpt, and URL. Use this when the user asks a question about product features, pricing, or documentation. Do NOT use this for code search — use search_code instead.
The explicit "do NOT use this for" clause prevents the agent from calling the wrong tool when two tools have overlapping surface areas.
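In an Anthropic-style tool definition, that description sits alongside a schema you can tighten so invented parameters get rejected. A hypothetical sketch (field values are illustrative):

```python
search_documents_tool = {
    "name": "search_documents",
    "description": (
        "Full-text search across the product knowledge base. Returns up to 10 "
        "matching documents with title, excerpt, and URL. Use this when the "
        "user asks about product features, pricing, or documentation. "
        "Do NOT use this for code search; use search_code instead."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search phrase"},
            "limit": {"type": "integer", "minimum": 1, "maximum": 10, "default": 10},
        },
        "required": ["query"],
        # JSON Schema: reject any parameter the model invents
        "additionalProperties": False,
    },
}
```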
Return values: structure over verbosity
Tool return values go straight back into the agent's context. Structured data (JSON with clear field names) is better than prose. An agent can extract a specific field from structured data; it has to parse prose. Return only what the agent needs — excess data expands the context window without adding signal.
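As a sketch, here is a tool handler that returns structured JSON rather than prose; the result data is made up for illustration:

```python
import json

def search_documents(query: str) -> str:
    # Hypothetical results; a real handler would query the knowledge base
    results = [
        {"title": "Pricing overview", "url": "https://example.com/pricing",
         "excerpt": "Plans start at the Team tier..."},
    ]
    # Structured: the agent reads fields like results[0]["title"] directly.
    # Prose like "I found one document titled..." would have to be re-parsed.
    return json.dumps({"query": query, "count": len(results), "results": results})

payload = json.loads(search_documents("pricing"))
print(payload["count"])  # 1
```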
Testing MCP agent workflows
MCP Inspector for tool-level testing
Before testing the full agent loop, validate each tool in isolation:
```shell
npx @modelcontextprotocol/inspector
```

Connect the Inspector to your server, call each tool with representative inputs, and verify the outputs look correct. Fix tool-level issues here before adding the LLM layer.
Tracing agent tool calls
The most useful test for an MCP agent is a trace of tool calls for a representative task. Run the task, log every tool call (name, input, output, latency), and review whether the agent called tools in a sensible order with correct inputs. Common failure patterns:
- Wrong tool selection: Agent calls search_web when it should call search_docs — fix by clarifying descriptions
- Hallucinated parameters: Agent passes a parameter that doesn't exist — fix by tightening the JSON schema
- Retry loops: Agent calls the same tool repeatedly — fix by improving error message structure in the tool's return value
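A minimal trace recorder, sketched around a generic call_tool callable (the wrapper and log shape are assumptions, not an MCP SDK feature):

```python
import time

trace = []

def traced_call(call_tool, name, arguments):
    """Invoke a tool and record name, input, output, and latency."""
    start = time.monotonic()
    output = call_tool(name, arguments)
    trace.append({
        "tool": name,
        "input": arguments,
        "output": output,
        "latency_ms": round((time.monotonic() - start) * 1000, 1),
    })
    return output

# Stand-in tool for illustration
result = traced_call(lambda name, args: {"issues": []}, "list_issues",
                     {"owner": "jlowin", "repo": "fastmcp"})
print(trace[0]["tool"])  # list_issues
```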
Regression testing with golden traces
Once your agent handles a task correctly, save the tool call trace as a golden test. After making changes to tool descriptions or server code, replay the task and compare traces. Divergence means something broke.
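The comparison step can be sketched as follows, assuming each trace entry is a dict with tool and input keys; outputs and latency vary run to run, so only call structure is checked:

```python
def diff_traces(golden, actual):
    """Compare two tool-call traces on tool name and input only."""
    diffs = []
    for i, (g, a) in enumerate(zip(golden, actual)):
        if (g["tool"], g["input"]) != (a["tool"], a["input"]):
            diffs.append(f"step {i}: expected {g['tool']} {g['input']}, "
                         f"got {a['tool']} {a['input']}")
    if len(golden) != len(actual):
        diffs.append(f"trace length: golden={len(golden)} actual={len(actual)}")
    return diffs

golden = [{"tool": "list_issues", "input": {"owner": "a", "repo": "b"}}]
actual = [{"tool": "search_web", "input": {"query": "issues"}}]
print(diff_traces(golden, actual))  # one 'step 0' divergence
```

An empty diff list means the change is safe; any entry is a regression to investigate.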
Production considerations
Server reliability: use the AgentRank score as a filter
Before adding any MCP server to a production agent, check its score on the AgentRank checker. The minimum bar for production is a score of 65. Below that, the server's maintenance health is too uncertain for a workflow your users depend on. See the full evaluation guide for a complete checklist.
Graceful degradation when tools fail
MCP servers fail. APIs change, rate limits hit, network connections drop. Your agent should handle tool failures without crashing the whole session:
- Include in the system prompt: "If a tool returns an error, acknowledge it and explain what you would have done with the result, then continue."
- Implement server-level health checks at session start — don't begin a task if a required server is unavailable
- Log failures with server name and tool name to route alerts to the right place
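That failure handling can be sketched as a wrapper around a generic call_tool callable (the names and payload shape are illustrative):

```python
import logging

logger = logging.getLogger("agent.tools")

def safe_call_tool(call_tool, server, name, arguments):
    """Return an error payload instead of raising, so one failed tool
    doesn't crash the whole session."""
    try:
        return {"ok": True, "result": call_tool(name, arguments)}
    except Exception as exc:
        # Server and tool name in the log line route alerts correctly
        logger.error("tool failure server=%s tool=%s error=%s", server, name, exc)
        return {"ok": False, "error": str(exc),
                "hint": "Tool unavailable; continue and explain what you "
                        "would have done with the result."}

def broken_tool(name, arguments):
    raise TimeoutError("upstream API timed out")

outcome = safe_call_tool(broken_tool, "github", "list_issues", {})
print(outcome["ok"])  # False
```

The error payload goes back to the model as a tool_result, which is what lets the system-prompt instruction above actually take effect.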
Secrets management
Most MCP servers require API keys via environment variables. Never hardcode credentials in server config files. Use a secrets manager (AWS Secrets Manager, 1Password CLI, Vault) to inject them at runtime. The MCP security guide covers the full threat model.
Monitoring: what to track
In production, log and alert on:
- Tool call error rate by server and tool name
- p95 tool call latency — a sudden spike usually means an upstream API degraded
- Agent task completion rate — the proportion of sessions that reach a successful conclusion
- Token usage per session — rising usage can mean the agent is stuck in a retry loop
Track which servers are actually called in production. A server that never gets called is wasting context window tokens on its tool descriptions for every session. Remove it or consolidate it.
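All four metrics can be derived from the same per-call log. A minimal sketch over hypothetical log entries (the entry shape is an assumption):

```python
from collections import defaultdict

def summarize_calls(calls):
    """Aggregate per-call logs into error rate and p95 latency per (server, tool)."""
    grouped = defaultdict(list)
    for call in calls:  # each: {"server", "tool", "ok", "latency_ms"}
        grouped[(call["server"], call["tool"])].append(call)
    summary = {}
    for key, entries in grouped.items():
        latencies = sorted(e["latency_ms"] for e in entries)
        p95_index = min(len(latencies) - 1, int(0.95 * len(latencies)))
        summary[key] = {
            "calls": len(entries),
            "error_rate": sum(not e["ok"] for e in entries) / len(entries),
            "p95_latency_ms": latencies[p95_index],
        }
    return summary

calls = [
    {"server": "github", "tool": "list_issues", "ok": True, "latency_ms": 120},
    {"server": "github", "tool": "list_issues", "ok": False, "latency_ms": 900},
]
print(summarize_calls(calls)[("github", "list_issues")]["error_rate"])  # 0.5
```

Tools that never appear in the summary over a representative window are the ones to remove from the session config.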