How to Connect Ollama to MCP (2026)
Use local Ollama models with the Model Context Protocol
MCP (Model Context Protocol) lets AI models interact with external tools and data sources. Ollama lets you run powerful language models locally. Connecting the two gives you a locally-hosted AI assistant that can access files, query databases, call APIs, and use any MCP-compatible tool -- all without sending data to cloud providers.
This guide shows you how to bridge Ollama models with MCP servers, covering multiple approaches from simple to advanced.
Why Connect Ollama to MCP?
By default, Ollama models can only respond based on the conversation context you provide. They cannot access files, search the web, or interact with external systems. MCP changes this by giving the model access to tools it can call during a conversation.
Use cases for Ollama + MCP:
- Local file access: Let the model read and search your codebase or documents
- Database queries: The model can query SQLite, PostgreSQL, or other databases
- API integration: Connect to GitHub, Jira, Slack, or any service with an MCP server
- Web search: Add search capabilities to your local model
- Complete privacy: All processing stays on your machine
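To make the mechanics concrete, here is a minimal sketch of a single tool-call round trip against Ollama's OpenAI-compatible endpoint, independent of MCP. It assumes a local qwen2.5:7b model and uses a hypothetical get_current_time tool defined inline; in the MCP setups below, the tool definitions come from MCP servers instead.
```python
import datetime
from openai import OpenAI

# Ollama exposes an OpenAI-compatible API; the key is a placeholder
llm = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")

# Hypothetical tool, defined by hand for this sketch only
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_time",
        "description": "Return the current local time as an ISO 8601 string",
        "parameters": {"type": "object", "properties": {}},
    },
}]

messages = [{"role": "user", "content": "What time is it right now?"}]
response = llm.chat.completions.create(model="qwen2.5:7b", messages=messages, tools=tools)
msg = response.choices[0].message

if msg.tool_calls:
    # Execute the tool locally and hand the result back to the model
    call = msg.tool_calls[0]
    result = datetime.datetime.now().isoformat()
    messages.append(msg)
    messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    final = llm.chat.completions.create(model="qwen2.5:7b", messages=messages)
    print(final.choices[0].message.content)
else:
    print(msg.content)
```
Every bridge in this guide automates exactly this request/execute/respond loop, with MCP servers supplying the tools.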
Prerequisites
| Requirement | Details |
|---|---|
| Ollama | Installed and running (ollama.com) |
| Node.js | v18+ (for MCP servers) |
| A model with tool support | Qwen 2.5, Llama 3.3, Hermes 3, or Mistral |
| RAM | 8 GB minimum, 16+ GB recommended |
Install Ollama (if needed)
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model with good tool-calling support
ollama pull qwen2.5:7b
Verify Ollama is running:
curl http://localhost:11434/api/tags
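If you prefer to script the check, a small standard-library sketch (assuming the default port and the qwen2.5:7b model pulled above) might look like this:
```python
import json
import urllib.request

# List installed models via Ollama's /api/tags endpoint
with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
    data = json.load(resp)

models = [m["name"] for m in data.get("models", [])]
print("Installed models:", models)

if not any(name.startswith("qwen2.5") for name in models):
    print("qwen2.5 not found -- run: ollama pull qwen2.5:7b")
```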
Approach 1: MCP Client Bridge (Recommended)
The most practical approach is to use a bridge application that acts as an MCP client, connecting Ollama's API to MCP servers.
Step 1: Install the mcp-client-cli Bridge
npm install -g @anthropic/mcp-client-cli
This tool creates an interactive chat interface that connects an LLM to MCP servers.
Step 2: Create a Configuration File
Create mcp-config.json:
```json
{
  "llm": {
    "provider": "openai-compatible",
    "baseUrl": "http://localhost:11434/v1",
    "apiKey": "ollama",
    "model": "qwen2.5:7b"
  },
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/yourname/projects"
      ]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_your_token"
      }
    }
  }
}
```
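Before launching the bridge, you can sanity-check the config with a small hypothetical helper: it parses the JSON, prints the configured servers, and confirms that the Ollama endpoint in baseUrl is reachable.
```python
import json
import urllib.request

# Parse the bridge configuration
with open("mcp-config.json") as f:
    config = json.load(f)

print("Model:", config["llm"]["model"])
print("MCP servers:", ", ".join(config["mcpServers"]))

# Ollama serves /api/tags on the same host; drop the /v1 suffix to reach it
tags_url = config["llm"]["baseUrl"].rstrip("/").removesuffix("/v1") + "/api/tags"
with urllib.request.urlopen(tags_url, timeout=5) as resp:
    print("Ollama reachable:", resp.status == 200)
```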
Step 3: Start the Bridge
mcp-client-cli --config mcp-config.json
This opens an interactive chat where your Ollama model can use the configured MCP tools. Ask it to list files, read code, or interact with GitHub -- the model will call the appropriate MCP tools automatically.
Approach 2: Python Script with Tool Calling
For full control, build a Python script that speaks MCP directly and routes the model's tool calls from Ollama to the MCP server.
Step 1: Install Dependencies
pip install openai mcp
Step 2: Create the Bridge Script
```python
import asyncio
import json
from openai import OpenAI
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Connect to Ollama
llm = OpenAI(
    api_key="ollama",
    base_url="http://localhost:11434/v1"
)
MODEL = "qwen2.5:7b"

async def run():
    # Start MCP server (filesystem access)
    server_params = StdioServerParameters(
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem", "/Users/yourname/projects"]
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize MCP session
            await session.initialize()

            # Get available tools from MCP server
            tools_response = await session.list_tools()
            mcp_tools = tools_response.tools

            # Convert MCP tools to OpenAI function format
            openai_tools = []
            for tool in mcp_tools:
                openai_tools.append({
                    "type": "function",
                    "function": {
                        "name": tool.name,
                        "description": tool.description or "",
                        "parameters": tool.inputSchema if tool.inputSchema else {"type": "object", "properties": {}}
                    }
                })

            print(f"Connected to MCP server with {len(openai_tools)} tools")
            print("Tools:", [t["function"]["name"] for t in openai_tools])
            print("\nChat started. Type 'quit' to exit.\n")

            messages = []
            while True:
                user_input = input("You: ")
                if user_input.lower() in ("quit", "exit"):
                    break
                messages.append({"role": "user", "content": user_input})

                # Call Ollama with tools
                response = llm.chat.completions.create(
                    model=MODEL,
                    messages=messages,
                    tools=openai_tools if openai_tools else None,
                    tool_choice="auto"
                )
                assistant_message = response.choices[0].message

                # Handle tool calls
                if assistant_message.tool_calls:
                    messages.append(assistant_message)
                    for tool_call in assistant_message.tool_calls:
                        tool_name = tool_call.function.name
                        tool_args = json.loads(tool_call.function.arguments)
                        print(f" [Calling tool: {tool_name}({tool_args})]")

                        # Execute via MCP
                        result = await session.call_tool(tool_name, tool_args)

                        # Add tool result to conversation
                        tool_result_text = ""
                        for content in result.content:
                            if hasattr(content, "text"):
                                tool_result_text += content.text
                        messages.append({
                            "role": "tool",
                            "tool_call_id": tool_call.id,
                            "content": tool_result_text
                        })

                    # Get final response after tool execution
                    final_response = llm.chat.completions.create(
                        model=MODEL,
                        messages=messages
                    )
                    final_text = final_response.choices[0].message.content
                    messages.append({"role": "assistant", "content": final_text})
                    print(f"AI: {final_text}\n")
                else:
                    content = assistant_message.content or ""
                    messages.append({"role": "assistant", "content": content})
                    print(f"AI: {content}\n")

asyncio.run(run())
```
Step 3: Run It
python ollama_mcp_bridge.py
Example session:
Connected to MCP server with 5 tools
Tools: ['read_file', 'write_file', 'list_directory', 'search_files', 'get_file_info']
You: List all TypeScript files in my projects directory
[Calling tool: list_directory({"path": "/Users/yourname/projects"})]
[Calling tool: search_files({"pattern": "*.ts", "path": "/Users/yourname/projects"})]
AI: I found 23 TypeScript files in your projects directory...
You: Read the main index.ts file and suggest improvements
[Calling tool: read_file({"path": "/Users/yourname/projects/my-app/src/index.ts"})]
AI: Here are my suggestions for improving your index.ts...
Approach 3: Open WebUI with MCP Support
Open WebUI is a popular web interface for Ollama that has added MCP support. This gives you a ChatGPT-like interface powered by local models with MCP tool access.
Step 1: Install Open WebUI
docker run -d -p 3000:8080 \
-v open-webui:/app/backend/data \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
--name open-webui \
ghcr.io/open-webui/open-webui:main
Step 2: Configure MCP Servers
In Open WebUI's admin settings:
- Navigate to Admin > Settings > MCP
- Add your MCP server configurations
- Enable tool calling for your selected model
Step 3: Use It
Open http://localhost:3000 in your browser. Select an Ollama model that supports tool calling and start chatting. The model can now use the configured MCP tools through the web interface.
Best Models for MCP Tool Calling
Not all Ollama models handle tool calling well. Here are the most reliable options:
| Model | Size (Q4) | Tool Calling | Reliability | Speed |
|---|---|---|---|---|
| Qwen 2.5 7B Instruct | 4.5 GB | Excellent | High | Fast |
| Hermes 3 8B | 5 GB | Excellent | High | Fast |
| Llama 3.1 8B Instruct | 5 GB | Very Good | High | Fast |
| Mistral Small 24B | 14 GB | Good | Medium | Medium |
| Qwen 2.5 72B Instruct | 42 GB | Excellent | Very High | Slow |
| Command R+ 104B | 60 GB | Very Good | High | Slow |
Recommendation: Start with qwen2.5:7b for the best balance of tool-calling reliability and performance. Upgrade to qwen2.5:72b if you have the VRAM for it.
# Pull the recommended model
ollama pull qwen2.5:7b
# Or for better quality with more VRAM
ollama pull qwen2.5:72b-instruct-q4_K_M
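If you are unsure whether a pulled model actually emits tool calls, a quick probe like the sketch below can tell you. It sends one request with a hypothetical echo tool through Ollama's OpenAI-compatible endpoint; models without a tool template typically return an error instead of a tool call.
```python
from openai import OpenAI

llm = OpenAI(api_key="ollama", base_url="http://localhost:11434/v1")

# A dummy tool used only to see whether the model produces tool calls
tools = [{
    "type": "function",
    "function": {
        "name": "echo",
        "description": "Echo the given text back to the caller",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}]

def supports_tool_calling(model: str) -> bool:
    try:
        response = llm.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "Use the echo tool to repeat the word ping."}],
            tools=tools,
        )
    except Exception as exc:
        # Ollama rejects tool requests for models without a tool template
        print(f"{model}: {exc}")
        return False
    return bool(response.choices[0].message.tool_calls)

print("qwen2.5:7b ->", "emits tool calls" if supports_tool_calling("qwen2.5:7b") else "no tool calls")
```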
Useful MCP Servers to Connect
Here are the most practical MCP servers to pair with Ollama:
| MCP Server | Package | Use Case |
|---|---|---|
| Filesystem | @modelcontextprotocol/server-filesystem | Read/write local files |
| GitHub | @modelcontextprotocol/server-github | Issues, PRs, repos |
| SQLite | @modelcontextprotocol/server-sqlite | Query local databases |
| Brave Search | @modelcontextprotocol/server-brave-search | Web search |
| Fetch | @modelcontextprotocol/server-fetch | Fetch web page content |
| Memory | @modelcontextprotocol/server-memory | Persistent memory across sessions |
Install and use any of them with:
npx -y @modelcontextprotocol/server-filesystem /path/to/directory
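To see what a given server exposes before wiring it into a bridge, you can reuse the same mcp client library from Approach 2. The sketch below starts the memory server over stdio and prints its tool names and descriptions; swap in any package from the table above.
```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def list_server_tools():
    # Launch the MCP server as a subprocess and talk to it over stdio
    params = StdioServerParameters(
        command="npx",
        args=["-y", "@modelcontextprotocol/server-memory"],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = (await session.list_tools()).tools
            for tool in tools:
                print(f"{tool.name}: {tool.description}")

asyncio.run(list_server_tools())
```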
Troubleshooting
"Model does not support tool calling" Not all Ollama models support the tool-calling format. Use Qwen 2.5, Hermes 3, or Llama 3.3 Instruct. Avoid base models and models without "instruct" in the name.
Tool calls return malformed JSON Lower the temperature to 0.1 in your Ollama configuration. Higher temperatures cause the model to generate invalid JSON for tool call arguments.
```python
response = llm.chat.completions.create(
    model=MODEL,
    messages=messages,
    tools=openai_tools,
    temperature=0.1  # Low temperature for reliable tool calling
)
```
MCP server fails to start
Make sure Node.js v18+ is installed and the MCP server package installs correctly. Test by running the npx command manually in your terminal.
Ollama not responding on port 11434
Start the Ollama server with:
ollama serve
Or check if it is running:
curl http://localhost:11434/api/tags
Slow responses with tool calling
Tool calling adds latency because the model generates a tool call, the tool executes, and then the model processes the result. Use a smaller, faster model (7B) and ensure your GPU is being utilized. Check with nvidia-smi or ollama ps.
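One way to check GPU utilization from a script is Ollama's /api/ps endpoint (the API behind ollama ps), which reports how much of each loaded model is resident in VRAM. A rough sketch, assuming the default port and the documented size/size_vram fields:
```python
import json
import urllib.request

# List running models and how much of each is resident in VRAM
with urllib.request.urlopen("http://localhost:11434/api/ps", timeout=5) as resp:
    running = json.load(resp).get("models", [])

for model in running:
    size = model.get("size", 0)
    size_vram = model.get("size_vram", 0)
    pct = (size_vram / size * 100) if size else 0
    print(f"{model['name']}: {pct:.0f}% of weights in VRAM")
```
If that percentage is well below 100, part of the model is running on CPU and responses will be noticeably slower.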
Wrapping Up
Connecting Ollama to MCP gives you a completely local, privacy-preserving AI assistant that can interact with your files, databases, and services. The setup ranges from a 5-minute configuration with a pre-built bridge to a fully custom Python integration. The key to success is choosing a model with strong tool-calling support and keeping inference parameters conservative.
If your workflows also involve AI-generated media like images, video, or talking avatars, check out Hypereal AI for a unified API that handles all major AI media models.