How to Set Up Open WebUI with Ollama (2026)
Deploy a ChatGPT-like interface for your local AI models
Open WebUI is an open-source, self-hosted web interface for interacting with large language models. Paired with Ollama -- a tool for running LLMs locally -- it gives you a ChatGPT-like experience that runs entirely on your own hardware. No API keys, no subscription fees, no data leaving your machine.
This guide walks you through the complete setup, from installing Ollama to configuring Open WebUI with advanced features.
Why Open WebUI + Ollama?
| Feature | ChatGPT | Open WebUI + Ollama |
|---|---|---|
| Cost | $20-200/month | Free (your hardware) |
| Privacy | Data sent to OpenAI | Everything stays local |
| Internet required | Yes | No (after setup) |
| Model choice | OpenAI models only | Any open-source model |
| Customization | Limited | Full control |
| Rate limits | Yes | No |
| Multi-user | One user per account | Yes (built-in) |
Prerequisites
- Hardware: A computer with at least 8GB of RAM. For good performance with larger models, 16GB+ of RAM and a GPU with 8GB+ of VRAM are recommended.
- OS: macOS, Linux, or Windows (WSL2 for Docker).
- Docker: Required for Open WebUI. Install from docker.com.
Step 1: Install Ollama
Ollama is the backend that downloads and runs AI models locally.
macOS
# Download and install from the website
# Or use Homebrew:
brew install ollama
Linux
curl -fsSL https://ollama.com/install.sh | sh
Windows
Download the installer from ollama.com/download.
Verify installation
ollama --version
# Should output: ollama version 0.x.x
Step 2: Download Your First Model
Pull a model before setting up the UI:
# Recommended starter model (good balance of quality and speed)
ollama pull llama3.1:8b
# For more capable responses (needs 16GB+ RAM)
ollama pull llama3.3:70b
# For coding
ollama pull qwen2.5-coder:14b
# For fast, lightweight use
ollama pull phi4-mini
Model size guide
| Model | RAM Needed | VRAM Needed | Quality |
|---|---|---|---|
| phi4-mini (3.8B) | 4GB | 3GB | Good for simple tasks |
| llama3.1:8b | 8GB | 6GB | Good general purpose |
| qwen2.5-coder:14b | 12GB | 10GB | Great for coding |
| llama3.3:70b | 48GB | 40GB | Excellent all-around |
| deepseek-r1:32b | 32GB | 24GB | Top-tier reasoning |
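A quick way to sanity-check these numbers: a model's weight file is roughly parameters × (bits per weight / 8), plus a couple of GB of runtime overhead for the KV cache and buffers. A minimal sketch, where the parameter count and bit width are inputs you supply:

```shell
# Rough weight-file size: parameters (billions) * bits-per-weight / 8.
# Ollama's default tags are usually 4-bit quantized.
params_b=8   # e.g. llama3.1:8b
bits=4       # 4-bit quantization
awk -v p="$params_b" -v b="$bits" 'BEGIN { printf "~%.1f GB of weights\n", p * b / 8 }'
```

Add roughly 1-2GB on top for the KV cache and runtime buffers, which is why the RAM figures in the table exceed the raw file sizes.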
Test your model:
ollama run llama3.1:8b "What is the capital of France?"
If you get a response, Ollama is working correctly.
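The interactive check above can also be scripted: Ollama serves a REST API on port 11434, and POSTing to /api/generate with "stream": false returns the whole completion as a single JSON object. A sketch (the curl call is left commented because it needs Ollama running):

```shell
# Build a request for Ollama's /api/generate endpoint (default port 11434).
payload='{"model":"llama3.1:8b","prompt":"What is the capital of France?","stream":false}'

# With Ollama running, the "response" field of the returned JSON holds the answer:
#   curl -s http://localhost:11434/api/generate -d "$payload"

echo "$payload"
```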
Step 3: Install Open WebUI with Docker
The easiest way to run Open WebUI is with Docker. One command does everything:
docker run -d \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
This command:
- Runs Open WebUI on port 3000.
- Connects to your local Ollama instance automatically.
- Persists data (chats, settings, users) in a Docker volume.
- Restarts automatically if your computer reboots.
Alternative: Docker Compose
For more control, use a docker-compose.yml file:
version: "3.8"
services:
open-webui:
image: ghcr.io/open-webui/open-webui:main
container_name: open-webui
ports:
- "3000:8080"
volumes:
- open-webui:/app/backend/data
extra_hosts:
- "host.docker.internal:host-gateway"
environment:
- OLLAMA_BASE_URL=http://host.docker.internal:11434
- WEBUI_AUTH=true
- WEBUI_SECRET_KEY=your-secret-key-here
restart: always
volumes:
open-webui:
Start it:
docker compose up -d
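The WEBUI_SECRET_KEY in the compose file above is a placeholder used to sign session tokens; replace it with a random value. Assuming openssl is installed, one way to generate one:

```shell
# 32 random bytes, hex-encoded -> a 64-character secret
openssl rand -hex 32
```

Paste the output into WEBUI_SECRET_KEY before first launch; changing it later will invalidate existing login sessions.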
Step 4: Initial Configuration
- Open your browser and go to http://localhost:3000.
- Create an admin account. The first user to register becomes the administrator.
- You should see your Ollama models listed in the model selector dropdown.
If models are not appearing:
- Verify Ollama is running: ollama list
- Check the Ollama URL in Open WebUI: go to Admin Panel > Settings > Connections and confirm the URL is http://host.docker.internal:11434.
Step 5: Pull Models from the UI
You can download new models directly from Open WebUI:
- Go to Admin Panel > Settings > Models.
- Enter a model name (e.g., qwen2.5:14b) in the "Pull a model" field.
- Click the download button.
- Wait for the download to complete. Progress is shown in the UI.
Step 6: Configure Advanced Features
Enable Web Search
Open WebUI supports web search via several providers:
- Go to Admin Panel > Settings > Web Search.
- Enable web search.
- Choose a search engine (SearXNG for self-hosted, Google, Brave, etc.).
- Add your API key if required.
For a fully self-hosted solution, deploy SearXNG alongside Open WebUI:
# Add to docker-compose.yml
searxng:
image: searxng/searxng:latest
container_name: searxng
ports:
- "8888:8080"
volumes:
- ./searxng:/etc/searxng
restart: always
Then set the SearXNG query URL in Open WebUI to http://searxng:8080/search?q=<query> (the containers share the Compose network, so the service name resolves directly).
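One common snag: Open WebUI queries SearXNG's JSON API, which SearXNG disables by default. In the settings.yml mounted from ./searxng, make sure the json output format is enabled (a fragment; the rest of the file keeps its defaults):

```yaml
# ./searxng/settings.yml (fragment)
search:
  formats:
    - html
    - json   # required for Open WebUI's web search queries
```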
Enable RAG (Document Chat)
Open WebUI has built-in RAG capabilities:
- In any chat, click the + button and upload a document (PDF, TXT, DOCX, etc.).
- Open WebUI will chunk, embed, and index the document.
- Ask questions about the document content.
For the embedding model, go to Admin Panel > Settings > Documents and configure:
- Embedding model: nomic-embed-text (pull it via Ollama first)
- Chunk size: 1000 (the default is fine for most use cases)
- Chunk overlap: 200
# Pull the embedding model
ollama pull nomic-embed-text
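To get intuition for the chunking settings: with size 1000 and overlap 200, each chunk starts 800 characters after the previous one, so a document of N characters yields roughly ceil((N - overlap) / (size - overlap)) chunks. A back-of-the-envelope check for a 10,000-character document:

```shell
awk -v n=10000 -v size=1000 -v overlap=200 'BEGIN {
  step = size - overlap                       # 800 chars of new text per chunk
  print int((n - overlap + step - 1) / step)  # ceil division -> 13 chunks
}'
```

Larger overlaps improve retrieval continuity across chunk boundaries at the cost of more chunks to embed and store.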
Enable Image Generation
Connect Open WebUI to a local Stable Diffusion or DALL-E instance:
- Go to Admin Panel > Settings > Images.
- Choose your backend (Automatic1111, ComfyUI, or OpenAI-compatible).
- Enter the API URL (e.g., http://host.docker.internal:7860 for Automatic1111).
Multi-User Setup
Open WebUI supports multiple users with role-based access:
- Go to Admin Panel > Users.
- Set the default role for new signups (user, pending, or admin).
- Manage individual user permissions.
- Each user gets their own chat history and settings.
This makes it perfect for teams, families, or classroom environments.
Step 7: Connect External APIs (Optional)
Open WebUI can also connect to remote APIs alongside Ollama:
OpenAI API
- Go to Admin Panel > Settings > Connections.
- Under "OpenAI API," add your API key.
- Models like GPT-4o will appear in the model selector alongside local Ollama models.
Any OpenAI-Compatible API
You can add any provider that uses the OpenAI format:
URL: https://api.groq.com/openai/v1
Key: your-groq-api-key
This lets you mix local models (via Ollama) and remote models (via APIs) in the same interface.
Performance Optimization
GPU Acceleration
Ensure Ollama is using your GPU:
# Load a model, then check which processor it is running on
ollama run llama3.1:8b "hello" --verbose
ollama ps
# The PROCESSOR column should show "100% GPU"
For NVIDIA GPUs, install the NVIDIA Container Toolkit for Docker GPU passthrough:
# Ubuntu/Debian (after adding NVIDIA's apt repository)
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Memory Management
If you are running out of memory:
# Use quantized models (smaller, slightly lower quality).
# Note: the default llama3.1:8b tag is already 4-bit quantized.
ollama pull llama3.1:8b-instruct-q4_0  # 4-bit quantization, ~4.7GB
ollama pull llama3.1:8b-instruct-q8_0  # 8-bit quantization, ~8.5GB
Context Length
To increase context length for a model, create a custom Modelfile:
# Create a Modelfile
cat > Modelfile << 'EOF'
FROM llama3.1:8b
PARAMETER num_ctx 16384
PARAMETER temperature 0.7
SYSTEM You are a helpful coding assistant.
EOF
# Create the custom model
ollama create llama3.1-16k -f Modelfile
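Raising num_ctx is not free: the KV cache grows linearly with context length. For a llama3.1:8b-shaped model (assumed here: 32 layers, 8 KV heads, head dimension 128, fp16 cache), a 16,384-token context costs roughly:

```shell
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim * ctx * 2 (fp16)
awk -v layers=32 -v kv_heads=8 -v head_dim=128 -v ctx=16384 'BEGIN {
  bytes = 2 * layers * kv_heads * head_dim * ctx * 2
  printf "%.1f GiB of KV cache\n", bytes / (1024 ^ 3)
}'
```

So the 16k-context variant needs about 2 GiB more memory than the same model at a short context; budget for that when choosing num_ctx.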
Troubleshooting
| Issue | Solution |
|---|---|
| "Ollama not connected" | Ensure Ollama is running (ollama serve). Check the connection URL in settings. |
| Models not loading | Check RAM. Use ollama ps to see running models. |
| Slow responses | Use a smaller model or enable GPU acceleration. |
| Docker permission denied | Add your user to the docker group: sudo usermod -aG docker $USER |
| Chat history lost | Ensure the Docker volume is persistent (-v open-webui:/app/backend/data). |
Wrapping Up
Open WebUI with Ollama gives you a fully private, customizable ChatGPT alternative running on your own hardware. The setup takes about 15 minutes, and once running, you have unlimited access to powerful AI models with no subscription fees, no rate limits, and no data privacy concerns.
If you need AI-generated media capabilities beyond text, such as image generation, video creation, or talking avatars, you can try Hypereal AI free (35 credits, no credit card required). It complements your local LLM setup with cloud-powered media generation via a simple API.