Local AI with Retrieval Augmented Generation (RAG) using Open WebUI and Ollama
Sources:
- Peter Alexandru Hautelman, "Build Your Local AI: From Zero to a Custom ChatGPT Interface with Ollama & Open WebUI", Medium, March 2025
- Peter Alexandru Hautelman, "Open WebUI tutorial — Supercharge Your Local AI with RAG and Custom Knowledge Bases", Medium, March 2025
- Open WebUI Documentation
Running artificial intelligence locally on your computer
Running a large language model (LLM) on your own computer saves costs, keeps your data private, and gives you more features and customization. The disadvantages are needing a computer powerful enough to run the models, technical knowledge, and time.
Tools being used
- Ollama - a framework for running LLMs locally
- Open WebUI - a user interface for interacting with Ollama; it also supports talking to other backends
  - Provides chat, model configuration, retrieval augmented generation (RAG), and other features
- Python - programming language (used to install and run Open WebUI)
Hardware Requirements
A computer with a graphics processing unit (GPU) that has at least 4 GB of GPU memory (8 GB+ recommended), 20 GB of free disk space, and a web browser.
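To check a machine against these requirements, a quick sketch using standard command-line tools (assuming a Unix-like system with an NVIDIA GPU; nvidia-smi ships with the NVIDIA driver):
# Show the GPU and how much GPU memory it has
nvidia-smi
# Show free disk space per filesystem
df -h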
Setting up AI Locally with ollama, open-webui and Python
For detailed steps, see the source articles. The high-level steps are:
- Install the tools
- Set up a Python virtual environment
- Use Ollama’s online list of models to select an appropriate model; examples are collected in AI Models
  - A smaller model of 7B parameters or less is a good starting point
- Run Ollama models using the ollama command line (see Snippets ollama and the example after this list)
- Install open-webui in the Python virtual environment and start it, then access the web application user interface in your browser
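As a quick sketch of the model-selection step (the model name here is just one example from Ollama's library at https://ollama.com/library):
# List models already downloaded
ollama list
# Download a small model from Ollama's library
ollama pull phi4-mini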
Assuming Python, Ollama, and Python environment and packaging tools like pip are installed:
# Create and activate a Python virtual environment (Python 3.11 or higher)
python -m virtualenv ./venv
source ./venv/bin/activate

# Run a small model; exit the chat interface with Ctrl + D
ollama run phi4-mini

# Install and run open-webui
pip install open-webui
open-webui serve
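To confirm the Ollama server is up before starting Open WebUI, query its local API (Ollama listens on port 11434 by default and replies with a short status message):
# Should print "Ollama is running"
curl http://localhost:11434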
Go to the local Open WebUI site in your browser (http://localhost:8080 by default) to see the interface for chat, models, chat controls, and settings.
Configuring open-webui for search, code execution, and custom models
- Web search: allows the model to use web search results
  - Settings > Admin > Web Search; choose a search provider and set up its API
- Code Interpreter: allows writing and execution of Python code in chat, changing the AI from a text-only chat partner into one that can do work with a programming language
  - Settings > Admin > Code Execution; toggle the switch
  - Advantage: instead of just text chat, the AI can run programs to help solve problems
- Custom model: customize how an AI model behaves
  - Workspace > Models > Create New
  - Set things like the system prompt (a command-line sketch of the same idea follows below)
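Open WebUI handles this in its Workspace; Ollama itself can do something similar from the command line with a Modelfile. A minimal sketch, assuming the phi4-mini model from earlier (the custom model name and system prompt are examples):
# Write a Modelfile that layers a system prompt on top of a base model
cat > Modelfile <<'EOF'
FROM phi4-mini
SYSTEM "You are a concise assistant that answers in short bullet points."
EOF

# Build the custom model and chat with it
ollama create my-assistant -f Modelfile
ollama run my-assistant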
Using Local artificial intelligence (AI) with Retrieval Augmented Generation (RAG) and Existing Documents
See Local AI with Retrieval Augmented Generation (RAG) using Open WebUI and Ollama System and Setup.