How to Build Your Own Offline AI Application: A Complete Beginner's Guide
Learn how to build a real AI-powered application that runs 100% on your own computer — no internet, no API keys, no subscription. Step-by-step Python and browser examples using Ollama, written for IT admins and non-developers alike.
You have already heard about offline AI — running AI models on your own computer without sending data to the cloud. But there is a big difference between using a local AI through a chat window and actually building your own application on top of it.
This guide bridges that gap.
By the end of this article, you will have built two real, working offline AI applications from scratch:
- A Python command-line assistant — ask it anything, get an AI reply, all local
- A browser-based chat interface — a simple web page that looks like a proper AI chat app
No prior programming experience required. Every piece of code is explained line by line in plain English. If you can copy and paste, you can build these.
The Big Picture: How an Offline AI App Actually Works
Before writing a single line of code, let us understand what is actually happening.
Think of it like a restaurant kitchen:
- The AI model is the chef — it knows how to cook (generate answers)
- Ollama is the restaurant manager — it organises everything so the chef is ready to take orders
- Your application is the waiter — it takes your order (your question) and brings the food back (the AI's answer)
When you install Ollama on your computer, it runs quietly in the background as a local server — similar to a tiny website running on your own machine. It listens at the address http://localhost:11434 and waits for requests.
Your application sends a message to that address, Ollama passes it to the AI model, the model generates a reply, and the reply comes back to your app.
Your App → http://localhost:11434 → Ollama → AI Model → Answer → Your AppThat is the entire architecture. Everything stays on your machine. Nothing touches the internet after the one-time model download.
What You Need Before You Start
You need three things:
| What | Why | Where to get it |
|---|---|---|
| Ollama | Runs the AI model locally | ollama.com |
| Python 3.10+ | To write and run your app | python.org |
| A text editor | To write your code | VS Code (recommended), Notepad++, or even Notepad |
That is it. No cloud accounts. No API keys. No credit card.
Check if Python is already installed
Open a command prompt (Windows: press Win + R, type cmd, press Enter) and type python --version. If you see a version number like Python 3.11.2, you already have Python and can skip the Python install step.
Install and Start Ollama
Download Ollama
Go to ollama.com and click Download. Choose Windows, macOS, or Linux. Install it like any normal application.
Download an AI Model
Open a command prompt and run this command. It downloads the Llama 3.2 model (about 2 GB) — a capable, fast model that runs well on most computers.
ollama pull llama3.2Wait for the download to finish. You only do this once.
Verify Ollama is Running
Open your browser and go to http://localhost:11434. If you see the text Ollama is running, you are ready. Ollama starts automatically in the background when you install it.
Part 1: Your First Offline AI App — Python Command Line
We will build a simple Python script that lets you type a question and get an AI reply. Twenty lines of code. Let us go through it piece by piece so you understand every line.
The Complete Script
Create a new file called ai_chat.py and paste this code:
import requests
import json
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "llama3.2"
def ask_ai(question):
payload = {
"model": MODEL_NAME,
"prompt": question,
"stream": False
}
response = requests.post(OLLAMA_URL, json=payload)
result = response.json()
return result["response"]
print("Offline AI Assistant — type 'quit' to exit\n")
while True:
user_input = input("You: ")
if user_input.lower() == "quit":
break
answer = ask_ai(user_input)
print(f"\nAI: {answer}\n")Line-by-Line Explanation
Lines 1–2 — Import libraries
import requests
import jsonrequests is a Python library that lets your script send messages over the internet (or in this case, to Ollama running locally). json handles converting data to the format Ollama expects.
Lines 4–5 — Set the address and model
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "llama3.2"This tells Python where to send the question (the Ollama address) and which AI model to use. If you later download a different model, you only need to change the model name here.
Lines 7–13 — The ask_ai function
def ask_ai(question):
payload = {
"model": MODEL_NAME,
"prompt": question,
"stream": False
}
response = requests.post(OLLAMA_URL, json=payload)
result = response.json()
return result["response"]This is the main engine. It bundles your question into a payload (a small package of information), sends it to Ollama, waits for the reply, and returns the AI's answer text. stream: False means we wait for the full reply before displaying it — simpler for beginners.
Lines 15–21 — The chat loop
print("Offline AI Assistant — type 'quit' to exit\n")
while True:
user_input = input("You: ")
if user_input.lower() == "quit":
break
answer = ask_ai(user_input)
print(f"\nAI: {answer}\n")This runs the chat. while True keeps the program running until you type quit. It takes your input, passes it to the ask_ai function, and prints the reply.
Install the Requests Library
Before running, you need to install the requests library. Open your command prompt and run:
pip install requestsRun Your App
In your command prompt, navigate to the folder where you saved ai_chat.py and run:
python ai_chat.pyYou will see:
Offline AI Assistant — type 'quit' to exit
You: What is the capital of France?
AI: The capital of France is Paris. It is the largest city in the country
and serves as the political, cultural, and commercial centre of France...
You:That is your first offline AI application running on your own computer. No internet after the model download. No data leaving your machine.
Change the model in one line
Try replacing llama3.2 with mistral or phi4-mini (after running ollama pull mistral) and run the script again. Different models have different personalities and strengths.
Part 2: A Browser-Based Offline AI Chat Interface
Now let us build something that looks like a real chat application — a webpage with a text box, a Send button, and a conversation history. No frameworks, no Node.js, no build tools. Just one HTML file.
The Complete HTML File
Create a new file called ai_chat.html and paste this:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>My Offline AI Chat</title>
<style>
body { font-family: Arial, sans-serif; max-width: 700px; margin: 40px auto; padding: 0 20px; background: #f5f5f5; }
h1 { color: #1e40af; font-size: 1.4rem; margin-bottom: 4px; }
p.subtitle { color: #6b7280; font-size: 0.85rem; margin-bottom: 20px; }
#chat-box { background: white; border: 1px solid #e5e7eb; border-radius: 12px; padding: 20px; min-height: 300px; max-height: 500px; overflow-y: auto; margin-bottom: 16px; }
.message { margin-bottom: 16px; }
.message.user { text-align: right; }
.bubble { display: inline-block; padding: 10px 16px; border-radius: 18px; max-width: 80%; line-height: 1.5; font-size: 0.95rem; }
.user .bubble { background: #1e40af; color: white; }
.ai .bubble { background: #f3f4f6; color: #111827; text-align: left; }
.label { font-size: 0.75rem; color: #9ca3af; margin-bottom: 4px; }
#input-row { display: flex; gap: 10px; }
#user-input { flex: 1; padding: 12px 16px; border: 1px solid #d1d5db; border-radius: 10px; font-size: 1rem; outline: none; }
#user-input:focus { border-color: #1e40af; }
#send-btn { padding: 12px 24px; background: #1e40af; color: white; border: none; border-radius: 10px; font-size: 1rem; cursor: pointer; font-weight: 600; }
#send-btn:hover { background: #1d4ed8; }
#send-btn:disabled { background: #93c5fd; cursor: not-allowed; }
.thinking { color: #9ca3af; font-style: italic; font-size: 0.9rem; }
</style>
</head>
<body>
<h1>My Offline AI Assistant</h1>
<p class="subtitle">Running locally on your computer — no internet required</p>
<div id="chat-box">
<div class="message ai">
<div class="label">AI</div>
<div class="bubble">Hello! I am your offline AI assistant. I am running entirely on your computer — nothing you type is sent to the internet. How can I help you?</div>
</div>
</div>
<div id="input-row">
<input type="text" id="user-input" placeholder="Type your message..." />
<button id="send-btn" onclick="sendMessage()">Send</button>
</div>
<script>
const OLLAMA_URL = "http://localhost:11434/api/generate";
const MODEL = "llama3.2";
document.getElementById("user-input").addEventListener("keydown", function(e) {
if (e.key === "Enter") sendMessage();
});
async function sendMessage() {
const input = document.getElementById("user-input");
const question = input.value.trim();
if (!question) return;
addMessage("user", question);
input.value = "";
const btn = document.getElementById("send-btn");
btn.disabled = true;
const thinkingId = addMessage("ai", '<span class="thinking">Thinking...</span>');
try {
const response = await fetch(OLLAMA_URL, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ model: MODEL, prompt: question, stream: false })
});
const data = await response.json();
updateMessage(thinkingId, data.response);
} catch (err) {
updateMessage(thinkingId, "Error: Could not reach Ollama. Make sure it is running on your computer.");
}
btn.disabled = false;
input.focus();
}
function addMessage(role, text) {
const box = document.getElementById("chat-box");
const id = "msg-" + Date.now();
box.innerHTML += `
<div class="message ${role}" id="${id}">
<div class="label">${role === "user" ? "You" : "AI"}</div>
<div class="bubble">${text}</div>
</div>`;
box.scrollTop = box.scrollHeight;
return id;
}
function updateMessage(id, text) {
const el = document.getElementById(id);
if (el) el.querySelector(".bubble").textContent = text;
document.getElementById("chat-box").scrollTop = 99999;
}
</script>
</body>
</html>How to Open and Use It
- Make sure Ollama is running (it should be — it starts automatically)
- Double-click the
ai_chat.htmlfile — it opens in your browser - Type a message and press Enter or click Send
You now have a proper-looking chat interface — running 100% offline on your computer.
Why does this work in the browser?
Your browser is making a request to localhost — your own machine. It never goes out to the internet. Ollama listens on port 11434 and responds to these requests. Your HTML file is just a front-end for your local AI server.
Part 3: Make It Your Own — Custom System Prompts
Both applications above give you a general-purpose AI. But the real power comes when you give the AI a specific role and set of instructions before the conversation starts.
This is called a system prompt — instructions you set once that shape how the AI behaves for every message in the session.
Example: IT Helpdesk Assistant
Instead of a general AI, let us build one that acts like a knowledgeable IT support specialist:
In your Python script, update the ask_ai function:
SYSTEM_PROMPT = """You are an IT support specialist for a Windows enterprise environment.
Your users are employees who are not technically skilled.
Keep your answers short, clear, and step-by-step.
When giving instructions, number each step.
Only give solutions that work on Windows 10 or Windows 11.
If you do not know the answer, say so honestly."""
def ask_ai(question):
full_prompt = f"System: {SYSTEM_PROMPT}\n\nUser: {question}\nAssistant:"
payload = {
"model": MODEL_NAME,
"prompt": full_prompt,
"stream": False
}
response = requests.post(OLLAMA_URL, json=payload)
result = response.json()
return result["response"]Now when a user asks "my printer is not working", they get a focused, step-by-step Windows troubleshooting guide — not a generic AI reply.
System Prompt Templates for Common Use Cases
| Use Case | System Prompt Idea |
|---|---|
| IT Helpdesk | "You are an IT support specialist. Give step-by-step Windows solutions. Keep answers brief." |
| Policy Explainer | "You are an HR assistant. Explain company policies in plain language. Never give legal advice." |
| Code Helper | "You are a Python expert. Always include code examples. Explain every line." |
| Document Summariser | "Summarise the provided text in 5 bullet points. Focus on action items and decisions." |
| Training Assistant | "You are an onboarding guide for new employees. Be friendly, encouraging, and thorough." |
Real-World Applications You Can Build
Here is what real organisations are building with offline AI today — and you can build these too with the techniques from this article:
Internal IT Knowledge Base Bot
Load your company's internal documentation (troubleshooting guides, IT policies, network diagrams) into a folder. Build a Python script that reads those files and passes their content as context to the AI. Ask it questions about your own internal systems — privately, with no data leaving your network.
Offline Log Analyser
Pipe Windows Event Viewer logs or firewall logs into your Python script. Use a system prompt that says "analyse this log output and identify errors, warnings, and suspicious activity." Get instant AI-driven log summaries — without sending sensitive server logs to a cloud AI.
PowerShell Script Generator
Set a system prompt: "You are a PowerShell expert. Write scripts for Windows 10/11 and Microsoft 365 administration." Use it as a local coding assistant while writing automation scripts — useful when you cannot use cloud AI tools on work machines due to policy.
Meeting Notes Summariser
Paste a long meeting transcript into your chat and ask for a summary, action items, and decisions. Runs completely offline — ideal for confidential meetings where you cannot paste notes into ChatGPT.
Offline Training Chatbot
Build a self-contained HTML page for new employee onboarding. The AI answers questions about processes and systems. Works on an internal network with no internet access — perfect for secure or air-gapped environments.
Which AI Model Should You Use in Your App?
Different models have different strengths. Here is a practical guide for choosing:
| Model | Best for | RAM Needed | Speed |
|---|---|---|---|
| phi4-mini | Quick answers, low-spec hardware, short tasks | 4 GB | Very fast |
| llama3.2 | General purpose, balanced quality and speed | 8 GB | Fast |
| mistral | Writing, summaries, European language support | 8 GB | Fast |
| deepseek-coder | Writing and debugging code | 8 GB | Fast |
| llama3.1:8b | Better reasoning, longer conversations | 8 GB | Moderate |
| llama3.3:70b | Near-cloud quality reasoning and analysis | 48 GB | Slow |
How to switch models in your app
In both the Python script and the HTML file, you only need to change one line — the MODEL_NAME or MODEL variable. Run ollama list in your terminal to see which models you have already downloaded.
Troubleshooting: Common Problems and Fixes
| Problem | What it means | Fix |
|---|---|---|
Connection refused on port 11434 | Ollama is not running | Open a terminal and run ollama serve |
Model not found error | You typed the model name incorrectly, or have not downloaded it | Run ollama pull llama3.2 to download it |
| Response is very slow | Your computer is using the CPU instead of a GPU | Normal on CPU — try a smaller model like phi4-mini |
pip is not recognised | Python is not installed, or not in your PATH | Re-install Python from python.org and tick "Add to PATH" |
| HTML page shows CORS error | Browser blocking the local request | This can happen in some browsers — try opening the file in Chrome or Edge |
How Far Can You Take This?
What you have built today is the foundation. Here is where developers typically take it next:
Each step is a small, learnable addition. The hardest part — understanding how to talk to a local AI — you have already done.
Frequently Asked Questions
Do I need to be a developer to follow this guide? No. If you can copy and paste code and run a command in a terminal, you can build both applications in this guide. The code is intentionally simple and every line is explained.
Can I use this at work on a corporate laptop? This depends on your organisation's IT policy. Because everything runs locally and nothing touches the internet after the model download, many organisations allow it. Check with your IT department if you are unsure.
What happens if I ask the AI something it gets wrong? Like all AI, local models can make mistakes. Always verify important information. The practical use cases in this guide — summarising your own content, formatting documents, generating code that you review — are low-risk because you are checking the output before using it.
Can I run multiple models at the same time? Ollama loads one model at a time by default. Switching models in your code automatically unloads the previous one. On machines with a lot of RAM, you can configure Ollama to keep multiple models loaded.
Is this different from using the Ollama chat window directly? Yes. The Ollama chat window is a generic interface. When you build your own app, you control the system prompt, the user interface, the data you feed in, and how responses are displayed. That is where the real value comes from — AI that is shaped for your specific job or task.
Can I share my app with colleagues?
Yes — if they have Ollama installed with the same model, they can run your HTML file or Python script directly. For a whole team, you could run Ollama on a shared server inside your network and point everyone's apps at that central address instead of localhost.
Conclusion: You Are Now an Offline AI Builder
What seemed like a developer-only skill — building an AI application — is something you have now done in under an hour using fewer than 30 lines of code.
The key insight to take away: Ollama turns a local AI model into a simple API you can call from any application. Once you understand that, building on top of it is just a matter of asking the right questions and shaping the responses.
Start with the Python script. Get comfortable with it. Add a system prompt for a specific use case you have at work. Then try the HTML version. Once you have both working, you will start seeing opportunities everywhere — logs that need analysing, documents that need summarising, policies that need explaining in plain language.
All of it private. All of it free. All of it running right there on your own computer.
Written by
Chetan Yamger
Cloud Engineer · AI Automation Architect · Modern Workplace Consultant
Cloud Engineer, AI Automation Architect, and Modern Workplace Consultant based in Amsterdam, Netherlands. Specializing in scalable, secure enterprise solutions with Microsoft Azure, Intune, PowerShell, and AI-driven automation using ChatGPT, Gemini, and modern LLM technologies.
Stay in the loop.
New articles, straight to you.
Deep-dive technical articles on Intune, PowerShell, and AI — no noise, no spam.
Discussion
Share your thoughts — your email stays private
Leave a comment
