Cloud Engineer Lab
Cloud Engineer Lab
Cloud Engineer Lab
Cloud Engineer Lab
© 2026
How to Build Your Own Offline AI Application: A Complete Beginner's Guide
AI & InnovationIntermediate

How to Build Your Own Offline AI Application: A Complete Beginner's Guide

Learn how to build a real AI-powered application that runs 100% on your own computer — no internet, no API keys, no subscription. Step-by-step Python and browser examples using Ollama, written for IT admins and non-developers alike.

16 min read
Share

You have already heard about offline AI — running AI models on your own computer without sending data to the cloud. But there is a big difference between using a local AI through a chat window and actually building your own application on top of it.

This guide bridges that gap.

By the end of this article, you will have built two real, working offline AI applications from scratch:

  1. A Python command-line assistant — ask it anything, get an AI reply, all local
  2. A browser-based chat interface — a simple web page that looks like a proper AI chat app

No prior programming experience required. Every piece of code is explained line by line in plain English. If you can copy and paste, you can build these.


The Big Picture: How an Offline AI App Actually Works

Before writing a single line of code, let us understand what is actually happening.

Think of it like a restaurant kitchen:

  • The AI model is the chef — it knows how to cook (generate answers)
  • Ollama is the restaurant manager — it organises everything so the chef is ready to take orders
  • Your application is the waiter — it takes your order (your question) and brings the food back (the AI's answer)

When you install Ollama on your computer, it runs quietly in the background as a local server — similar to a tiny website running on your own machine. It listens at the address http://localhost:11434 and waits for requests.

Your application sends a message to that address, Ollama passes it to the AI model, the model generates a reply, and the reply comes back to your app.

text
Your App → http://localhost:11434 → Ollama → AI Model → Answer → Your App

That is the entire architecture. Everything stays on your machine. Nothing touches the internet after the one-time model download.


What You Need Before You Start

You need three things:

WhatWhyWhere to get it
OllamaRuns the AI model locallyollama.com
Python 3.10+To write and run your apppython.org
A text editorTo write your codeVS Code (recommended), Notepad++, or even Notepad

That is it. No cloud accounts. No API keys. No credit card.

Check if Python is already installed

Open a command prompt (Windows: press Win + R, type cmd, press Enter) and type python --version. If you see a version number like Python 3.11.2, you already have Python and can skip the Python install step.

Install and Start Ollama

Download Ollama

Go to ollama.com and click Download. Choose Windows, macOS, or Linux. Install it like any normal application.

Download an AI Model

Open a command prompt and run this command. It downloads the Llama 3.2 model (about 2 GB) — a capable, fast model that runs well on most computers.

bash
ollama pull llama3.2

Wait for the download to finish. You only do this once.

Verify Ollama is Running

Open your browser and go to http://localhost:11434. If you see the text Ollama is running, you are ready. Ollama starts automatically in the background when you install it.


Part 1: Your First Offline AI App — Python Command Line

We will build a simple Python script that lets you type a question and get an AI reply. Twenty lines of code. Let us go through it piece by piece so you understand every line.

The Complete Script

Create a new file called ai_chat.py and paste this code:

python
import requests
import json
 
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "llama3.2"
 
def ask_ai(question):
    payload = {
        "model": MODEL_NAME,
        "prompt": question,
        "stream": False
    }
    response = requests.post(OLLAMA_URL, json=payload)
    result = response.json()
    return result["response"]
 
print("Offline AI Assistant — type 'quit' to exit\n")
 
while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    answer = ask_ai(user_input)
    print(f"\nAI: {answer}\n")

Line-by-Line Explanation

Lines 1–2 — Import libraries

python
import requests
import json

requests is a Python library that lets your script send messages over the internet (or in this case, to Ollama running locally). json handles converting data to the format Ollama expects.

Lines 4–5 — Set the address and model

python
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "llama3.2"

This tells Python where to send the question (the Ollama address) and which AI model to use. If you later download a different model, you only need to change the model name here.

Lines 7–13 — The ask_ai function

python
def ask_ai(question):
    payload = {
        "model": MODEL_NAME,
        "prompt": question,
        "stream": False
    }
    response = requests.post(OLLAMA_URL, json=payload)
    result = response.json()
    return result["response"]

This is the main engine. It bundles your question into a payload (a small package of information), sends it to Ollama, waits for the reply, and returns the AI's answer text. stream: False means we wait for the full reply before displaying it — simpler for beginners.

Lines 15–21 — The chat loop

python
print("Offline AI Assistant — type 'quit' to exit\n")
 
while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    answer = ask_ai(user_input)
    print(f"\nAI: {answer}\n")

This runs the chat. while True keeps the program running until you type quit. It takes your input, passes it to the ask_ai function, and prints the reply.

Install the Requests Library

Before running, you need to install the requests library. Open your command prompt and run:

bash
pip install requests

Run Your App

In your command prompt, navigate to the folder where you saved ai_chat.py and run:

bash
python ai_chat.py

You will see:

text
Offline AI Assistant — type 'quit' to exit
 
You: What is the capital of France?
 
AI: The capital of France is Paris. It is the largest city in the country
and serves as the political, cultural, and commercial centre of France...
 
You:

That is your first offline AI application running on your own computer. No internet after the model download. No data leaving your machine.

Change the model in one line

Try replacing llama3.2 with mistral or phi4-mini (after running ollama pull mistral) and run the script again. Different models have different personalities and strengths.


Part 2: A Browser-Based Offline AI Chat Interface

Now let us build something that looks like a real chat application — a webpage with a text box, a Send button, and a conversation history. No frameworks, no Node.js, no build tools. Just one HTML file.

The Complete HTML File

Create a new file called ai_chat.html and paste this:

html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>My Offline AI Chat</title>
  <style>
    body { font-family: Arial, sans-serif; max-width: 700px; margin: 40px auto; padding: 0 20px; background: #f5f5f5; }
    h1 { color: #1e40af; font-size: 1.4rem; margin-bottom: 4px; }
    p.subtitle { color: #6b7280; font-size: 0.85rem; margin-bottom: 20px; }
    #chat-box { background: white; border: 1px solid #e5e7eb; border-radius: 12px; padding: 20px; min-height: 300px; max-height: 500px; overflow-y: auto; margin-bottom: 16px; }
    .message { margin-bottom: 16px; }
    .message.user { text-align: right; }
    .bubble { display: inline-block; padding: 10px 16px; border-radius: 18px; max-width: 80%; line-height: 1.5; font-size: 0.95rem; }
    .user .bubble { background: #1e40af; color: white; }
    .ai .bubble { background: #f3f4f6; color: #111827; text-align: left; }
    .label { font-size: 0.75rem; color: #9ca3af; margin-bottom: 4px; }
    #input-row { display: flex; gap: 10px; }
    #user-input { flex: 1; padding: 12px 16px; border: 1px solid #d1d5db; border-radius: 10px; font-size: 1rem; outline: none; }
    #user-input:focus { border-color: #1e40af; }
    #send-btn { padding: 12px 24px; background: #1e40af; color: white; border: none; border-radius: 10px; font-size: 1rem; cursor: pointer; font-weight: 600; }
    #send-btn:hover { background: #1d4ed8; }
    #send-btn:disabled { background: #93c5fd; cursor: not-allowed; }
    .thinking { color: #9ca3af; font-style: italic; font-size: 0.9rem; }
  </style>
</head>
<body>
  <h1>My Offline AI Assistant</h1>
  <p class="subtitle">Running locally on your computer — no internet required</p>
 
  <div id="chat-box">
    <div class="message ai">
      <div class="label">AI</div>
      <div class="bubble">Hello! I am your offline AI assistant. I am running entirely on your computer — nothing you type is sent to the internet. How can I help you?</div>
    </div>
  </div>
 
  <div id="input-row">
    <input type="text" id="user-input" placeholder="Type your message..." />
    <button id="send-btn" onclick="sendMessage()">Send</button>
  </div>
 
  <script>
    const OLLAMA_URL = "http://localhost:11434/api/generate";
    const MODEL = "llama3.2";
 
    document.getElementById("user-input").addEventListener("keydown", function(e) {
      if (e.key === "Enter") sendMessage();
    });
 
    async function sendMessage() {
      const input = document.getElementById("user-input");
      const question = input.value.trim();
      if (!question) return;
 
      addMessage("user", question);
      input.value = "";
 
      const btn = document.getElementById("send-btn");
      btn.disabled = true;
      const thinkingId = addMessage("ai", '<span class="thinking">Thinking...</span>');
 
      try {
        const response = await fetch(OLLAMA_URL, {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ model: MODEL, prompt: question, stream: false })
        });
        const data = await response.json();
        updateMessage(thinkingId, data.response);
      } catch (err) {
        updateMessage(thinkingId, "Error: Could not reach Ollama. Make sure it is running on your computer.");
      }
 
      btn.disabled = false;
      input.focus();
    }
 
    function addMessage(role, text) {
      const box = document.getElementById("chat-box");
      const id = "msg-" + Date.now();
      box.innerHTML += `
        <div class="message ${role}" id="${id}">
          <div class="label">${role === "user" ? "You" : "AI"}</div>
          <div class="bubble">${text}</div>
        </div>`;
      box.scrollTop = box.scrollHeight;
      return id;
    }
 
    function updateMessage(id, text) {
      const el = document.getElementById(id);
      if (el) el.querySelector(".bubble").textContent = text;
      document.getElementById("chat-box").scrollTop = 99999;
    }
  </script>
</body>
</html>

How to Open and Use It

  1. Make sure Ollama is running (it should be — it starts automatically)
  2. Double-click the ai_chat.html file — it opens in your browser
  3. Type a message and press Enter or click Send

You now have a proper-looking chat interface — running 100% offline on your computer.

Why does this work in the browser?

Your browser is making a request to localhost — your own machine. It never goes out to the internet. Ollama listens on port 11434 and responds to these requests. Your HTML file is just a front-end for your local AI server.


Part 3: Make It Your Own — Custom System Prompts

Both applications above give you a general-purpose AI. But the real power comes when you give the AI a specific role and set of instructions before the conversation starts.

This is called a system prompt — instructions you set once that shape how the AI behaves for every message in the session.

Example: IT Helpdesk Assistant

Instead of a general AI, let us build one that acts like a knowledgeable IT support specialist:

In your Python script, update the ask_ai function:

python
SYSTEM_PROMPT = """You are an IT support specialist for a Windows enterprise environment.
Your users are employees who are not technically skilled.
Keep your answers short, clear, and step-by-step.
When giving instructions, number each step.
Only give solutions that work on Windows 10 or Windows 11.
If you do not know the answer, say so honestly."""
 
def ask_ai(question):
    full_prompt = f"System: {SYSTEM_PROMPT}\n\nUser: {question}\nAssistant:"
    payload = {
        "model": MODEL_NAME,
        "prompt": full_prompt,
        "stream": False
    }
    response = requests.post(OLLAMA_URL, json=payload)
    result = response.json()
    return result["response"]

Now when a user asks "my printer is not working", they get a focused, step-by-step Windows troubleshooting guide — not a generic AI reply.

System Prompt Templates for Common Use Cases

Use CaseSystem Prompt Idea
IT Helpdesk"You are an IT support specialist. Give step-by-step Windows solutions. Keep answers brief."
Policy Explainer"You are an HR assistant. Explain company policies in plain language. Never give legal advice."
Code Helper"You are a Python expert. Always include code examples. Explain every line."
Document Summariser"Summarise the provided text in 5 bullet points. Focus on action items and decisions."
Training Assistant"You are an onboarding guide for new employees. Be friendly, encouraging, and thorough."

Real-World Applications You Can Build

Here is what real organisations are building with offline AI today — and you can build these too with the techniques from this article:

Internal IT Knowledge Base Bot

Load your company's internal documentation (troubleshooting guides, IT policies, network diagrams) into a folder. Build a Python script that reads those files and passes their content as context to the AI. Ask it questions about your own internal systems — privately, with no data leaving your network.

Offline Log Analyser

Pipe Windows Event Viewer logs or firewall logs into your Python script. Use a system prompt that says "analyse this log output and identify errors, warnings, and suspicious activity." Get instant AI-driven log summaries — without sending sensitive server logs to a cloud AI.

PowerShell Script Generator

Set a system prompt: "You are a PowerShell expert. Write scripts for Windows 10/11 and Microsoft 365 administration." Use it as a local coding assistant while writing automation scripts — useful when you cannot use cloud AI tools on work machines due to policy.

Meeting Notes Summariser

Paste a long meeting transcript into your chat and ask for a summary, action items, and decisions. Runs completely offline — ideal for confidential meetings where you cannot paste notes into ChatGPT.

Offline Training Chatbot

Build a self-contained HTML page for new employee onboarding. The AI answers questions about processes and systems. Works on an internal network with no internet access — perfect for secure or air-gapped environments.


Which AI Model Should You Use in Your App?

Different models have different strengths. Here is a practical guide for choosing:

ModelBest forRAM NeededSpeed
phi4-miniQuick answers, low-spec hardware, short tasks4 GBVery fast
llama3.2General purpose, balanced quality and speed8 GBFast
mistralWriting, summaries, European language support8 GBFast
deepseek-coderWriting and debugging code8 GBFast
llama3.1:8bBetter reasoning, longer conversations8 GBModerate
llama3.3:70bNear-cloud quality reasoning and analysis48 GBSlow

How to switch models in your app

In both the Python script and the HTML file, you only need to change one line — the MODEL_NAME or MODEL variable. Run ollama list in your terminal to see which models you have already downloaded.


Troubleshooting: Common Problems and Fixes

ProblemWhat it meansFix
Connection refused on port 11434Ollama is not runningOpen a terminal and run ollama serve
Model not found errorYou typed the model name incorrectly, or have not downloaded itRun ollama pull llama3.2 to download it
Response is very slowYour computer is using the CPU instead of a GPUNormal on CPU — try a smaller model like phi4-mini
pip is not recognisedPython is not installed, or not in your PATHRe-install Python from python.org and tick "Add to PATH"
HTML page shows CORS errorBrowser blocking the local requestThis can happen in some browsers — try opening the file in Chrome or Edge

How Far Can You Take This?

What you have built today is the foundation. Here is where developers typically take it next:

Basic app working (Python CLI or HTML page)
Add a system prompt to give the AI a specific role
Feed your own documents or data as context to the AI
Add conversation memory so the AI remembers earlier messages
Package it as a desktop app or internal web tool for your team

Each step is a small, learnable addition. The hardest part — understanding how to talk to a local AI — you have already done.


Frequently Asked Questions

Do I need to be a developer to follow this guide? No. If you can copy and paste code and run a command in a terminal, you can build both applications in this guide. The code is intentionally simple and every line is explained.

Can I use this at work on a corporate laptop? This depends on your organisation's IT policy. Because everything runs locally and nothing touches the internet after the model download, many organisations allow it. Check with your IT department if you are unsure.

What happens if I ask the AI something it gets wrong? Like all AI, local models can make mistakes. Always verify important information. The practical use cases in this guide — summarising your own content, formatting documents, generating code that you review — are low-risk because you are checking the output before using it.

Can I run multiple models at the same time? Ollama loads one model at a time by default. Switching models in your code automatically unloads the previous one. On machines with a lot of RAM, you can configure Ollama to keep multiple models loaded.

Is this different from using the Ollama chat window directly? Yes. The Ollama chat window is a generic interface. When you build your own app, you control the system prompt, the user interface, the data you feed in, and how responses are displayed. That is where the real value comes from — AI that is shaped for your specific job or task.

Can I share my app with colleagues? Yes — if they have Ollama installed with the same model, they can run your HTML file or Python script directly. For a whole team, you could run Ollama on a shared server inside your network and point everyone's apps at that central address instead of localhost.


Conclusion: You Are Now an Offline AI Builder

What seemed like a developer-only skill — building an AI application — is something you have now done in under an hour using fewer than 30 lines of code.

The key insight to take away: Ollama turns a local AI model into a simple API you can call from any application. Once you understand that, building on top of it is just a matter of asking the right questions and shaping the responses.

Start with the Python script. Get comfortable with it. Add a system prompt for a specific use case you have at work. Then try the HTML version. Once you have both working, you will start seeing opportunities everywhere — logs that need analysing, documents that need summarising, policies that need explaining in plain language.

All of it private. All of it free. All of it running right there on your own computer.

CChetan Yamger

Written by

Chetan Yamger

Cloud Engineer · AI Automation Architect · Modern Workplace Consultant

Cloud Engineer, AI Automation Architect, and Modern Workplace Consultant based in Amsterdam, Netherlands. Specializing in scalable, secure enterprise solutions with Microsoft Azure, Intune, PowerShell, and AI-driven automation using ChatGPT, Gemini, and modern LLM technologies.

Cloud & Modern WorkplaceMicrosoft Intune & MDMAzure & Microsoft 365AI AutomationPrompt EngineeringPowerShell & Graph APIWindows AutopilotConditional Access & Zero TrustSCCM / MECM & MSIXVDI / WVDPower BINode.js & Next.js
Newsletter

Stay in the loop.
New articles, straight to you.

Deep-dive technical articles on Intune, PowerShell, and AI — no noise, no spam.

New article notifications
No spam, ever
Free forever

Discussion

Share your thoughts — your email stays private

Leave a comment

0/2000

Your email is used to prevent spam and will never be displayed.