A practical field guide

Run AI agents and coding tools on your own machine.

Hermes Agent, OpenClaw, OpenCode, and Claude Code all happily talk to a local Ollama model. No API keys, no per-token billing, no data leaving the box. This site is the working notes I wish I had on day one.

Get started → GitHub

The stack

Four tools, one local model

Agent

Hermes Agent

Nous Research's agent runtime. Multi-channel gateway (Telegram, Slack, Discord, iMessage), persistent memory, skills system, cron jobs. ~14k GitHub stars, very active.

Setup walkthrough →

Agent

OpenClaw

Peter Steinberger's open-source personal assistant. Runs locally, talks to 35+ LLM providers including Ollama, plugs into the same chat apps. The "AI that actually does things."

Setup walkthrough →

Model

Ollama

The local LLM runtime. One ollama pull and you have a chat-capable model on your Mac. Hermes and OpenClaw both speak Ollama's OpenAI-compatible API natively.

Model picks + config →

The coding tier

Pair the agent with a coding tool

Hermes and OpenClaw are great for chat, scheduling, and ad-hoc tasks. For actual code work — multi-file refactors, PR review, test generation — hand off to a coding tool that speaks Ollama.

OpenCode

Terminal-first coding agent, open source, model-agnostic. Point it at your local Ollama endpoint and it acts like a junior pair programmer that lives in your shell.

Wire it up →

Claude Code

Anthropic's coding CLI. Designed for Claude, but with the right provider config it will run against any OpenAI-compatible endpoint — including your local Ollama server.

Wire it up →

Why bother

The case for local-first

No API bill. Run a 14B model all day on an M-series Mac for the cost of electricity.
No data leak. Customer code, internal docs, sketchy URLs — all stay on disk.
No rate limits. Cron job every five minutes? Go for it.
Reproducible. Pin a model version with ollama pull model:tag.

Offline-capable. Plane Wi-Fi doesn't matter when the model is local.
Swap models freely. Try a 7B for speed, 32B for quality — same API.
Educational. You actually see what tokens are. No black box.
Compounds with cloud. Use local by default, cloud when the task demands it.

Honest caveat: a 14B local model is not Claude Opus 4. For complex refactors, novel architecture, or anything that needs to reason across 50 files, you're still going to want a frontier API. The win is having the default be cheap and local — escalate when it matters.

What you'll need

Prerequisites

A Mac (M-series recommended) or Linux box

16GB unified memory is the practical floor for 14B. 32B+ needs 32GB+ or you'll be paging.
Node.js 20+ and/or Python 3.10+

Hermes and OpenClaw are Node. OpenCode is Go binary. Claude Code is Node. Ollama is a single binary.
Ollama installed

brew install ollama on macOS, then ollama serve in one terminal. The default endpoint is http://127.0.0.1:11434/v1.
A model pulled

ollama pull qwen2.5-coder:14b for coding, ollama pull llama3.1:8b for chat. See the Ollama page for full picks.

Run AI agents and coding tools on your own machine.

Four tools, one local model

Hermes Agent

OpenClaw

Ollama

Pair the agent with a coding tool

OpenCode

Claude Code

The case for local-first

Prerequisites

A Mac (M-series recommended) or Linux box

Node.js 20+ and/or Python 3.10+

Ollama installed

A model pulled