A practical field guide

Run AI agents and coding tools on your own machine.

Hermes Agent, OpenClaw, OpenCode, and Claude Code all happily talk to a local Ollama model. No API keys, no per-token billing, no data leaving the box. This site is the working notes I wish I had on day one.

The stack

Four tools, one local model

Agent

Hermes Agent

Nous Research's agent runtime. Multi-channel gateway (Telegram, Slack, Discord, iMessage), persistent memory, skills system, cron jobs. ~14k GitHub stars, very active.

Setup walkthrough →

Agent

OpenClaw

Peter Steinberger's open-source personal assistant. Runs locally, talks to 35+ LLM providers including Ollama, plugs into the same chat apps. The "AI that actually does things."

Setup walkthrough →

Model

Ollama

The local LLM runtime. One ollama pull and you have a chat-capable model on your Mac. Hermes and OpenClaw both speak Ollama's OpenAI-compatible API natively.

Model picks + config →

The coding tier

Pair the agent with a coding tool

Hermes and OpenClaw are great for chat, scheduling, and ad-hoc tasks. For actual code work — multi-file refactors, PR review, test generation — hand off to a coding tool that speaks Ollama.

OpenCode

Terminal-first coding agent, open source, model-agnostic. Point it at your local Ollama endpoint and it acts like a junior pair programmer that lives in your shell.

Wire it up →

Claude Code

Anthropic's coding CLI. Designed for Claude, but with the right provider config it will run against any OpenAI-compatible endpoint — including your local Ollama server.

Wire it up →

Why bother

The case for local-first

  • No API bill. Run a 14B model all day on an M-series Mac for the cost of electricity.
  • No data leak. Customer code, internal docs, sketchy URLs — all stay on disk.
  • No rate limits. Cron job every five minutes? Go for it.
  • Reproducible. Pin a model version with ollama pull model:tag.
  • Offline-capable. Plane Wi-Fi doesn't matter when the model is local.
  • Swap models freely. Try a 7B for speed, 32B for quality — same API.
  • Educational. You actually see what tokens are. No black box.
  • Compounds with cloud. Use local by default, cloud when the task demands it.
Honest caveat: a 14B local model is not Claude Opus 4. For complex refactors, novel architecture, or anything that needs to reason across 50 files, you're still going to want a frontier API. The win is having the default be cheap and local — escalate when it matters.

What you'll need

Prerequisites

  1. A Mac (M-series recommended) or Linux box

    16GB unified memory is the practical floor for 14B. 32B+ needs 32GB+ or you'll be paging.

  2. Node.js 20+ and/or Python 3.10+

    Hermes and OpenClaw are Node. OpenCode is Go binary. Claude Code is Node. Ollama is a single binary.

  3. Ollama installed

    brew install ollama on macOS, then ollama serve in one terminal. The default endpoint is http://127.0.0.1:11434/v1.

  4. A model pulled

    ollama pull qwen2.5-coder:14b for coding, ollama pull llama3.1:8b for chat. See the Ollama page for full picks.