A harness for coding agents

A from-scratch command-line agent harness over pydantic-ai and OpenAI — a streaming reason–act–observe loop with approval-gated tools, isolated subagents, long-session compression, loadable skills and MCP servers.

role: Solo build
year: 2026
language: Python 3.12 · uv
core: pydantic-ai · OpenAI
interface: CLI · streaming REPL

Figure: How a turn flows. You type a prompt in the CLI, the agent runs a reason → act → observe loop (capped by an iteration budget, with the model's reasoning preserved), and every tool call crosses an approval gate with path safety (drawn as a dashed line) before reaching the file, web, todo, delegate and MCP tools. The answer streams back token by token; skills are injected into the prompt, delegate spawns an isolated subagent, and the whole conversation is saved as a resumable JSON session that auto-compresses when it grows.

01 OVERVIEW

Every coding agent shares the same skeleton — a loop that reasons, calls a tool, reads the result, and repeats. I wanted to understand that skeleton by building it rather than reaching for a framework, so harness-agent is a command-line coding agent written from scratch on top of pydantic-ai and OpenAI's models. You talk to it in a terminal; it reads and edits files, searches the web, and works through a task step by step.

Each turn runs a reason → act → observe loop, capped by an iteration budget so a confused agent can't spin forever. Output streams back token by token, and Ctrl+C cancels a turn cleanly even mid-tool. The part I cared about most is safety: every tool call passes an approval gate (strict, normal or yolo) that can prompt for a yes / no / always on each call, while a path block-list and a write allow-list stop the agent from reading your SSH keys or writing outside the folders you sanctioned.

On top of that loop sit the pieces that make it usable on real work: it can hand a sub-task to a fully isolated subagent, summarise its own history when a session grows long so it never overflows the context window, load ‘skills’ (markdown playbooks) on demand, preserve a reasoning model's chain of thought across turns, and attach external MCP servers as extra tools. The whole conversation is saved as a resumable JSON session, and an event bus surfaces every step. It's a personal build, grown across several sprints and covered by unit and integration tests.

02 WHAT I BUILT

Streaming reason–act–observe loop

A per-turn loop on pydantic-ai over OpenAI models, capped by an iteration budget so it can't spin forever. Output streams token by token to the terminal, every turn is persisted to disk, and an async hook bus emits each step, tool call and budget event.
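
A minimal sketch of the budget-capped loop and the hook bus, in plain asyncio rather than the project's actual pydantic-ai code. The names HookBus, run_turn, the step callback and the event names are all hypothetical, chosen only to illustrate the shape.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable

# A hook receives an event name and a payload dict (hypothetical signature).
Hook = Callable[[str, dict], Awaitable[None]]
# A step receives the iteration index and returns (done, text).
Step = Callable[[int], Awaitable[tuple[bool, str]]]


@dataclass
class HookBus:
    """Tiny async event bus: subscribers are awaited in registration order."""

    subscribers: list[Hook] = field(default_factory=list)

    async def emit(self, event: str, payload: dict) -> None:
        for hook in self.subscribers:
            await hook(event, payload)


async def run_turn(step: Step, bus: HookBus, max_iterations: int = 8) -> str:
    """Call `step` until it reports a final answer or the budget runs out."""
    for i in range(max_iterations):
        await bus.emit("step", {"iteration": i})
        done, text = await step(i)  # stand-in for one reason/act/observe pass
        if done:
            return text
    # The budget is spent: surface it as an event and a marker, not an exception.
    await bus.emit("budget_exhausted", {"max_iterations": max_iterations})
    return "[budget exhausted]"
```

Returning a marker string when the budget runs out, instead of raising, keeps the REPL alive so the user can decide whether to continue.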

Approval gate & path safety

Every tool call passes a strict / normal / yolo gate that can prompt for y (run once), n (deny) or a (always allow this tool). A path block-list and write allow-list are enforced on every file operation, so the agent can't read secrets like SSH keys or write outside sanctioned directories.
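
A rough sketch of how such a gate and path check could fit together. What each mode does is my assumption here, not something stated above: strict prompts on every call, normal prompts only for mutating tools, and yolo never prompts. ApprovalGate, check_path and their parameters are hypothetical names.

```python
from pathlib import Path
from typing import Callable


class ApprovalGate:
    """Decides whether a tool call may run; `ask` returns "y", "n" or "a"."""

    def __init__(self, mode: str, ask: Callable[[str], str]) -> None:
        self.mode = mode          # "strict", "normal" or "yolo"
        self.ask = ask
        self.always: set[str] = set()  # tools the user answered "a" for

    def allows(self, tool: str, mutating: bool) -> bool:
        if self.mode == "yolo" or tool in self.always:
            return True
        # Assumption: normal mode lets read-only tools through unprompted.
        if self.mode == "normal" and not mutating:
            return True
        answer = self.ask(tool)
        if answer == "a":
            self.always.add(tool)
        return answer in ("y", "a")


def check_path(raw: str, write: bool,
               write_roots: list[Path], blocked: list[Path]) -> bool:
    """Return True if the path passes the block-list and write allow-list."""
    # Resolve first so "../" segments and symlinks cannot dodge the lists.
    path = Path(raw).expanduser().resolve()
    if any(path.is_relative_to(b.resolve()) for b in blocked):
        return False
    if write:
        return any(path.is_relative_to(r.resolve()) for r in write_roots)
    return True
```

Resolving the path before comparing matters: checking the raw string would let `work/../secrets/key` slip past a prefix test.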

Built-in toolset

File operations — read, write, edit, glob, grep and list — plus web fetch (HTML to markdown) and web search across Tavily, Serper or Brave, and a todo tracker. Tools always return safe, size-capped strings; errors surface as markers instead of crashing the turn.
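
One way to get "always a safe, size-capped string" is a decorator around every tool. This is a sketch under my own assumptions: the name safe_tool, the 4000-character default and the marker formats are illustrative, not the project's actual values.

```python
import functools
from typing import Any, Callable


def safe_tool(max_chars: int = 4000) -> Callable[[Callable[..., Any]], Callable[..., str]]:
    """Wrap a tool so it never raises and never returns an oversized string."""

    def decorate(fn: Callable[..., Any]) -> Callable[..., str]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> str:
            try:
                text = str(fn(*args, **kwargs))
            except Exception as exc:
                # Errors become a marker the model can read and react to.
                text = f"[error: {type(exc).__name__}: {exc}]"
            if len(text) > max_chars:
                omitted = len(text) - max_chars
                text = text[:max_chars] + f"\n[truncated {omitted} chars]"
            return text

        return wrapper

    return decorate


@safe_tool(max_chars=5)
def read_big() -> str:
    return "abcdefgh"


@safe_tool()
def read_missing() -> str:
    raise FileNotFoundError("no such file: notes.txt")
```

Here `read_big()` comes back cut to five characters plus a truncation marker, and `read_missing()` returns an error marker instead of ending the turn.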

Isolated subagents

delegate_task spawns a fresh child agent with its own session and budget and no copy of the parent's history, so a large job can be farmed out without polluting the main context. A recursion guard stops subagents from spawning their own.
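
The isolation and the recursion guard can be sketched with a depth counter. Only delegate_task is a name from the project; the Agent class, its fields and the stand-in run method are hypothetical, and the real child would run a full model loop.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    depth: int = 0                 # 0 = top-level agent, 1 = subagent
    budget: int = 8                # iteration budget for this agent only
    history: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        """Stand-in for a real model loop: records the task and reports back."""
        self.history.append(f"task: {task}")
        return f"done: {task} (context: {len(self.history)} message(s))"

    def delegate_task(self, task: str, budget: int = 4) -> str:
        # Recursion guard: a subagent may not spawn its own subagents.
        if self.depth >= 1:
            return "[error: subagents cannot delegate]"
        # Fresh child: new history and its own budget, nothing copied over.
        child = Agent(depth=self.depth + 1, budget=budget)
        result = child.run(task)
        # Only the child's final result flows back into the parent's context.
        self.history.append(f"delegate result: {result}")
        return result
```

The parent pays for one result message rather than the child's whole transcript, which is the point of farming the job out.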

Reasoning passthrough

Reasoning models (o-series and gpt-5) route through OpenAI's Responses API, and their thinking is preserved — saved into the session and replayed on the next turn — so the chain of thought survives across the whole conversation.

Auto-compression

When a session crosses a configurable share of the model's context window, the agent summarises the earlier history in place and keeps only the most recent messages verbatim, so long runs never overflow.
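
The trigger can be sketched as a threshold check followed by a split. Everything specific here is an assumption: the four-characters-per-token estimate, the 0.8 default share, the keep_recent count and the summarise callback (which in the real agent would presumably be a model call). Unlike the in-place behaviour described above, this sketch returns a new list.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Crude assumption: roughly four characters per token."""
    return max(1, len(text) // 4)


def maybe_compress(
    messages: list[str],
    context_window: int,
    summarise: Callable[[list[str]], str],
    threshold: float = 0.8,
    keep_recent: int = 4,
) -> list[str]:
    """Summarise older messages once usage crosses the threshold share."""
    used = sum(estimate_tokens(m) for m in messages)
    split = len(messages) - keep_recent
    # Nothing to do while under the threshold or with nothing older to fold.
    if used < threshold * context_window or split <= 0:
        return messages
    older, recent = messages[:split], messages[split:]
    # One summary message replaces the older history; recent stays verbatim.
    return [f"[summary] {summarise(older)}"] + recent
```

Keeping the last few messages verbatim preserves the exact tool outputs the model is most likely to need next.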

Loadable skills

Drop markdown ‘skills’ into a folder; their summaries are auto-injected into the system prompt, and the model loads a full skill body on demand — or you force one by typing /skill-name in the REPL.
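
A sketch of the two-stage loading, assuming (my assumption, not the project's documented format) that each skill is a single .md file whose first non-empty line serves as its summary. The function names and the prompt wording are hypothetical.

```python
from pathlib import Path


def load_skill_summaries(folder: Path) -> dict[str, str]:
    """Map each skill name (file stem) to its first non-empty line."""
    summaries: dict[str, str] = {}
    for path in sorted(folder.glob("*.md")):
        lines = [ln.strip() for ln in path.read_text(encoding="utf-8").splitlines()]
        first = next((ln for ln in lines if ln), "")
        summaries[path.stem] = first.lstrip("# ").strip()  # drop heading marks
    return summaries


def skills_prompt_section(summaries: dict[str, str]) -> str:
    """Render the short list that gets injected into the system prompt."""
    if not summaries:
        return ""
    rows = [f"- {name}: {summary}" for name, summary in summaries.items()]
    return "Available skills:\n" + "\n".join(rows)


def load_skill_body(folder: Path, name: str) -> str:
    """Return the full skill text on demand, or an error marker."""
    # Reject names with path separators so a skill load cannot leave the folder.
    path = folder / f"{name}.md"
    if Path(name).name != name or not path.is_file():
        return f"[error: unknown skill {name}]"
    return path.read_text(encoding="utf-8")
```

Only the one-line summaries cost prompt tokens on every turn; a full body is paid for only when it is actually loaded.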

MCP server integration

External Model Context Protocol servers attach as extra toolsets, their tools namespaced per server and routed through the same approval and truncation pipeline, so the one loop can drive third-party and custom tools.
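
Per-server namespacing can be illustrated without any MCP client at all. In this sketch the servers are plain dictionaries of callables, and the double-underscore separator, function names and marker strings are my own assumptions; a real integration would discover the tools over the protocol.

```python
from typing import Callable

Tool = Callable[..., str]


def namespace_tools(servers: dict[str, dict[str, Tool]],
                    sep: str = "__") -> dict[str, Tool]:
    """Flatten {server: {tool: fn}} into {"server__tool": fn}."""
    registry: dict[str, Tool] = {}
    for server, tools in servers.items():
        for name, fn in tools.items():
            registry[f"{server}{sep}{name}"] = fn
    return registry


def call_tool(registry: dict[str, Tool], name: str, arguments: dict,
              approve: Callable[[str], bool], max_chars: int = 4000) -> str:
    """Route any tool, built-in or external, through one gate and one cap."""
    if name not in registry:
        return f"[error: unknown tool {name}]"
    if not approve(name):
        return f"[denied: {name}]"
    return str(registry[name](**arguments))[:max_chars]


# Two hypothetical servers that both expose a tool called "search".
servers = {
    "github": {"search": lambda query: f"gh:{query}"},
    "docs": {"search": lambda query: f"docs:{query}"},
}
registry = namespace_tools(servers)
```

With the prefix in place, two servers can both expose a `search` tool without colliding, and an "always allow" answer applies to one server's tool rather than to both.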

03 STACK

Core

pydantic-ai · OpenAI Responses API · Python 3.12

Capabilities

ReAct loop · approval gate · subagents · skills · MCP

Tooling

uv · pytest · ruff · mypy · pre-commit