Pi: The Minimal Coding Agent That Proves Less Is More

There is a quiet revolution happening in AI-assisted coding. While the industry races to build ever-larger, ever-more-complex agent harnesses — stuffing them with thousands of tokens of system prompts, dozens of specialised tools, and opaque internal orchestration — one developer took the opposite path. And it changed everything.

Pi is a coding agent framework created by Mario Zechner, the developer behind the libGDX game framework. It powers OpenClaw, the project that went from zero to 250,000+ GitHub stars in under three months, becoming the most-starred software repository on GitHub — surpassing even React.

This is not a summary. This is a comprehensive, technical guide to every layer of Pi — from philosophy to packet structure, from the ReAct loop to self-extending extensions. If you build with AI agents, this article will change how you think about them.

The Origin: Frustration as a Design Principle

Mario Zechner's frustration was specific and technical. Claude Code — the tool he used daily — had become, in his words, "a spaceship with 80% of functionality I have no use for." The problems were concrete:

System prompt instability: the prompt and tool definitions changed with every release, silently breaking carefully engineered workflows.
Context engineering blindness: the harness injected content into the context window without showing it in the UI, making it impossible to understand what the model actually saw.
Tool proliferation: dozens of overlapping tools competed for the model's attention, consuming context budget on descriptions alone.

His thesis: if you strip a coding agent down to its absolute minimum — a tiny system prompt, four tools, and full context transparency — frontier models will perform better, not worse. The training data already contains everything the model needs to know about being a coding agent.

The results proved him right. Pi with Claude Opus achieved competitive results against Codex, Cursor, and Windsurf on Terminal-Bench 2.0 — with a system prompt under 100 tokens and exactly four tools.

Pi vs The World — Positioning

The positioning is deliberate. Pi occupies a space that no other tool claims: maximum extensibility with minimum core. It is not competing with Claude Code on features — it is arguing that most of those features should not exist in the harness at all.

The chart above tells the story visually. When your system prompt consumes 10,000 tokens, that is 10,000 tokens the model cannot use for your actual code. Pi's approach: give the model the tools, give it your project context (AGENTS.md), and let it work.

The Monorepo: A Layered Architecture

Pi lives in badlogic/pi-mono — a TypeScript monorepo with npm workspaces and strictly enforced layered dependencies. The dependency graph is a DAG: foundation packages have zero internal dependencies, and each layer can only depend downward.

This architecture is not accidental. Each package is independently usable. You can use pi-ai alone for batch LLM processing without any agent infrastructure. You can use pi-agent-core to embed a ReAct loop in your own application without the TUI. The layers are composable, not monolithic.

pi-ai: The Unified LLM API

The foundation of everything. pi-ai maps 15+ LLM providers to a single, unified interface — and it does this by recognising a fundamental truth:

The API Surface

import { complete, completeStreaming } from "@mariozechner/pi-ai";
 
// Simple completion
const response = await complete({
  provider: "anthropic",
  model: "claude-sonnet-4-6",
  messages: [{ role: "user", content: "Explain Pi in one sentence." }],
});
 
// Streaming with abort support
const stream = completeStreaming({
  provider: "openai",

Context Handoff — The Killer Feature

Pi's most unique capability: sessions can span multiple LLM providers in the same conversation. When you switch from Claude to GPT mid-session, the entire context — including thinking traces — is serialised and reformatted for the new provider.

// Start with Claude for complex reasoning
const msg1 = await complete({
  provider: "anthropic",
  model: "claude-opus-4-6",
  messages: conversationHistory,
});
 
// Switch to a cheaper model for routine tasks
const msg2 = await complete({
  provider: "openai",
  model: "gpt-4.1-mini",
  messages: [...conversationHistory, msg1],
  // Thinking traces auto-converted to tagged text
});

This is not just provider switching — it is full state preservation across fundamentally different API formats. No other unified LLM library implements this.

Split Tool Results

Another Pi innovation: tool results can return separate content for the LLM and the UI. The model receives a concise text summary; the TUI receives structured data for rich visualisation:

const tool = {
  name: "search_codebase",
  execute: async (args) => {
    const results = await searchFiles(args.query);
    return {
      llm: `Found ${results.length} matches in ${results.map(r => r.file).join(", ")}`,
      ui: { type:

pi-agent-core: The ReAct Loop in ~200 Lines

Every agentic system — from Claude Code to OpenClaw to Pi — implements the ReAct (Reasoning + Acting) pattern, introduced by Yao et al. at Princeton/Google in 2022. The key insight: LLMs perform significantly better when they can reason, act, observe, and reason again, compared to single-pass responses.

Pi's implementation is minimal and fully inspectable.

The Loop, Simplified

// Pseudocode of Pi's agent loop (actual: ~200 lines of TypeScript)
async function agentLoop(messages, tools, model) {
  while (true) {
    // 1. REASON: Ask the LLM what to do
    const response = await complete({ model, messages, tools });
 
    // 2. Check: Is the model done? (text response, no tool calls)
    if (!response.toolCalls?.length) {
      return response.text; // Final answer
    }
 
    // 3. ACT: Execute each tool call
    const toolResults = await Promise

Tool Validation Pipeline

Every tool call passes through a rigorous validation pipeline. TypeBox schemas provide compile-time type safety AND runtime validation with detailed error messages when tool arguments fail:

import { Type } from "@sinclair/typebox";
 
const ReadTool = {
  name: "read",
  description: "Read file contents",
  schema: Type.Object({
    path: Type.String({ description: "Absolute file path" }),
    offset: Type.Optional(Type.Number({ minimum: 0 })),
    limit: Type.Optional(Type

Message Queuing: Steering vs Follow-Up

While the agent streams, users can inject messages without waiting. Pi supports two fundamentally different injection modes:

Steering interrupts the current generation and redirects the agent. Follow-up waits for the current turn to complete, then the message is processed in the next turn. This distinction matters enormously in practice — it is the difference between "stop, do this instead" and "also, when you're done with that..."

The 4 Core Tools — And Why Only 4

No search_codebase. No get_git_history. No list_directory. The model uses bash for anything that is not file I/O. This works because:

Frontier models already know these tools. They have been RL-trained extensively on coding agent scenarios.
Every additional tool competes for context space. Tool descriptions are expensive — they sit in every request.
Bash is universal. grep -r, find ., git log — the model already knows these commands intimately.

Optional Read-Only Tools

For safe exploration sessions (e.g., onboarding to a new codebase), Pi offers read-only mode:

pi --tools read,grep,find,ls

This disables write, edit, and bash — the agent can explore but cannot modify anything. Ideal for code review or architecture discovery.

Sessions: The JSONL Tree

Pi sessions are stored as append-only JSONL files where each line is a node with an id and parentId. This creates a tree structure that enables branching: you can return to any previous point and continue, creating a new branch while preserving all previous branches in the same file.

{"id":"a1","parentId":null,"role":"user","content":"Fix the auth bug"}
{"id":"a2","parentId":"a1","role":"assistant","content":"Let me read..."}
{"id":"a3","parentId

Node b1 branches from a2, creating an alternate timeline. Both paths coexist in the same file. The SessionManager API provides operations for listing, creating, resuming, and branching sessions.

Compaction: Infinite Session Length

When a session approaches 80% of the context window (configurable), Pi triggers compaction. Older messages are replaced by a summary generated by a cheap model (e.g., Claude Haiku). This enables unlimited session length without quality degradation:

The compaction threshold is configurable — and the summary model can be different from the primary model. In practice, using Haiku for compaction summaries while running Opus for reasoning provides excellent quality at minimal cost.

The 4 Operating Modes

All four modes share a single abstraction: AgentSession. The modes differ only in their I/O layer.

SDK Mode: How OpenClaw Uses Pi

OpenClaw — the 250K+ star personal AI assistant — uses Pi's SDK mode. It does not spawn Pi as a subprocess. It imports and instantiates Pi directly, giving full lifecycle control:

import { createAgentSession } from "@mariozechner/pi-agent-core";
 
const session = createAgentSession({
  provider: "anthropic",
  model: "claude-sonnet-4-6",
  tools: [...defaultTools, ...customTools],
  extensions: [securityAudit, messageFormatter],
  onMessage: (msg) => gateway.send(channel, msg),
});
 
// Handle incoming message from WhatsApp/Telegram/Slack

This is the pattern that made OpenClaw possible. Peter Steinberger's "weekend hack" plugged Pi's agent core into a multi-platform messaging gateway — and the result went viral because the underlying agent engine (Pi) was solid enough to handle real-world, multi-channel workloads.

The Extension System

Extensions are TypeScript modules loaded by jiti — a runtime TypeScript/ESM transpiler that works without a build step. This enables hot-reload: edit the .ts file, type /reload, and changes are live immediately.

Extension Structure

// my-extension.ts
import { ExtensionAPI } from "@mariozechner/pi-coding-agent";
 
export default function (api: ExtensionAPI) {
  // Add a custom slash command
  api.addSlashCommand({
    name: "review",
    description: "Review staged changes",
    execute: async () => {
      const diff = await api.bash("git diff --staged");
      return api.sendMessage(

The Self-Extension Paradigm

This is Pi's most radical capability. Because extensions are TypeScript loaded without a build step, the agent can write and hot-reload its own extensions. The workflow:

You describe what you want: "I need a /deploy command that runs our staging pipeline"
Pi writes the extension as a .ts file
You type /reload
Pi tests the new command
If it fails, Pi reads the error, fixes the code, reloads, and retries

Armin Ronacher (creator of Flask and Jinja2) described this as "software that is malleable like clay." All of his Pi extensions — /answer, /todos, /review, /files — were written by Pi itself, not by Armin. He described the requirements, pointed the agent to examples, and Pi implemented them autonomously.

The System Prompt: ~100 Tokens

This is the actual Pi system prompt, reproduced from source. Compare this with Claude Code's ~10,000 tokens:

You are an AI coding assistant. You help users with software engineering tasks.
Use the provided tools to accomplish tasks.

That is it. The rest of the context budget is allocated to:

The hierarchy loads in this order:

System prompt (~100 tokens)
Tool definitions (~900 tokens)
AGENTS.md files (project-specific context, user-controlled)
Skills (progressive disclosure — loaded on demand)
Conversation history (the bulk of the context)

The user controls the majority of the context budget through AGENTS.md. This is the opposite of Claude Code's approach, where the harness controls most of the context and the user gets whatever is left.

Skills: The agentskills.io Standard

Pi implements agentskills.io — an open standard for portable skill packages. Skills written for Pi work identically in Claude Code, Codex CLI, Amp, and Droid.

SKILL.md Format

---
name: code-review
version: 1.0.0
description: Automated code review with style checks
triggers:
  - /review
  - "review this"
---
 
# Code Review Skill
 
You are a code review assistant. When activated:
 
1. Read the staged git diff
2. Check for:
   - Security vulnerabilities
   - Performance issues
   - Style violations
   - Missing error handling
3. Provide actionable feedback with line references

Skills use progressive disclosure: their full content is loaded into context only when triggered, not at session start. This keeps the baseline context lean while making deep functionality available on demand.

Official Skill Repositories

Agentic Patterns: 5 Ways to Use Pi

Pattern 1: ReAct (Default)

The default behaviour. No configuration needed — the model decides when to call tools based on context:

pi "Fix the failing tests in src/auth/"

Pattern 2: Plan-and-Execute

For well-defined multi-step tasks, generate a plan first, then execute. Research shows 92% completion rate and 3.6x speedup vs sequential ReAct:

pi "Use /plan to refactor the payment module to use Stripe v3"

Pattern 3: Multi-Agent (Orchestrator + Specialists)

Using pi --print subprocesses or the nicobailon/pi-subagents package for parallel execution:

# agents/security-reviewer.md
---
name: security-reviewer
model: claude-sonnet-4-6
tools: [read, grep, find]
---
You are a security specialist. Review code for OWASP Top 10 vulnerabilities.

Pattern 4: Reflection

Have a specialist agent review the primary agent's output before presenting it:

pi "Implement the feature, then use /reflect to review your own work"

Pattern 5: Dynamic Tool Loading

When you have 50+ tools, load them progressively to avoid accuracy degradation:

api.on("resources_discover", async () => {
  // Only load tools relevant to current context
  const packageJson = await readFile("package.json");
  if (packageJson.includes("prisma")) {
    api.addTool(prismaQueryTool);
  }
});

The OpenClaw Story: From Weekend Hack to 250K Stars

OpenClaw began as a "weekend hack" by Austrian developer Peter Steinberger. The concept: a personal AI assistant that runs on your own devices and answers on the channels you already use — WhatsApp, Telegram, Slack, Discord, Google Chat, Signal, iMessage, IRC, Microsoft Teams, Matrix, LINE, and more.

The architecture is straightforward: OpenClaw is a messaging gateway that uses Pi's createAgentSession() SDK to embed a full coding agent. Each channel gets its own isolated session. Docker sandboxing ensures agent code runs safely. Scheduled events enable cron-like triggers.

Why Pi Made OpenClaw Possible

SDK mode: Pi embeds as a library, not a subprocess — full lifecycle control
Provider agnosticism: Users choose their LLM (Claude, GPT, Gemini, local models via Ollama)
Extension system: OpenClaw adds custom extensions for platform-specific features
Session persistence: JSONL tree format means conversations survive restarts
Minimal footprint: Pi's tiny core means fast startup and low memory usage

The Ollama Integration

In March 2026, Ollama officially added ollama launch pi to its CLI — making Pi the first agentic framework accessible with a single command. Combined with Kimi K2.5 (a 1-trillion-parameter MoE model), this means:

ollama launch pi --model kimi-k2.5:cloud

A full coding agent, powered by a trillion-parameter model, at 9x lower cost than Claude Opus. $0.60 per million input tokens. $3.00 per million output tokens.

The TUI: Retained Mode vs Immediate Mode

Pi's terminal UI (pi-tui) makes a fundamental architectural choice that differs from every other coding agent:

The retained mode approach means Pi's TUI never flickers. Components persist across frames, cache their rendered output, and only re-render what changed. This is the same differential rendering model used by React — but for the terminal.

Extensions have full access to the component system. You can build interactive UIs — progress bars, tables, selection menus, syntax-highlighted code blocks — that integrate seamlessly with the agent output.

The 7 Failure Modes of Agents (and Pi's Solutions)

Context Economics: Why Every Token Matters

The chart illustrates why Pi's minimal approach wins economically. When your harness consumes 15,000 tokens before the user's first message, you are paying for that overhead on every single API call. Over a long session with hundreds of turns, the cumulative cost of a bloated system prompt is enormous.

Pi's approach: ~1,000 tokens for system prompt + tools. The remaining budget is yours.

How to Get Started

Step 1: Install

npm install -g @mariozechner/pi-coding-agent
# or
ollama launch pi

Step 2: Configure a Provider

# Anthropic (recommended)
export ANTHROPIC_API_KEY=sk-ant-...
 
# OpenAI
export OPENAI_API_KEY=sk-...
 
# Local models via Ollama
# No API key needed — Pi auto-detects local Ollama

Step 3: Create AGENTS.md

# My Project
 
## Tech Stack
- Next.js 15, TypeScript, Tailwind CSS
- PostgreSQL with Drizzle ORM
- Deployed on Vercel
 
## Conventions
- Use conventional commits
- Tests required for all new features
- No direct database queries — use the ORM layer

Step 4: Run

# Interactive mode (default)
pi
 
# With a specific model
pi --model claude-opus-4-6
 
# Print mode for scripting
pi --print "Generate a migration for adding user roles"
 
# Read-only exploration
pi --tools read,grep,find,ls "Explain the auth flow"

Step 5: Build Your First Extension

// .pi/extensions/hello.ts
export default function (api) {
  api.addSlashCommand({
    name: "hello",
    description: "A warm greeting",
    execute: async () => "Hello from my first Pi extension!",
  });
}

Type /reload in Pi, then /hello. Your extension is live.

Why Open Source, Stripped to Basics, Wins

The lesson of Pi is not that minimalism is always better. It is that minimalism in the right places — the system prompt, the tool set, the core loop — creates space for maximalism where it matters: in the user's control over context, extensions, and workflow.

Claude Code is an excellent product. Cursor is an excellent product. But they are products — designed for the average user, optimised for the common case. Pi is a toolkit — designed for developers who want to understand and control every token that enters their agent's context window.

The evidence speaks for itself:

31,600+ GitHub stars on the core repo
250,000+ stars on OpenClaw (built on Pi's SDK)
184 releases in active development
Ollama integration making it a single-command install
agentskills.io establishing a cross-agent skill standard
Competitive benchmark results with a fraction of the complexity

The next time you fight with a coding agent that is doing something unexpected — injecting content you cannot see, calling tools you did not ask for, consuming context on features you do not use — remember that there is an alternative. One that trusts the model, trusts the developer, and gets out of the way.

That alternative is Pi. And it is proving that in the age of AI agents, the best framework is the one that barely exists.

Pi is open source under the MIT license. Repository: badlogic/pi-mono. Official site: pi.dev.

OpenClaw: openclaw/openclaw. Armin Ronacher's analysis: Pi: The Minimal Agent Within OpenClaw.