There are two kinds of people building AI assistants right now: those who pipe their data straight into someone else’s cloud, and those who’d rather not. I fall firmly in the second camp.

The result is Jairix. 🦞


What is OpenClaw?

OpenClaw is a framework for building autonomous AI agents. It runs locally, connects to your own models, and supports a multi-agent architecture where specialized agents collaborate on complex tasks. No subscription, no data sharing, no dependency on an external provider.

The core idea is straightforward: configure a set of agents, give each of them tools and a workspace, and let them work together. Orchestration, memory, and context are entirely under your control.


The hardware

Jairix runs on an HP Envy x360 15-ed1555nd — a consumer laptop with 32 GB of RAM, sitting on my home network. No server racks, no dedicated GPU machine. Just a laptop on a desk.

For model inference, I run two dedicated Ollama instances elsewhere in my homelab, reachable via internal URLs. The heavy lifting happens there. But the brain — the orchestration, memory, tools, and decisions — all lives on that HP Envy.


The models

This is one of the parts that surprises people. There’s no OpenAI, no Anthropic API. Everything routes through Ollama.

Ollama Pro — $20/month for serious models

Most of the models in my setup carry a :cloud suffix in their IDs. That’s not a local weight — that’s Ollama Pro, a $20/month plan that gives you access to cloud-hosted open models through the exact same Ollama API you’d use locally. No new SDK, no new authentication flow, no vendor lock-in. You point OpenClaw at your Ollama endpoint and it just works, whether the model is running on your own GPU or on Ollama’s infrastructure.
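To make "it just works" concrete: Ollama's `/api/chat` endpoint accepts the same request body whether the model is local or cloud-hosted — only the model id changes. A minimal sketch in Python (the model ids here are illustrative, not an exact list from my config):

```python
import json

def build_chat_request(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/chat endpoint. Whether `model`
    is a local weight or carries a ":cloud" suffix, the shape is
    identical -- the endpoint does the routing."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

# Illustrative model ids -- one local, one cloud-hosted.
local = build_chat_request("qwen3-coder", "Summarize STATE.md")
cloud = build_chat_request("deepseek-v3.1:cloud", "Summarize STATE.md")
assert local.keys() == cloud.keys()  # same API shape, different model id
print(json.dumps(cloud, indent=2))
```

Swapping a local model for a cloud one is a one-line config change, which is the whole point.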

The practical upshot: I get access to massive open models like DeepSeek V3.1 (671B parameters) and Mistral Large 3 (675B) without owning the hardware to run them. Ollama hosts them on NVIDIA infrastructure, guarantees no logging and no training on your data, and charges based on actual GPU time rather than a fixed token cap. For the kind of multi-agent workloads Jairix runs — coordinated research pipelines, long document generation, complex planning tasks — this is a significantly better deal than paying per-token to a closed provider.

I maintain two Ollama endpoints with different roles:

General endpoint — used by the main agent for everyday tasks:

Model | Parameters | Notes
---|---|---
Kimi K2.5 | | Primary default model
Nemotron 3 Super | | Reasoning, first fallback
GLM-5 | | Fast fallback
MiniMax M2 / M2.5 | 230B | High-capacity fallback
Qwen3.5 | 397B (A17B active) | Reasoning + vision
Qwen3 VL | 235B | Vision tasks
Qwen3-Coder-Next | | Code generation
Nemotron 3 Nano | 30B | 1M token context window

Agents endpoint — dedicated to subagent workloads, with heavier models and higher concurrency:

Model | Parameters | Notes
---|---|---
DeepSeek V3.1 | 671B | Research and reasoning tasks
Kimi K2.5 / Thinking | | Planning, thinking variant available
Mistral Large 3 | 675B | Writer, vision fallback
Ministral 3 | 14B | Lightweight agent tasks
Devstral Small 2 | 24B | Code agent (coder subagent)
GLM-5 | | Writer, assembler, devops primary
Qwen3 VL | 235B | Vision agent primary
MiniMax M2 / M2.5 | 230B | Fallbacks
Nemotron 3 Nano | 30B | Extended context tasks

The default primary model across the system is Kimi K2.5, with a fallback chain of Nemotron 3 Super → GLM-5 → MiniMax M2 → DeepSeek V3.1 → Qwen3.5. Each specialist agent has its own primary and fallbacks tuned to its task — the coder agent leads with Qwen3-Coder-Next and Devstral, the researcher with Qwen3.5 and Kimi Thinking, the vision agent with Qwen3 VL.
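A fallback chain is nothing more than an ordered list plus a retry loop. This isn't OpenClaw's internal routing code — just the shape of the pattern, sketched in Python with made-up model id strings matching the chain above:

```python
# The main agent's chain, as described above (ids are illustrative strings).
FALLBACK_CHAIN = ["kimi-k2.5", "nemotron-3-super", "glm-5",
                  "minimax-m2", "deepseek-v3.1", "qwen3.5"]

def call_with_fallbacks(prompt, chain, call_model):
    """Try each model in order; return (model, reply) from the first
    one that succeeds. `call_model(model, prompt)` is any callable
    that raises on timeout, overload, or a malformed reply."""
    last_error = None
    for model in chain:
        try:
            return model, call_model(model, prompt)
        except Exception as err:
            last_error = err  # note the failure, move down the chain
    raise RuntimeError(f"all models in chain failed, last error: {last_error}")
```

The same loop serves every specialist — the coder agent just passes a different chain.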

For embeddings and semantic memory search, a local Qwen3 Embedding 8B runs on the same machine as Jairix itself — that one actually is local, no cloud involved.


The architecture

Jairix isn’t a single model answering questions. It’s a coordinated system of specialists.

The main agent handles orchestration. It reads context, decides what needs to happen, and delegates to the right specialist. It doesn’t do research itself. It doesn’t write code itself. It coordinates.

The specialist agents each own a specific domain:

Agent | Primary model | Responsibility
---|---|---
planner | Kimi K2.5 Thinking | Breaking down large tasks, orchestrating workflows
researcher | Qwen3.5 397B | Information gathering, fact verification, comparisons
writer | GLM-5 | Turning research into structured documents
assembler | MiniMax M2.5 | Combining outputs into final deliverables
coder | Qwen3-Coder-Next | Scripts, fixes, implementations
vision | Qwen3 VL 235B | Image analysis, OCR, diagram interpretation
devops | GLM-5 | Infrastructure design, CI/CD, cluster configuration
operations | Kimi K2.5 | External actions: email, calendar, APIs (requires explicit approval)

Each agent runs in an isolated Docker sandbox with its own workspace, its own model routing, and strictly scoped tool access. A researcher can search the web and write files. It cannot send email. An operations agent can send email. It cannot spawn other agents.

This isn’t just tidy architecture — it’s a meaningful safety boundary. Destructive or external actions always require a human approval step.
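The scoping logic reduces to an allow-list per agent plus an approval gate for external actions. A simplified sketch — the tool names and scopes here are illustrative, not my full configuration:

```python
# Illustrative allow-lists -- the boundary is what's NOT in the set.
TOOL_SCOPES = {
    "researcher": {"web_search", "write_file"},
    "operations": {"send_email", "calendar"},
}

# External or destructive tools always need a human in the loop.
NEEDS_APPROVAL = {"send_email", "delete_file"}

def authorize(agent: str, tool: str, approved: bool = False) -> bool:
    """Allow a tool call only if it is in the agent's scope and,
    for gated tools, explicitly approved by a human."""
    if tool not in TOOL_SCOPES.get(agent, set()):
        return False  # out of scope, no matter what
    if tool in NEEDS_APPROVAL and not approved:
        return False  # in scope, but waiting on approval
    return True

assert authorize("researcher", "web_search")
assert not authorize("researcher", "send_email")          # out of scope
assert not authorize("operations", "send_email")          # needs approval
assert authorize("operations", "send_email", approved=True)
```

Default-deny is the design choice that matters: an agent missing from the table can do nothing at all.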


Memory and continuity

One of the harder problems with local agents is continuity across sessions. Models don’t remember anything by default.

OpenClaw solves this through a layered memory system:

  • STATE.md — a live working context file, capped at 3KB, updated frequently. This is the agent’s “what am I doing right now” file. It survives compaction and resets.
  • memory/YYYY-MM-DD.md — daily operational logs with session details, decisions, and discoveries.
  • MEMORY.md — curated long-term memory, loaded only in the main session. Personal facts, stable preferences, ongoing projects.

When context gets long and approaches compaction, the agent runs a memory flush: update STATE.md first, then write session details to the daily log. The result is that even after a full context reset, Jairix wakes up with working context intact.
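The flush itself is deliberately boring file I/O. A sketch of the two steps, with paths matching the layout above (the helper and its demo strings are mine, not OpenClaw's actual code):

```python
from datetime import date
from pathlib import Path
import tempfile

STATE_CAP = 3 * 1024  # STATE.md stays under 3 KB

def memory_flush(workdir: Path, state: str, session_notes: str) -> Path:
    """Write working context to STATE.md (truncated to the cap), then
    append session details to today's daily log. Returns the log path."""
    workdir.mkdir(parents=True, exist_ok=True)
    (workdir / "STATE.md").write_text(state[:STATE_CAP], encoding="utf-8")
    log = workdir / "memory" / f"{date.today():%Y-%m-%d}.md"
    log.parent.mkdir(exist_ok=True)
    with log.open("a", encoding="utf-8") as f:
        f.write(session_notes + "\n")
    return log

# Demo in a throwaway directory: STATE.md is overwritten each flush,
# while the daily log accumulates.
demo = Path(tempfile.mkdtemp())
log = memory_flush(demo, "x" * 10_000, "session 1: planned Barcelona leg")
memory_flush(demo, "short state", "session 2: verified LEZ rules")
```

STATE.md is overwrite-in-place by design: after a context reset, the agent reads one small, current file instead of replaying history.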

Semantic memory search runs on the local Qwen3 Embedding model with a hybrid retrieval strategy: 70% vector similarity, 30% text matching, with MMR re-ranking and temporal decay so recent context is weighted higher. All of it local, all of it indexed on-device.
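The scoring formula is simple to state in code. This sketch shows only the 70/30 blend and the temporal decay — MMR re-ranking is omitted, and the half-life is an illustrative number, not my actual setting:

```python
def hybrid_score(vector_sim: float, text_score: float,
                 age_days: float, half_life_days: float = 30.0) -> float:
    """Blend vector and text relevance (70/30), then apply exponential
    temporal decay so recent memories outrank stale ones."""
    blended = 0.7 * vector_sim + 0.3 * text_score
    decay = 0.5 ** (age_days / half_life_days)
    return blended * decay

# A fresh, moderately relevant memory can beat an old, perfect match.
fresh = hybrid_score(vector_sim=0.8, text_score=0.6, age_days=1)
stale = hybrid_score(vector_sim=1.0, text_score=1.0, age_days=120)
assert fresh > stale
```

The decay term is what keeps "what am I doing this week" ahead of "what I said in March" without deleting anything.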


Planning a 45-day road trip through Spain

This is where things get concrete.

Earlier this year I asked Jairix to help plan our 2026 camper trip. The parameters: leave early April, return mid-May, travel through Luxembourg, France, and Spain with the family, mix culture with coast, keep it manageable.

What followed was a full multi-agent research workflow.

The planner broke down the task: route structure first, then per-destination research, then campsite selection, then a combined document. The researcher went to work on each destination — pulling weather data for April, finding campsite options near city centers, checking low-emission zones (critical for a diesel camper in French cities), verifying opening hours for attractions, locating supermarkets within range of each overnight stop.

The writer converted that research into structured destination guides. The assembler combined everything into a final travel plan.
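Stripped of the sandboxing and model routing, that workflow is a chain in which each specialist's output becomes the next one's input. A stub sketch of the control flow — the callables here stand in for real sandboxed subagents:

```python
def run_pipeline(task: str, agents: dict) -> str:
    """Chain specialists in order; each agent's output feeds the next.
    `agents` maps a role name to a callable (stubs here -- in the real
    system each call fans out to a sandboxed subagent)."""
    artifact = task
    for role in ("planner", "researcher", "writer", "assembler"):
        artifact = agents[role](artifact)
    return artifact

# Stub agents that just tag the artifact as it moves through the chain.
stubs = {role: (lambda r: lambda text: f"{text} -> {r}")(role)
         for role in ("planner", "researcher", "writer", "assembler")}

print(run_pipeline("plan 45-day Spain trip", stubs))
# plan 45-day Spain trip -> planner -> researcher -> writer -> assembler
```

In practice the per-destination research fans out in parallel, but the hand-off discipline — write a file, pass it on — is exactly this.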

The result is what you can see at blackycamperbus.nl/reisplan: 45 nights, 22 destinations, 5,319 kilometers. Luxembourg → Lyon → Béziers → Barcelona → Tarragona → Peñíscola → Valencia → Alicante → Cartagena → Cabo de Gata → Almería → Granada → Málaga → Córdoba → Sevilla → Mérida → Cáceres → Salamanca → Burgos → Bilbao → San Sebastián → Bordeaux → home.

Every destination includes a researched mini-itinerary, campsite with GPS coordinates, supermarkets within walking or cycling distance, public transport options, weather expectations for the relevant month, and local highlights. The Crit’Air sticker requirement for France was flagged automatically. Low-emission zone warnings are included per city.

That’s not a ChatGPT conversation. That’s a coordinated research pipeline that ran unattended, produced structured outputs, and assembled them into a usable document — all through a single Ollama API, all on open models.


What’s actually impressive about this

Not the models. The models are good but they’re not magic. What’s impressive is the system around them.

The fact that a researcher agent can be told “find campsite options within 5km of the city center for a diesel camper, check for LEZ restrictions, include nearest supermarkets” — and actually do it, write the result to a file, and hand it off to a writer — that’s the value. The orchestration. The memory. The separation of concerns.

It’s also the fact that the entire stack — OpenClaw, Ollama, the models — uses open infrastructure. There’s no API key to a closed provider. No terms of service that change what you can build. No risk of a pricing change making your setup unaffordable overnight.


The practical reality

I won’t pretend it’s frictionless. Open models are not GPT-4o. Some tasks require nudging. Occasionally a subagent produces output that needs a retry. Context management on long sessions requires attention.

But the architecture is sound, and it keeps improving. The multi-agent pattern genuinely works for complex tasks that benefit from decomposition. The memory system genuinely provides continuity. The sandbox isolation genuinely prevents accidents.

And every improvement I make — every prompt refinement, every tool addition, every workflow pattern — stays in my infrastructure. It doesn’t get absorbed into someone else’s training pipeline.


Try it yourself

If you’re interested in running your own setup:

The entry point is lower than you’d expect. A machine with 16–32 GB of RAM, an Ollama Pro account for the heavier models, and OpenClaw on top is enough to get started. The rest is configuration and iteration.

Jairix leaves for Spain on April 3rd. The plan is ready. 🦞