Building a Self-Hosted AI Agent Swarm on Kubernetes with kagent

I’ve been running a homelab Kubernetes cluster on Talos Linux for a while now, and lately I’ve been diving deep into self-hosted AI. I already had Ollama running, OpenWebUI as my chat interface, and a bunch of local models. But I wanted something more — a proper multi-agent setup where specialized agents handle different tasks automatically, without me manually routing every request.

This is the story of how I built that, what broke along the way, and what actually works.

The Goal

The idea was simple: a swarm of specialized AI agents, each good at one thing, orchestrated by a single entry point. I chat with one interface, and the right agent does the work.

A DevOps agent that knows my Kubernetes cluster
A Product Owner agent that creates Azure DevOps User Stories following our team’s working agreement
An orchestrator that figures out which agent to call

No manual model switching. No copy-pasting context. Just ask, and it happens.

Why kagent?

After evaluating LangGraph, CrewAI, AutoGen, and a few others, I landed on kagent. It’s a CNCF Sandbox project originally created by Solo.io, and it’s genuinely Kubernetes-native: agents are CRDs, every agent is a pod, and it integrates naturally with ArgoCD and GitOps workflows.

The key things that sold me:

Declarative agents — defined as Agent CRDs, reconciled like any other Kubernetes resource
A2A protocol — agents can call other agents directly, which is how the swarm works
MCP tool support — connects to any MCP server, including my ToolHive-managed tools
Native Ollama support — via ModelConfig CRD with provider: Ollama

Fair warning: it’s still CNCF Sandbox, so expect breaking changes. I pinned my Helm chart version and it’s been stable, but don’t blindly upgrade.

The Stack

Talos Linux (5 nodes) + Cilium + ArgoCD
├── kagent (controller + agent pods)
├── Ollama (ollama-agents namespace, cloud model routing)
├── ToolHive (MCP server management)
│   └── azure-devops MCP server (custom image)
└── OpenWebUI + custom Pipe Function

Models are routed through Ollama’s cloud proxy — so kimi-k2.6:cloud, glm-5.1:cloud, etc. are available without running them locally. Handy for agent workloads where you want a capable model without burning VRAM.

Setting Up the Agents

The PO Agent

This was the first real test. I have a detailed system prompt for creating Azure DevOps User Stories — specific formatting, HTML acceptance criteria, working agreement rules, the whole thing. I wanted an agent that behaves exactly like that prompt, every time.

apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: azure-devops
  namespace: kagent
spec:
  description: "Senior PO agent for Azure DevOps User Stories"
  type: Declarative
  declarative:
    modelConfig: ollama-agents-kimi-k2-6-cloud
    systemMessage: |
      You are a structured Senior Product Owner specializing in Azure,
      Kubernetes, and Infrastructure as Code (IaC)...
    tools:
      - type: McpServer
        mcpServer:
          name: azure-devops
          kind: RemoteMCPServer

The MCP server for Azure DevOps runs via ToolHive as a custom container image. It connects to our Van Oord Azure DevOps organization with a personal access token (PAT) stored as a Kubernetes secret.

Connecting ToolHive to kagent took some trial and error. The endpoint is not /sse — it’s /mcp with Streamable HTTP:

http://mcp-azure-devops-proxy.toolhive-system:8080/mcp

Once that was wired up correctly, kagent registered 43 Azure DevOps tools automatically.
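To check the endpoint independently of kagent, a small smoke test helps. The sketch below builds an MCP initialize request the way a Streamable HTTP client would. The URL is the one above; the function names, the client name, and the protocol version string are my own assumptions, so adjust them to whatever your server expects.

```python
import json
import urllib.request

# URL copied from the text above; everything else is illustrative.
MCP_URL = "http://mcp-azure-devops-proxy.toolhive-system:8080/mcp"


def build_initialize_request(url: str = MCP_URL) -> urllib.request.Request:
    """Build (but do not send) an MCP `initialize` request for Streamable HTTP."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumption: match your server
            "capabilities": {},
            "clientInfo": {"name": "endpoint-smoke-test", "version": "0.1"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients accept both plain JSON and SSE replies.
            "Accept": "application/json, text/event-stream",
        },
    )


def check_endpoint(url: str = MCP_URL) -> int:
    """Send the request and return the HTTP status (needs cluster network access)."""
    with urllib.request.urlopen(build_initialize_request(url), timeout=10) as resp:
        return resp.status
```

Calling check_endpoint() from a pod inside the cluster should return 200 if the proxy is reachable on /mcp. Pointing the same request at /sse is a quick way to confirm which transport the server actually speaks.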

The Orchestrator

The orchestrator is the interesting part. It doesn’t have any MCP tools — it has other agents as tools. This is kagent’s A2A delegation in action.

tools:
  - type: Agent
    agent:
      apiGroup: kagent.dev
      kind: Agent
      name: azure-devops
      namespace: kagent
  - type: Agent
    agent:
      apiGroup: kagent.dev
      kind: Agent
      name: my-first-k8s-agent
      namespace: kagent
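As the swarm grows, writing these entries by hand gets repetitive. Here is a minimal sketch that renders the same structure from a list of agent names. The helper names are hypothetical, and the field layout simply mirrors the snippet above rather than any official schema reference.

```python
import json


def agent_tool(name: str, namespace: str = "kagent") -> dict:
    """Return one `tools` entry that references another kagent Agent."""
    return {
        "type": "Agent",
        "agent": {
            "apiGroup": "kagent.dev",
            "kind": "Agent",
            "name": name,
            "namespace": namespace,
        },
    }


def orchestrator_tools(agent_names: list[str]) -> list[dict]:
    """Build the full `tools` list for the orchestrator."""
    return [agent_tool(name) for name in agent_names]


# JSON is valid YAML, so the output can be pasted into the Agent spec.
print(json.dumps(orchestrator_tools(["azure-devops", "my-first-k8s-agent"]), indent=2))
```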

I tried the MCP route first — kagent exposes a /mcp endpoint on the controller with list_agents and invoke_agent tools. But the Google ADK MCP client in the agent pods kept failing to initialize the session:

Failed to create MCP session: unhandled errors in a TaskGroup

Switching to direct A2A tool references solved it immediately. The orchestrator now delegates via A2A without touching the MCP layer.

The system prompt is straightforward — just tell it what each agent does and to not overthink it:

- Analyze the user request and immediately delegate to the correct agent
- Do NOT ask for confirmation before delegating — just do it
- Return the specialist agent's response directly and completely

Connecting to OpenWebUI

kagent doesn’t expose an OpenAI-compatible endpoint for agent chat. It uses A2A, which OpenWebUI doesn’t speak. So I wrote a custom OpenWebUI Pipe Function that translates between the two.

The A2A call structure took some digging. I found it by watching the kagent UI’s network requests in Firefox DevTools:

import uuid

# message_id, context_id, and user_message are set earlier in the Pipe Function
payload = {
    "jsonrpc": "2.0",
    "method": "message/stream",
    "params": {
        "message": {
            "kind": "message",
            "messageId": message_id,
            "role": "user",
            "parts": [{"kind": "text", "text": user_message}],
            "contextId": context_id,
            "metadata": {"displaySource": "user"},
        },
        "metadata": {},
    },
    "id": str(uuid.uuid4()),
}

The endpoint is the kagent UI service: http://kagent-ui.kagent.svc.local.lab:8080/a2a/kagent/{agent-name}

One gotcha: the stream echoes the user message back before the agent response. Filter on msg.get("role") == "agent" to avoid showing duplicates.
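A rough sketch of that filter, written as a pure function over the SSE lines so it can be tested without a cluster. The event shapes are an assumption based on what I saw in the stream: the message is either the JSON-RPC result itself or nested under result.status.message. Check your own DevTools capture before relying on it.

```python
import json
from typing import Iterable, Iterator


def agent_text_from_sse(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text parts of agent messages found in SSE `data:` lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines, `event:` lines, and SSE comments
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore payloads that are not JSON
        if not isinstance(event, dict):
            continue
        result = event.get("result") or {}
        # Assumed shapes: a bare message, or a status update wrapping one.
        if result.get("kind") == "message":
            msg = result
        else:
            msg = (result.get("status") or {}).get("message") or {}
        if msg.get("role") != "agent":
            continue  # this is what drops the echoed user message
        for part in msg.get("parts", []):
            if part.get("kind") == "text" and part.get("text"):
                yield part["text"]
```

In the Pipe Function, the lines would come from the streaming HTTP response, decoded to text one line at a time.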

For session continuity across messages in the same chat, OpenWebUI Functions don’t pass a chat_id — so I derive a stable context ID from the first user message using uuid5:

context_id = f"ctx-{uuid.uuid5(uuid.NAMESPACE_DNS, first_message)}"

Not perfect (starting a new conversation with the same first message will share context), but works well enough for the use case.
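Spelled out a bit more, the derivation might look like the sketch below. It assumes the request body carries an OpenAI-style messages list, where content is either a string or a list of typed parts. The helper names are mine, not part of OpenWebUI.

```python
import uuid


def first_user_message(messages: list[dict]) -> str:
    """Return the text of the first user message, or '' if there is none."""
    for message in messages:
        if message.get("role") != "user":
            continue
        content = message.get("content", "")
        if isinstance(content, list):
            # Multimodal content: keep only the text parts.
            content = " ".join(
                part.get("text", "")
                for part in content
                if isinstance(part, dict) and part.get("type") == "text"
            )
        return content
    return ""


def derive_context_id(messages: list[dict]) -> str:
    """Derive a context ID that stays stable for the whole conversation."""
    return f"ctx-{uuid.uuid5(uuid.NAMESPACE_DNS, first_user_message(messages))}"
```

Because uuid5 is deterministic, every turn of the same chat maps to the same context ID as long as the first user message is unchanged. That same property is why two chats opening with identical text collide.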

What It Looks Like

From OpenWebUI, I select Jairix (the orchestrator) and just ask:

“Fetch story 78018 from Azure DevOps”

Behind the scenes:

Jairix (kimi-k2.6) decides this is an Azure DevOps question
Delegates to azure-devops agent via A2A
azure-devops agent (kimi-k2.6) calls the get_work_item MCP tool
ToolHive proxies to the Azure DevOps API
Response streams back through the chain to OpenWebUI

It’s not fast — there are a lot of hops. But it works, and the agent actually understands the domain because of the system prompt.

The MCP Shortcut

There’s also a faster path: kagent’s /mcp endpoint works fine as an OpenWebUI MCP connection. Add it in Admin → Settings → Tools and any model can call kagent_invoke_agent directly.

The trade-off:

Faster (fewer hops, no orchestrator LLM call)
No orchestrator system prompt: the target agent’s specialized behavior still kicks in, but the calling model has to pick the agent itself, so you lose the orchestrator’s routing intelligence
Per-model tool activation — you have to enable the MCP tools for each model in OpenWebUI, which doesn’t scale with 20+ models

For ad-hoc queries it’s great. For consistent, structured output (like User Stories), the Pipe + orchestrator path is better.

Lessons Learned

kagent’s /mcp endpoint exists but is finicky. The KAgentMcpToolset in agent pods fails to initialize MCP sessions reliably. Use direct A2A agent references instead of MCP for agent-to-agent communication.

Ollama serializes requests by default. For a swarm with multiple agents calling the same Ollama instance, set OLLAMA_NUM_PARALLEL. Otherwise requests queue and things feel very slow.

Tool-calling models matter. Small models fail silently on multi-step tool use. I use kimi-k2.6:cloud for the orchestrator and specialists — Kimi K2 is genuinely good at agentic tasks.

CNCF Sandbox means breaking changes. Between v0.6 and v0.7, kagent renamed fields and bumped the API version. Pin your Helm chart version in ArgoCD and review release notes before upgrading.

OpenWebUI Functions don’t pass chat_id. Plan for this from the start if you need session continuity. The uuid5 trick works but isn’t perfect.

What’s Next

Research agent with SearXNG MCP wrapper for web search
Grafana agent via the built-in kagent-grafana-mcp integration
ArgoCD GitOps for all agent CRDs — they’re already tracked but I want proper sync policies
Experiment with kimi-k2-thinking for the orchestrator on complex multi-agent tasks

The foundation is solid. A self-hosted AI swarm, running on bare metal, talking to real enterprise tooling, all declaratively managed in Kubernetes. That’s pretty cool for a homelab.

Running on: Talos Linux 1.x, kagent v0.7.x, Ollama with cloud proxy, OpenWebUI, ToolHive
