I’ve been running a homelab Kubernetes cluster on Talos Linux for a while now, and lately I’ve been diving deep into self-hosted AI. I already had Ollama running, OpenWebUI as my chat interface, and a bunch of local models. But I wanted something more — a proper multi-agent setup where specialized agents handle different tasks automatically, without me manually routing every request.
This is the story of how I built that, what broke along the way, and what actually works.
The Goal
The idea was simple: a swarm of specialized AI agents, each good at one thing, orchestrated by a single entry point. I chat with one interface, and the right agent does the work.
- A DevOps agent that knows my Kubernetes cluster
- A Product Owner agent that creates Azure DevOps User Stories following our team’s working agreement
- An orchestrator that figures out which agent to call
No manual model switching. No copy-pasting context. Just ask, and it happens.
Why kagent?
After evaluating LangGraph, CrewAI, AutoGen, and a few others, I landed on kagent. It’s a CNCF Sandbox project from the founders of Istio, and it’s genuinely Kubernetes-native — agents are CRDs, every agent is a pod, and it integrates naturally with ArgoCD and GitOps workflows.
The key things that sold me:
- Declarative agents: defined as Agent CRDs, reconciled like any other Kubernetes resource
- A2A protocol: agents can call other agents directly, which is how the swarm works
- MCP tool support: connects to any MCP server, including my ToolHive-managed tools
- Native Ollama support: via a ModelConfig CRD with provider: Ollama
Fair warning: it’s still CNCF Sandbox, so expect breaking changes. I pinned my Helm chart version and it’s been stable, but don’t blindly upgrade.
The Stack
Talos Linux (5 nodes) + Cilium + ArgoCD
├── kagent (controller + agent pods)
├── Ollama (ollama-agents namespace, cloud model routing)
├── ToolHive (MCP server management)
│   └── azure-devops MCP server (custom image)
└── OpenWebUI + custom Pipe Function
Models are routed through Ollama’s cloud proxy — so kimi-k2.6:cloud, glm-5.1:cloud, etc. are available without running them locally. Handy for agent workloads where you want a capable model without burning VRAM.
Setting Up the Agents
The PO Agent
This was the first real test. I have a detailed system prompt for creating Azure DevOps User Stories — specific formatting, HTML acceptance criteria, working agreement rules, the whole thing. I wanted an agent that behaves exactly like that prompt, every time.
apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: azure-devops
  namespace: kagent
spec:
  description: "Senior PO agent for Azure DevOps User Stories"
  type: Declarative
  declarative:
    modelConfig: ollama-agents-kimi-k2-6-cloud
    systemMessage: |
      You are a structured Senior Product Owner specializing in Azure,
      Kubernetes, and Infrastructure as Code (IaC)...
    tools:
      - type: McpServer
        mcpServer:
          name: azure-devops
          kind: RemoteMCPServer
The MCP server for Azure DevOps runs via ToolHive as a custom container image and connects to our Van Oord Azure DevOps organization using a PAT stored as a Kubernetes Secret.
Connecting ToolHive to kagent took some trial and error. The endpoint is not /sse — it’s /mcp with Streamable HTTP:
http://mcp-azure-devops-proxy.toolhive-system:8080/mcp
Once that was wired up correctly, kagent registered 43 Azure DevOps tools automatically.
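A quick way to check that an endpoint speaks Streamable HTTP rather than the older SSE transport is to POST a JSON-RPC initialize request to it. The following is a minimal standard-library sketch, not something taken from ToolHive or kagent: the helper names, the client identity, and the protocol revision string are assumptions for illustration, and the URL only resolves from inside the cluster.

```python
import json
import urllib.request

# URL from the post; reachable only from inside the cluster.
MCP_URL = "http://mcp-azure-devops-proxy.toolhive-system:8080/mcp"


def build_initialize_request(url: str) -> urllib.request.Request:
    """Build (but do not send) an MCP `initialize` request for Streamable HTTP."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumed protocol revision; use whichever one your server supports.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "endpoint-probe", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients advertise both response types.
            "Accept": "application/json, text/event-stream",
        },
    )


def probe(url: str = MCP_URL, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status code."""
    request = build_initialize_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Calling probe() from a pod in the cluster should return a success status if the path and transport are right. An error status raises urllib.error.HTTPError instead, which is a strong hint that the path or transport is wrong.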
The Orchestrator
The orchestrator is the interesting part. It doesn’t have any MCP tools — it has other agents as tools. This is kagent’s A2A delegation in action.
tools:
  - type: Agent
    agent:
      apiGroup: kagent.dev
      kind: Agent
      name: azure-devops
      namespace: kagent
  - type: Agent
    agent:
      apiGroup: kagent.dev
      kind: Agent
      name: my-first-k8s-agent
      namespace: kagent
I tried the MCP route first — kagent exposes a /mcp endpoint on the controller with list_agents and invoke_agent tools. But the Google ADK MCP client in the agent pods kept failing to initialize the session:
Failed to create MCP session: unhandled errors in a TaskGroup
Switching to direct A2A tool references solved it immediately. The orchestrator now delegates via A2A without touching the MCP layer.
The system prompt is straightforward — just tell it what each agent does and to not overthink it:
- Analyze the user request and immediately delegate to the correct agent
- Do NOT ask for confirmation before delegating — just do it
- Return the specialist agent's response directly and completely
Connecting to OpenWebUI
kagent doesn’t expose an OpenAI-compatible endpoint for agent chat. It uses A2A, which OpenWebUI doesn’t speak. So I wrote a custom OpenWebUI Pipe Function that translates between the two.
The A2A call structure took some digging. I found it by watching the kagent UI’s network requests in Firefox DevTools:
payload = {
    "jsonrpc": "2.0",
    "method": "message/stream",
    "params": {
        "message": {
            "kind": "message",
            "messageId": message_id,
            "role": "user",
            "parts": [{"kind": "text", "text": user_message}],
            "contextId": context_id,
            "metadata": {"displaySource": "user"},
        },
        "metadata": {},
    },
    "id": str(uuid.uuid4()),
}
The endpoint is the kagent UI service: http://kagent-ui.kagent.svc.local.lab:8080/a2a/kagent/{agent-name}
One gotcha: the stream echoes the user message back before the agent response. Filter on msg.get("role") == "agent" to avoid showing duplicates.
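The filtering step can be sketched roughly as follows. This is not the actual Pipe code. It assumes the stream arrives as server-sent events with one JSON-RPC object per data: line, and that the message sits either directly in result or under result.status.message. Both are assumptions about the event shape, and all function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield the JSON object carried by each `data:` line of an SSE stream."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and `event:` lines
        try:
            payload = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore data lines that are not valid JSON
        if isinstance(payload, dict):
            yield payload


def find_message(event: dict) -> dict:
    """Locate the message object inside one JSON-RPC stream event.

    Assumption: the message is either the result itself (kind "message")
    or nested under result.status.message (e.g. a status update).
    """
    result = event.get("result") or {}
    if result.get("kind") == "message":
        return result
    return (result.get("status") or {}).get("message") or {}


def agent_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text parts from agent messages, dropping the echoed user message."""
    for event in iter_sse_json(lines):
        msg = find_message(event)
        if msg.get("role") != "agent":
            continue  # this is where the user echo gets filtered out
        for part in msg.get("parts", []):
            if part.get("kind") == "text" and part.get("text"):
                yield part["text"]
```

A response object returned by urllib.request.urlopen iterates over lines of bytes, so it can be passed straight to agent_text and the yielded chunks forwarded to the chat.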
For session continuity across messages in the same chat, OpenWebUI Functions don’t pass a chat_id — so I derive a stable context ID from the first user message using uuid5:
context_id = f"ctx-{uuid.uuid5(uuid.NAMESPACE_DNS, first_message)}"
Not perfect (starting a new conversation with the same first message will share context), but works well enough for the use case.
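A small sketch of that derivation, assuming the Pipe receives the conversation as an OpenAI-style messages list of role/content dictionaries. The function name derive_context_id is hypothetical.

```python
import uuid


def derive_context_id(messages: list[dict]) -> str:
    """Derive a stable context ID from the first user message in a chat."""
    first_message = next(
        (m.get("content", "") for m in messages if m.get("role") == "user"),
        "",  # fall back to an empty string if there is no user message yet
    )
    if not isinstance(first_message, str):
        # Content may be structured (e.g. a list of parts); uuid5 needs a string.
        first_message = str(first_message)
    # uuid5 is a deterministic hash: the same input always gives the same UUID.
    return f"ctx-{uuid.uuid5(uuid.NAMESPACE_DNS, first_message)}"
```

Because the ID depends only on the first user message, it stays the same as the conversation grows, which is also why two separate chats that open with identical text end up sharing a context.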
What It Looks Like
From OpenWebUI, I select Jairix (the orchestrator) and just ask:
“Fetch story 78018 from Azure DevOps”
Behind the scenes:
- Jairix (kimi-k2.6) decides this is an Azure DevOps question
- Delegates to the azure-devops agent via A2A
- The azure-devops agent (kimi-k2.6) calls the get_work_item MCP tool
- ToolHive proxies to the Azure DevOps API
- Response streams back through the chain to OpenWebUI
It’s not fast — there are a lot of hops. But it works, and the agent actually understands the domain because of the system prompt.
The MCP Shortcut
There’s also a faster path: kagent’s /mcp endpoint works fine as an OpenWebUI MCP connection. Add it in Admin → Settings → Tools and any model can call kagent_invoke_agent directly.
The trade-off:
- Faster (fewer hops, no orchestrator LLM call)
- No orchestrator prompt: the agent’s specialized behavior still kicks in, but you lose the orchestrator’s routing intelligence
- Per-model tool activation — you have to enable the MCP tools for each model in OpenWebUI, which doesn’t scale with 20+ models
For ad-hoc queries it’s great. For consistent, structured output (like User Stories), the Pipe + orchestrator path is better.
Lessons Learned
kagent’s /mcp endpoint exists but is finicky. In my setup, the KAgentMcpToolset in agent pods repeatedly failed to initialize MCP sessions. Use direct A2A agent references instead of MCP for agent-to-agent communication.
Ollama serializes requests by default. For a swarm with multiple agents calling the same Ollama instance, set the OLLAMA_NUM_PARALLEL environment variable on the Ollama server to a value above 1. Otherwise requests queue and everything feels very slow.
Tool-calling models matter. Small models fail silently on multi-step tool use. I use kimi-k2.6:cloud for the orchestrator and specialists — Kimi K2 is genuinely good at agentic tasks.
CNCF Sandbox means breaking changes. Between v0.6 and v0.7, kagent renamed fields and bumped the API group. Pin your Helm chart version in ArgoCD and review release notes before upgrading.
OpenWebUI Functions don’t pass chat_id. Plan for this from the start if you need session continuity. The uuid5 trick works but isn’t perfect.
What’s Next
- Research agent with SearXNG MCP wrapper for web search
- Grafana agent via the built-in kagent-grafana-mcp integration
- ArgoCD GitOps for all agent CRDs: they’re already tracked, but I want proper sync policies
- Experiment with kimi-k2-thinking for the orchestrator on complex multi-agent tasks
The foundation is solid. A self-hosted AI swarm, running on bare metal, talking to real enterprise tooling, all declaratively managed in Kubernetes. That’s pretty cool for a homelab.
Running on: Talos Linux 1.x, kagent v0.7.x, Ollama with cloud proxy, OpenWebUI, ToolHive