Hey everyone! Just wrapped up an awesome weekend project and wanted to share what I built.

TL;DR: I can now deploy Azure infrastructure by just chatting with an AI - all running locally with open-source models.

What I Built

An AI-powered system where I can literally say “Create an AKS cluster in Azure with 3 nodes” in chat, and it actually deploys real infrastructure. No manual Terraform commands, no OpenAI/Anthropic API calls - everything runs locally!

Live demo workflow I just ran:

  • Generated Terraform code for Azure Storage Account
  • Validated and created execution plan
  • Listed all workspaces
  • Applied changes to Azure (deployed real infrastructure!)
  • Updated storage tier from Hot to Cool (in-place update!)
  • Cleaned up empty workspaces
  • Destroyed infrastructure when done

All through natural conversation. All with local models. All in under 10 minutes.

The Stack

Infrastructure Layer

  • Talos Kubernetes cluster (self-hosted)
  • Docker Registry for custom images
  • Azure trial tenant (for safe experimentation)

AI & Orchestration

  • OpenWebUI (self-hosted AI chat interface)
  • Ollama with gpt-oss:120b-cloud (120B parameter model!)
  • Terraform MCP Server (deployed in K8s cluster)
  • MCPO (MCP-to-OpenAPI proxy, running in K8s)

Custom Build - Terraform Executor API (Python Flask)

  • Validates TF code
  • Generates execution plans
  • Handles apply/destroy operations
  • State management
  • Workspace persistence across restarts
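I haven’t cleaned the code up for publishing yet, but here’s a minimal sketch of the idea - the endpoint paths, the /data/workspaces layout, and the service details are illustrative assumptions, not the exact implementation:

  # Minimal sketch of the Terraform Executor API (illustrative, not the exact
  # code). Azure auth comes from the Service Principal via the standard azurerm
  # env vars (ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID, ARM_SUBSCRIPTION_ID).
  import subprocess
  from pathlib import Path

  from flask import Flask, jsonify, request

  app = Flask(__name__)
  WORKSPACES = Path("/data/workspaces")  # persistent volume: survives pod restarts


  def run_tf(ws: Path, *args: str) -> subprocess.CompletedProcess:
      """Run a terraform subcommand inside a workspace directory."""
      return subprocess.run(["terraform", *args], cwd=ws,
                            capture_output=True, text=True, timeout=600)


  @app.post("/workspaces/<ws_id>/prepare")
  def prepare(ws_id: str):
      """Write the submitted HCL, then init + validate + plan."""
      ws = WORKSPACES / ws_id
      ws.mkdir(parents=True, exist_ok=True)
      (ws / "main.tf").write_text(request.json["code"])
      for step in (["init", "-input=false"], ["validate"], ["plan", "-out=tfplan"]):
          result = run_tf(ws, *step)
          if result.returncode != 0:
              return jsonify({"ok": False, "step": step[0], "error": result.stderr}), 400
      return jsonify({"ok": True, "plan": result.stdout})


  @app.post("/workspaces/<ws_id>/apply")
  def apply_ws(ws_id: str):
      """Apply the previously saved plan (non-interactive by definition)."""
      result = run_tf(WORKSPACES / ws_id, "apply", "tfplan")
      return jsonify({"ok": result.returncode == 0,
                      "output": result.stdout or result.stderr})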

Deployment Tools

  • OpenWebUI Custom Tools (6 functions for full TF lifecycle)
  • Ingress with proper timeouts (nginx)
  • Azure Service Principal for auth

How It Works

Here’s the complete flow:

Step 1: Me: “Deploy a storage account in Azure”

Step 2: Ollama (gpt-oss:120b) uses Terraform MCP to get Azure provider docs

Step 3: AI generates Terraform code

Step 4: OpenWebUI Tool calls Terraform Executor API

Step 5: Executor runs: terraform init, validate, plan

Step 6: AI shows me the plan: “This will create 2 resources…”

Step 7: Me: “Apply it!”

Step 8: Executor runs: terraform apply

Step 9: LIVE INFRASTRUCTURE IN AZURE!
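To make steps 4-8 concrete, here’s roughly what the tool layer ends up sending to the executor. The endpoint names follow the sketch in the stack section, and the HCL is a representative example rather than the exact code the model generated:

  # Representative example of the two calls behind steps 4-8 (the in-cluster
  # service URL is an assumption).
  import requests

  EXECUTOR = "http://terraform-executor.tools.svc:5000"

  tf_code = """
  provider "azurerm" {
    features {}
  }

  resource "azurerm_resource_group" "demo" {
    name     = "rg-demo"
    location = "West Europe"
  }

  resource "azurerm_storage_account" "demo" {
    name                     = "tftest20241011"
    resource_group_name      = azurerm_resource_group.demo.name
    location                 = azurerm_resource_group.demo.location
    account_tier             = "Standard"
    account_replication_type = "LRS"
    access_tier              = "Hot"
  }
  """

  # Steps 4-6: validate and plan ("Plan: 2 to add, 0 to change, 0 to destroy")
  plan = requests.post(f"{EXECUTOR}/workspaces/demo-kubecraft-3/prepare",
                       json={"code": tf_code}, timeout=600).json()
  print(plan["plan"])

  # Steps 7-8: only after an explicit "Apply it!" does the tool call /apply
  result = requests.post(f"{EXECUTOR}/workspaces/demo-kubecraft-3/apply",
                         timeout=600).json()
  print(result["output"])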

Actual Demo Workflow

Here’s what I demonstrated step by step:

Step 1: Generate & Prepare

Me: “Create Terraform code for an Azure Storage Account named ‘tftest20241011’ in West Europe with hot access tier and LRS replication, then prepare it for deployment. Create a workspace named ‘demo-kubecraft-3’ and display the plan.”

AI Response:

  • Generated complete Terraform code
  • Ran terraform init
  • Validated configuration
  • Generated execution plan
  • Plan: 2 to add, 0 to change, 0 to destroy

Step 2: List Workspaces

Me: “Can you list all my workspaces?”

The AI presented a beautiful table:

#   WORKSPACE ID      HAS STATE?   HAS PLAN?   TERRAFORM FILES
1   demo-kubecraft    No           Yes         main.tf
2   demo-kubecraft-2  No           Yes         main.tf
3   demo-kubecraft-3  No           Yes         main.tf

The AI even suggested next actions:

  • View details of a specific workspace (terraform_get_workspace_info)
  • Apply a plan (terraform_apply)
  • Refresh a workspace with new code (terraform_refresh)
  • Destroy any of these workspaces (terraform_destroy)

Thinking time: less than a second!

Step 3: Apply Infrastructure

Me: “Please apply workspace ‘demo-kubecraft-3’”

AI Response:

  • Terraform Apply Successful!
  • Apply complete! Resources: 2 added, 0 changed, 0 destroyed
  • Resource Group created in Azure
  • Storage Account deployed with Hot tier

Step 4: Cleanup Empty Workspaces

Me: “Please destroy workspaces that have ‘No state’”

AI Response:

  • Identified and destroyed 2 empty workspaces
  • Cleaned up unused resources

Step 5: Update Configuration

Me: “Update the storage account to use Cool access tier instead of Hot in workspace_id ‘demo-kubecraft-3’”

AI Response:

  • Refreshed workspace with new configuration
  • Plan: 0 to add, 1 to change, 0 to destroy
  • Change detected: access_tier: “Hot” to “Cool”

Step 6: Apply Update

Me: “Please apply workspace ‘demo-kubecraft-3’”

AI Response:

  • In-place update successful!
  • Storage account updated without destruction
  • Zero downtime during update

Step 7: Final Cleanup

Me: “Destroy workspace ‘demo-kubecraft-3’”

AI Response:

  • Terraform Destroy Successful
  • Destroy complete! Resources: 2 destroyed
  • All infrastructure cleaned up from Azure

Key Features

Full Terraform Lifecycle - 6 Custom Tools

  • terraform_prepare - Validate & generate plan
  • terraform_apply - Deploy infrastructure
  • terraform_destroy - Clean up resources
  • terraform_list_workspaces - View all active workspaces
  • terraform_get_workspace_info - Inspect workspace details
  • terraform_refresh - Update existing infra (keeps state!)
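For anyone who hasn’t written OpenWebUI tools before: each one is just a typed method on a Tools class, and OpenWebUI builds the function-calling schema from the signature and docstring. Here’s a trimmed sketch of two of the six (the executor URL is an assumed in-cluster name):

  # Sketch of two of the six tools. OpenWebUI exposes methods of a `Tools`
  # class to the model; type hints and docstrings become the call schema.
  import requests

  EXECUTOR = "http://terraform-executor.tools.svc:5000"  # assumed service URL


  class Tools:
      def terraform_prepare(self, workspace_id: str, terraform_code: str) -> str:
          """Validate Terraform code and generate an execution plan.
          :param workspace_id: Workspace to create or reuse.
          :param terraform_code: Complete Terraform (HCL) configuration.
          """
          r = requests.post(f"{EXECUTOR}/workspaces/{workspace_id}/prepare",
                            json={"code": terraform_code}, timeout=600)
          return r.json().get("plan") or r.text

      def terraform_apply(self, workspace_id: str) -> str:
          """Apply the saved plan for a workspace (deploys real infrastructure).
          :param workspace_id: Workspace whose plan should be applied.
          """
          r = requests.post(f"{EXECUTOR}/workspaces/{workspace_id}/apply",
                            timeout=600)
          return r.json().get("output") or r.text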

Production-Ready Elements

  • State persistence (survives pod restarts)
  • Workspace management
  • Proper error handling
  • Timeout handling for long-running operations (see the sketch after this list)
  • Azure credential management
  • In-place updates (no destroy/recreate needed)
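The timeout and error handling is mostly about never letting the model see a bare 504. A sketch of the pattern, under the same assumptions as the executor snippet:

  # Surface timeouts and terraform failures as structured results the model
  # can read, instead of letting the request die with a gateway error (sketch).
  import subprocess


  def safe_run_tf(workspace: str, *args: str, timeout: int = 600) -> dict:
      try:
          result = subprocess.run(["terraform", *args], cwd=workspace,
                                  capture_output=True, text=True, timeout=timeout)
      except subprocess.TimeoutExpired:
          return {"ok": False, "error": f"terraform {args[0]} exceeded {timeout}s"}
      if result.returncode != 0:
          # stderr carries the useful provider and auth errors - pass it along
          return {"ok": False, "error": result.stderr.strip()}
      return {"ok": True, "output": result.stdout}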

100% Local & Open Source

  • gpt-oss:120b-cloud via Ollama (no API costs!)
  • Also tested with qwen3-coder:480b-cloud
  • Terraform MCP Server running in-cluster
  • Self-hosted OpenWebUI
  • Everything in your own Kubernetes cluster

Architecture Highlights

Terraform MCP Server in Cluster

Instead of relying on external APIs, I deployed the Terraform MCP server directly in my Kubernetes cluster as a separate service with StreamableHTTP transport.

Benefits:

  • Fast access to Terraform docs (no external API calls)
  • Stays in your cluster (no data leaves your network)
  • Always up-to-date provider documentation
  • Sub-second response times

This gives the AI real-time access to:

  • Latest Terraform provider versions
  • Module documentation
  • Best practices
  • Resource schemas
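Under the hood the model reaches those docs through MCPO, which fronts each MCP tool with a plain REST route. Something like the following - the route shape follows MCPO’s convention, but the tool and parameter names are my recollection of the terraform-mcp-server, so treat them as approximate:

  # Fetching provider docs through the MCPO proxy (service name assumed; tool
  # and parameter names are approximate, not verified against the exact server
  # version I deployed).
  import requests

  MCPO = "http://mcpo.tools.svc:8000"

  # Resolve which doc covers azurerm_storage_account...
  doc_ids = requests.post(f"{MCPO}/terraform/resolveProviderDocID", json={
      "provider_name": "azurerm",
      "provider_namespace": "hashicorp",
      "service_slug": "storage_account",
      "provider_data_type": "resources",
  }, timeout=30).json()

  # ...then pull the full resource docs the model writes HCL from
  # (response shape simplified; plug in an ID from the previous call).
  docs = requests.post(f"{MCPO}/terraform/getProviderDocs",
                       json={"provider_doc_id": "<id from previous call>"},
                       timeout=30).json()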

Challenges Solved

Challenge 1: State Management

  • Problem: Workspaces lost after container restart
  • Solution: Scan the disk on startup and rebuild the in-memory index (sketch below)
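Since every workspace already lives on a persistent volume, recovery is just a directory walk. A minimal sketch, assuming the layout from the executor snippet:

  # Rebuild the in-memory workspace index from disk at startup (sketch; this
  # is also where the HAS STATE? / HAS PLAN? columns in the table come from).
  from pathlib import Path

  WORKSPACES = Path("/data/workspaces")


  def rebuild_index() -> dict:
      index = {}
      for ws in sorted(p for p in WORKSPACES.iterdir() if p.is_dir()):
          index[ws.name] = {
              "has_state": (ws / "terraform.tfstate").exists(),
              "has_plan": (ws / "tfplan").exists(),
              "tf_files": [f.name for f in ws.glob("*.tf")],
          }
      return index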

Challenge 2: Ingress Timeouts

  • Problem: 504 errors during terraform apply
  • Solution: nginx ingress annotations (nginx.ingress.kubernetes.io/proxy-read-timeout and proxy-send-timeout) raised to 600s (10 minutes)

Challenge 3: Model Compatibility

  • Problem: DeepSeek models crash on parameterless functions
  • Solution: Added a dummy parameter for universal model support (example below)
  • Winner: gpt-oss:120b-cloud worked perfectly!
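The workaround is as dumb as it sounds - give the function an argument it never reads:

  # Zero-argument tools made DeepSeek emit malformed calls, so every such tool
  # gets a throwaway parameter (sketch; the executor URL and GET route are
  # assumptions matching the earlier snippets).
  import requests

  EXECUTOR = "http://terraform-executor.tools.svc:5000"


  class Tools:
      def terraform_list_workspaces(self, dummy: str = "list") -> str:
          """List all workspaces with their state/plan status.
          :param dummy: Ignored - exists only so models that can't call
                        parameterless functions have something to fill in.
          """
          return requests.get(f"{EXECUTOR}/workspaces", timeout=30).text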

Challenge 4: MCP Integration

  • Problem: MCPO couldn’t talk to HTTP MCP servers initially
  • Solution: Deployed Terraform MCP as separate K8s service with StreamableHTTP

Challenge 5: Workspace Refresh (In-Place Updates)

  • Problem: Original design would destroy/recreate on changes
  • Solution: Added a terraform_refresh endpoint that keeps state (sketched below)
  • Result: Can update storage tier without destroying the account!
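The trick is to reuse the workspace directory instead of creating a fresh one: the new HCL overwrites main.tf, but terraform.tfstate stays put, so the next plan is a diff rather than a rebuild. Extending the executor sketch from earlier (same caveat: illustrative, not the exact code):

  # terraform_refresh endpoint (reuses app, run_tf and WORKSPACES from the
  # Flask executor sketch above).
  from flask import jsonify, request


  @app.post("/workspaces/<ws_id>/refresh")
  def refresh(ws_id: str):
      ws = WORKSPACES / ws_id                  # existing dir; tfstate stays put
      (ws / "main.tf").write_text(request.json["code"])  # swap in the new HCL
      run_tf(ws, "init", "-input=false")
      result = run_tf(ws, "plan", "-out=tfplan")         # diff against state
      # e.g. access_tier "Hot" -> "Cool" shows as "1 to change", not a rebuild
      return jsonify({"ok": result.returncode == 0, "plan": result.stdout})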

Time Investment

One weekend from idea to working demo - totally worth it!

Model Performance Notes

gpt-oss:120b-cloud (WINNER)

  • Excellent tool calling
  • Understands complex Terraform concepts
  • Fast inference via Ollama
  • Best overall experience

qwen3-coder:480b-cloud (SOLID)

  • Great for code generation
  • Solid tool calling support
  • Massive parameter count pays off

deepseek-v3.1:671b-cloud (QUIRKY)

  • Issues with parameterless functions
  • Works after adding dummy params
  • Still usable but quirky

What’s Next?

Possible improvements:

  • N8N integration for approval workflows
  • Cost estimation before apply
  • Terraform state backend (Azure Storage)
  • Drift detection
  • Auto-documentation of deployed resources

Key Takeaways

  1. Local models are production-ready - 120B models rival GPT-4 for this use case

  2. MCP in-cluster = fast - No external API latency

  3. OpenWebUI is underrated - Self-hosted, flexible, great for custom tools

  4. Kubernetes makes everything easier - Deploy once, scale forever

  5. State management matters - Workspace persistence was crucial

  6. In-place updates save time - Refresh beats destroy/recreate

  7. Weekend projects are the best - Built something that works!

Cost Analysis

Traditional approach (with GPT-4 API)

  • Around $0.01 per request
  • 7 operations in demo = $0.07
  • Monthly usage: Around $20-50

My approach (all local)

  • $0 per request
  • Only cost: electricity for GPU/CPU
  • One-time hardware investment

Tech Stack Summary

Frontend:

  • OpenWebUI (self-hosted)

AI:

  • Ollama with gpt-oss:120b-cloud / qwen3-coder:480b-cloud

MCP:

  • Terraform MCP Server (in-cluster)

Backend:

  • Python Flask API

IaC:

  • Terraform 1.10

Cloud:

  • Azure (trial tenant)

Infrastructure:

  • Talos Kubernetes
  • Docker + private registry
  • Nginx with extended timeouts

Want To Try This?

All components are open source.

The custom Terraform Executor is straightforward:

  • Python Flask API (around 100 lines of Python)
  • Terraform CLI wrapper
  • Basic state management
  • Docker container

Resources

Models Used

  • gpt-oss:120b-cloud (primary)
  • qwen3-coder:480b-cloud (alternative)

Talos Cluster Resource Usage

  • OpenWebUI: Around 2GB RAM
  • MCPO: Around 512MB RAM
  • Terraform MCP: Around 256MB RAM
  • Terraform Executor: Around 512MB RAM
  • Ollama: 24GB VRAM (for 120B model)

Total cluster footprint: Around 3.5GB RAM + 24GB VRAM

Closing Thoughts

This was a fun weekend hack that turned into something genuinely useful. Infrastructure as conversation with 100% local models is wild!

No API keys. No cloud AI services. Just open-source models and Kubernetes.

This is the future of DevOps.

Want to see a demo? Drop a comment!


Built with coffee, lo-fi beats, and local GPU power during a rainy weekend.


Last updated: October 11, 2025