BLOG POST

I Let AI Build My Homelab So I Wouldn't Have To (sorta)

An anecdote of how AI agents helped set up a homelab, with some observations and learnings on the state of AI in DevOps

The Internet says AI can do DevOps

Programmers probably yearn to touch more grass than they currently do (while non-programmers seem to want to touch less grass and write more code).

Many homelabbers on the web put out videos claiming that they’ve managed to get AI to handle the DevOps for their home servers. They speak of a world where these agents will finally free you from doing oncall for your own homelab, and give you more time to go out.

It’s no longer just the simple and replicable tasks like “Use RenovateBot to raise MRs to patch new CVEs”. Take a look at the AWS DevOps agent: it is capable enough to respond to and mitigate certain kinds of incidents, perform root cause analysis (RCA), and help with infrastructure.

The Appeal of an AI DevOps Agent

Honestly, I can see the appeal of delegating oncall work to an AI Agent.

They work nearly 24/7 (subject to your infra SLAs), and can respond more fluidly during an incident than traditional rule engines. Many AI-friendly companies have also started using AI agents to manage infra. For them, the value proposition they want to communicate is very clear:

Humans can train autonomous agents to follow runbooks and methodically recover systems under high stress.

There is potential for agents to manage outages, execute rollbacks, and push configuration fixes more reliably than humans.

But an LLM is still a probabilistic engine that dreams of the most likely next state, grounded on all the tokens currently held in its context window. LLM hallucinations are still a thing. And when they happen, the outcomes are spectacularly bad (think Meta Chief Safety Officer …).

Instead of just hypothesizing about what AI can or cannot do, I decided to get a front-row seat to AI in DevOps.


An experiment with homelab setup

By sheer coincidence (really), I recently bought a GMKtec mini PC for my homelab.

The initial plan was to set up a home DNS for ad-blocking, and to tinker with Home Assistant in Docker to replace an Aqara hub that I had bought overseas. However, I thought this was the perfect setup to watch AI do its thing (and for me to dissect and learn).

Scope of the initial homelab setup:

  1. Technitium - internal DNS
  2. Gitea - self-hosted git
  3. Grafana - dashboards
  4. Alloy - OpenTelemetry collector
  5. Loki - log storage
  6. Tempo - tracing backend
  7. Prometheus - metrics backend

(Btw, I didn’t add Mimir as I am still thinking about what object store I should use)


Deploying a DNS container with AI

As a complete infrastructure newbie, my learning plan was really simple:

  1. Research a DNS server to deploy
  2. Prime context with DevOps knowledge through conversation
  3. Get Claude to write a functioning docker compose with comments
  4. Execute the deployment myself to familiarise myself with Docker

As expected, Claude was able to one-shot a minimal, functioning docker-compose.yaml that supports many features that Claude surfaced and discussed with me in step 2.

This included:

  1. Proper port bindings to support various flavours of DNS (UDP, TCP, TLS, HTTPS)
  2. Correctly setting up the web admin dashboard
  3. Functioning tests for me to verify DNS resolution from other PCs on my network, and via my router’s wireguard tunnel
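The tests Claude wrote are not reproduced here, but a resolution check of this kind can be sketched with nothing more than the Python standard library. This is an illustration, not the script from my setup: the function names are made up for the example, and it only covers plain DNS over UDP on port 53 (not the TCP, TLS or HTTPS flavours).

```python
import secrets
import socket
import struct


def build_query(hostname: str, query_id: int) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed with its length, ending with a zero byte.
    labels = hostname.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    # QTYPE 1 = A record, QCLASS 1 = IN.
    return header + qname + b"\x00" + struct.pack("!HH", 1, 1)


def parse_header(response: bytes) -> tuple[int, int, int]:
    """Return (id, rcode, answer_count) from the 12-byte DNS response header."""
    resp_id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return resp_id, flags & 0x000F, ancount


def resolves(server_ip: str, hostname: str, timeout: float = 2.0) -> bool:
    """Send one UDP query to server_ip:53 and report whether it returned answers."""
    query_id = secrets.randbelow(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(hostname, query_id), (server_ip, 53))
            response, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as unreachable hosts.
            return False
    if len(response) < 12:
        return False
    resp_id, rcode, ancount = parse_header(response)
    return resp_id == query_id and rcode == 0 and ancount > 0
```

Calling resolves("192.168.1.10", "example.com") from another PC on the network, with the placeholder address swapped for the DNS container's real LAN IP, gives a quick yes/no on whether the server answers. Running the same call while connected through the WireGuard tunnel covers the remote case.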

So far so good.

Hiding my secrets

You really can’t keep secrets from Claude without serious setup.

Docker compose was easy to set up for Technitium. There were no secrets required, and configurations could be found on GitHub for easy reference.

For Gitea, I needed to inject secrets for my admin user. Claude told me to mount secrets from local directories and to add read-deny rules for these secrets in .claudeignore and in Claude’s local settings, but I have seen it circumvent these rules many times.

People are conflicted on whether this is a feature or a bug. But the fact is that the user of the tool needs to design the dev environment to prevent secrets from leaking.

To achieve this, I only had a few options in mind from my work experience:

  1. Running Docker Swarm to provision secrets - which was a terrible dev experience
  2. Deploying a vault like HashiCorp Vault - which was such overkill that I didn’t even start
  3. Somehow using the 1Password setup that I already had

If only I had an “expert” whom I could consult with.

Simulating expert discussions with Gemini

The idea is to let Gemini talk to itself so that it spills all its training data on existing documents, articles, and opinions written on security and DevOps.

It was surprisingly easy to prime Gemini with the context tokens needed for it to sound like a hardcore security engineer.

Simulate a multiturn conversation between Sterling (the Hardcore Security Engineer), and Milo (the practical Dev Ops).

Milo and Sterling should talk back and forth for at least 3 turns discussing how to setup a secure homelab. Sterling always proposes the strictest security standards, and Milo should negotiate for a practical, maintainable compromise. 

There are a few things Milo already wants to setup: a DNS, Gitea, and Grafana for observability. He already uses 1password and has easy access to wireguard from his router. 

He also wants something that is not docker compose, but not as complex as k8s (but still useful to learn transferable skills on kubernetes).

Milo should explicitly articulate out the things he will incorporate in his homelab, and Sterling should voice out which compromise makes sense and challenge other proposals.

Start the conversation as follows:

Milo's Persona: ...
Sterling Persona: ...

Thanks for reaching out. I read a lot about your work in secure homelab setups.
<Continue>

The generated conversation was absolutely fascinating.

Milo started the conversation talking about leaning towards K3s, and Sterling proposed Talos for its security benefits. The conversation spanned so many new concepts like Network Policies, VLAN separation, ACLs, secret encryption at rest, Bitnami charts, mTLS, service mesh, and so on. You can even inject “opinions” into either persona mid-conversation, and then continue the discussion in that direction.

The negotiation between Milo and Sterling was itself so rich in new concepts that I was able to further my research into the specific things that made sense to me.

My Homelab Design Goals (feat. Gemini)

From that conversation, I distilled the following as my practical homelab guidelines:

  1. K3s foundation. Lightweight Kubernetes for the homelab.
  2. Helmfile from day one. Great foundation for managing infra through code.
  3. Rootless most of the time. No containers running as root, unless absolutely necessary.
  4. Secrets never on the disk. Full Secret Ops via 1Password Connect and Kubernetes operators.

With these design goals in mind, I set out to use Claude to generate my Helmfile releases and custom charts.


Where LLMs helped

Domain knowledge, applied to my setup. The LLM had a surprisingly broad knowledge of DevOps and security (I was on Gemini fast mode). More importantly, it could apply this knowledge to my specific constraints.

The persona debate technique. Asking an LLM to embody two opposing expert perspectives seems to be a great way to extract structured knowledge on a topic. I gained a far better understanding of what a secure homelab setup needs than I did from any YouTube tutorial on the subject.

Reasonable default setups. The docker compose setup was smooth. Initial Helm chart scaffolding was also smooth. The problems came later, when I tried to customize Helm chart values precisely for my needs.

Where LLMs struggled

Bitnami’s deprecation blind spot. My rootless design goal nudged the LLM toward Bitnami charts, which were rootless by default. Claude happily produced Helm values for a whole set of services using Bitnami charts, and everything looked coherent to me as a newbie.

This was early in the setup, and I was still unfamiliar with debugging pods directly with kubectl, so I had to rely on Claude.

This was probably a terrible decision in hindsight, because the real issue was that Bitnami had recently stopped open sourcing their containers. So much time was wasted because Claude zoomed in on what could have been wrong with the values.yaml.

Hallucinated Helm values. Another common problem was Claude producing values.yaml configurations that looked plausible enough to a newbie, but used keys that didn’t actually exist in the real charts. Some concrete examples include using podSecurityContext when it’s not in the chart, and incorrect Traefik ingress setups.

I did try to ask Claude and Gemini to explain what they thought had happened. Both responses often spoke about not being able to produce specifics, but said they could comfortably provide the high-level configuration groups commonly used by vendors like Bitnami and Chainguard.

This is solvable though. Vendoring the official example values from the chart made Claude’s config generation significantly better.
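The vendored example values can also double as a cheap lint. The sketch below is not part of my repo; it is a hypothetical helper that walks a generated values mapping and reports every key path that the chart's example values do not document. It assumes both files have already been parsed into plain dictionaries (for instance with PyYAML's yaml.safe_load, a third-party library), and it treats an empty mapping in the reference as free-form.

```python
from typing import Any


def unknown_keys(
    generated: dict[str, Any], reference: dict[str, Any], prefix: str = ""
) -> list[str]:
    """Return dotted key paths present in `generated` but absent from `reference`."""
    missing: list[str] = []
    for key, value in generated.items():
        path = f"{prefix}.{key}" if prefix else key
        if key not in reference:
            # The chart's example values never mention this key: likely hallucinated.
            missing.append(path)
        elif isinstance(value, dict) and isinstance(reference[key], dict):
            # Only descend when the reference documents nested keys; an empty
            # reference mapping (e.g. `resources: {}`) accepts anything.
            if reference[key]:
                missing.extend(unknown_keys(value, reference[key], path))
    return missing


# Hypothetical example data, standing in for values.example.yaml and a generated file.
reference_values = {
    "image": {"repository": "example/app", "tag": "latest"},
    "securityContext": {"runAsNonRoot": True},
    "resources": {},
}
generated_values = {
    "image": {"tag": "1.0", "pullPolicy": "Always"},
    "podSecurityContext": {"runAsUser": 1000},
    "resources": {"limits": {"cpu": "1"}},
}
print(unknown_keys(generated_values, reference_values))
# prints ['image.pullPolicy', 'podSecurityContext']
```

A check like this only catches keys that are missing from the example file. Options that a chart documents only in comments, or accepts without listing, would show up as false positives, so the output is a list of things to double-check rather than a verdict.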

Lack of deployment feedback loop. Claude without any DevOps tools felt like an eager consultant that produced no

There were two big issues created by the way Claude approached deployment in my homelab.

  1. The proposed initial setup of the Helmfile did not use atomic releases. There had been so many rounds of changes from fixing the hallucinations mentioned above. Some resources were deleted entirely while others were added. All of a sudden, my cluster was full of silently crashing pods and many orphaned resources.

  2. Claude wasn’t wired into the deployment process. Hence, there was no proactive use of kubectl to investigate pod logs, check pod status, etc. Charts generated by Claude also did not contain validation pods so there were occasionally pods that were running but not functional.
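To make the second point concrete, this is roughly the kind of check I would have wanted an agent to run after every deploy. It is a minimal sketch, not something from my repo: the function names are hypothetical, and it assumes kubectl is installed and already pointed at the cluster. It reads the standard pod status fields from the JSON output of kubectl get pods and lists anything that is waiting, not ready, or restarting a lot.

```python
import json
import subprocess
from typing import Any


def unhealthy_pods(pod_list: dict[str, Any], max_restarts: int = 3) -> list[str]:
    """Summarise pods that are stuck waiting, not ready, or restarting often."""
    problems: list[str] = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        name = f'{meta.get("namespace", "?")}/{meta.get("name", "?")}'
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase == "Succeeded":
            continue  # completed Jobs are not a problem
        statuses = status.get("containerStatuses", [])
        if not statuses:
            # Typically a pod that was never scheduled or never started.
            problems.append(f"{name}: phase={phase}, no container statuses")
            continue
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting")
            restarts = cs.get("restartCount", 0)
            if waiting:
                reason = waiting.get("reason", "unknown")
                problems.append(f"{name}/{cs['name']}: waiting ({reason})")
            elif not cs.get("ready", False):
                problems.append(f"{name}/{cs['name']}: not ready")
            elif restarts > max_restarts:
                problems.append(f"{name}/{cs['name']}: {restarts} restarts")
    return problems


def fetch_pods() -> dict[str, Any]:
    """Read pod state from the cluster; needs kubectl and a working kubeconfig."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Running unhealthy_pods(fetch_pods()) after a deploy would have surfaced the silently crashing pods much earlier. It would not catch pods that are running and ready but still not functional; that needs readiness probes or chart-level tests that exercise the service.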


I finally have my Homelab!

Through this experience, I landed on a few key context documents to anchor Claude’s help in homelab setup:

  1. secrets.md: an exhaustive list of secrets expected to be available in Kubernetes Secrets. This defines which secrets Claude can expect to exist without it having to actually read the values.
  2. environments/...: standardized environment values that declare shared data like domain, user ids, etc.
  3. values.yaml.gotmpl: Helmfile Go templating as the main customization engine, using the standardized environment values
  4. values.example.yaml: Helm values that are vendored into the repo to ground Claude’s config generation.
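The idea behind secrets.md, knowing which secrets exist without ever reading them, can also be checked mechanically. The following is a hypothetical sketch rather than something from my repo. It assumes the expected secrets have been written down as a mapping of secret name to key names, and that the cluster side comes from the JSON output of kubectl get secrets for a single namespace. Only key names are compared; the values are never decoded or printed.

```python
from typing import Any


def missing_secret_keys(
    expected: dict[str, list[str]], secret_list: dict[str, Any]
) -> list[str]:
    """Compare expected secret keys with cluster state, looking at key names only."""
    present: dict[str, set[str]] = {}
    for item in secret_list.get("items", []):
        name = item.get("metadata", {}).get("name", "")
        # `data` maps key -> base64 value; only the key names are kept.
        present[name] = set((item.get("data") or {}).keys())
    missing: list[str] = []
    for name, keys in expected.items():
        if name not in present:
            missing.append(f"{name}: secret not found")
            continue
        for key in keys:
            if key not in present[name]:
                missing.append(f"{name}: missing key {key}")
    return missing
```

The raw kubectl output still contains the base64-encoded values, so a script like this should be run by the human (or a wrapper that only prints the result), not pasted into the agent's context.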

Everything I set out to deploy was running approximately 2 weeks after I settled on my design goals (3 weeks in total from project start). I even managed to set up Homepage for a cool app dashboard.

There are still a few TODOs, including decluttering my Grafana dashboards, a Home Assistant container, and an LLM agent layer I’ll write about separately.

The homelab repo is on GitHub if you want to see my setup.

So, can AI set up a K3s homelab?

Perhaps, if there were more consumer-friendly ways for an agent to understand the state of my K3s homelab.

These LLMs can talk the language of DevOps, but applying them to your average consumer homelab takes too much setup. The context files can help reduce hallucinations, but they don’t eliminate them.

I wouldn’t be surprised if better tools appear in about 6 months. AWS has done it with their out-of-the-box DevOps agents, so open source tooling would hopefully have a north star to chase. By then, the harder problem would be deciding who is accountable if an agent fails catastrophically.

For now, I find pairing with AI an effective way to set up my homelab infrastructure. And that’s already more than I expected.