There's a specific kind of madness that grips platform engineers at some point in their career. It usually starts with an innocent thought: "I should have a homelab."
Then you buy one server. Then two. Then you're configuring etcd, arguing with NFS mount options at 1 AM, and explaining to your partner why the internet is down because you're "testing Cilium network policies."
Welcome to MireCloud — my bare-metal Kubernetes homelab, built to mirror what real production infrastructure looks like in 2026, without the comfort of a managed cloud provider to bail you out. This isn't a "spin up a K3s cluster with Docker Desktop" guide. This is the full thing: bare-metal nodes, zero-trust networking, GitOps, SSO, secrets management, observability, and LLM workloads — all running on hardware I own, in a stack I control completely.
The Philosophy: Treat It Like Production
The easiest trap with homelabs is treating them like sandboxes. That approach teaches you sandbox skills. MireCloud is designed around a small set of non-negotiable rules:

- No kubectl apply -f random-internet-yaml.yaml without understanding it
- No hardcoded secrets, ever
- No bypassing RBAC "just to get it working"
- GitOps for everything — if it's not in Git, it doesn't exist
It's a higher bar. It's also why I've learned more from MireCloud than from any course I've ever taken.
Layer 1 — The Foundation: Bare-Metal Kubernetes
The cluster runs on five physical machines — four mini PCs and one laptop, all sitting on my desk rather than in a data center rack. The four mini PCs (16 GB RAM each) form the Kubernetes cluster itself: one dedicated control plane node and three workers. The laptop (64 GB RAM) sits outside the cluster, running VMs for security experiments and isolated lab work.
Why mini PCs? They're power-efficient, quiet, and surprisingly capable. 16 GB per node is tight — which means resource management becomes a real practice, not an afterthought. You learn to write proper requests and limits when ignoring them actually breaks things.
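With only 16 GB per node, every workload has to declare what it needs. A minimal sketch of what "proper requests and limits" looks like in practice (the app name and image here are illustrative, not from MireCloud's actual repo):

```yaml
# Hypothetical Deployment fragment showing explicit resource accounting.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: example/app:1.0
          resources:
            requests:          # what the scheduler reserves on a node
              cpu: 100m
              memory: 128Mi
            limits:            # hard ceiling; exceeding memory triggers an OOM kill
              cpu: 500m
              memory: 256Mi
```

On small nodes, requests that are honest about real usage are what keep the scheduler from overcommitting a 16 GB machine into swap death.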
Layer 2 — Network & Security: Cilium and eBPF
Cilium is the CNI, doing far more than routing pods. At its core, Cilium uses eBPF — a Linux kernel technology that runs sandboxed programs in the kernel without modifying source or loading modules. Pod networking, kube-proxy replacement, L3/L4 Network Policies, Gateway API, and Hubble observability all run through it.
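To make the L3/L4 policy piece concrete, here is a sketch of a CiliumNetworkPolicy in the shape Cilium's CRD uses. The app labels are hypothetical; the point is that policy is expressed over pod identity, not IP addresses:

```yaml
# Illustrative policy: only pods labeled app=frontend may reach
# app=api pods, and only on TCP 8080. Everything else is dropped.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-api
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
```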
The jewel on top is Tetragon: runtime security at the kernel level. If a pod tries to write to /etc, spawn unexpected processes, or make suspicious network calls, Tetragon sees it — no sidecar required.
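Tetragon policies are expressed as TracingPolicy resources that attach eBPF probes to kernel functions. A rough sketch, modeled on the file-monitoring examples in Tetragon's documentation (the exact kprobe and argument layout here are an assumption, not MireCloud's actual policy):

```yaml
# Illustrative TracingPolicy: surface any file access under /etc
# by hooking a kernel-level permission check. Verify hook names
# against the Tetragon docs for your kernel version.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: detect-etc-access
spec:
  kprobes:
    - call: "security_file_permission"
      syscall: false
      args:
        - index: 0
          type: "file"
        - index: 1
          type: "int"
      selectors:
        - matchArgs:
            - index: 0
              operator: "Prefix"
              values:
                - "/etc"
```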
Layer 3 — GitOps: ArgoCD App-of-Apps
So how does anything actually get deployed? The answer is Git — specifically, ArgoCD watching a repository using the App-of-Apps pattern. The repository is the source of truth. Every application, namespace, Helm release, and ConfigMap is declared in Git. ArgoCD continuously reconciles the cluster's actual state against that desired state. Drift is detected and self-healed automatically.
Helm is the templating engine of choice for applications with multiple configuration surfaces. For simpler manifests, raw Kubernetes YAML committed directly is perfectly fine and often clearer.
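The App-of-Apps pattern boils down to one root Application whose source directory contains more Application manifests. A minimal sketch (the repo URL and path are placeholders, not MireCloud's real repository):

```yaml
# Hypothetical root Application: ArgoCD syncs the apps/ directory,
# which itself contains one Application manifest per workload.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/homelab/gitops.git  # placeholder URL
    targetRevision: main
    path: apps                # directory of child Application manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true             # delete resources removed from Git
      selfHeal: true          # revert manual drift automatically
```

Adding a new app then means committing one more manifest under apps/ — ArgoCD picks it up on the next sync.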
Layer 4 — Secrets & Config: Vault + External Secrets Operator
MireCloud uses HashiCorp Vault as the secrets backend, with the External Secrets Operator (ESO) bridging Vault to Kubernetes. Secrets live in Vault by path, ESO fetches and materializes them as native Kubernetes Secret objects. Applications never know Vault exists.
The result: secrets are never hardcoded, never in Git, and centrally managed. Rotation happens in Vault; ESO propagates the change automatically.
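The Vault-to-Kubernetes bridge is declared per secret. A sketch of the ExternalSecret shape ESO uses (the store name and Vault path here are illustrative assumptions):

```yaml
# Illustrative ExternalSecret: ESO reads a Vault KV path and
# materializes it as a native Kubernetes Secret the app consumes.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-db-credentials
spec:
  refreshInterval: 1h               # ESO re-reads Vault on this cadence
  secretStoreRef:
    name: vault-backend             # hypothetical store pointing at Vault
    kind: ClusterSecretStore
  target:
    name: app-db-credentials        # name of the resulting Secret object
  data:
    - secretKey: password
      remoteRef:
        key: secret/data/app/db     # Vault KV path (illustrative)
        property: password
```

Rotate the value in Vault, and on the next refresh interval the Kubernetes Secret follows — no redeploy, no Git commit.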
Layer 5 — Identity & SSO: Keycloak, OIDC, Zero Trust
Keycloak is the Identity Provider for the entire homelab. Every service — Grafana, ArgoCD, Open WebUI, and the Kubernetes API itself — authenticates through Keycloak via OIDC. When I run kubectl, kubelogin opens a browser, performs a full PKCE auth flow, and gets back a signed JWT. That JWT is verified by the kube-apiserver via JWKS, and the groups claim is enforced through RBAC.
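The kubectl side of that flow is just a kubeconfig user backed by the kubelogin exec plugin. A sketch, with a placeholder Keycloak URL and client ID:

```yaml
# Illustrative kubeconfig fragment: kubectl invokes kubelogin,
# which runs the browser-based PKCE flow against Keycloak and
# returns an OIDC token to the API server.
users:
  - name: oidc
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1beta1
        command: kubectl
        args:
          - oidc-login
          - get-token
          - --oidc-issuer-url=https://keycloak.example.com/realms/homelab  # placeholder
          - --oidc-client-id=kubernetes
          - --oidc-use-pkce
```

On the cluster side, a ClusterRoleBinding whose subject is a Group matching the token's groups claim is what turns Keycloak group membership into Kubernetes permissions.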
Layer 6 — Observability: Prometheus, Grafana, Loki, Promtail
You can't operate what you can't see. Prometheus scrapes metrics from every component. Grafana visualizes cluster health, Hubble network flows, and OIDC events — and itself authenticates via Keycloak. Loki aggregates logs; Promtail ships them from nodes and pods.
MireCloud also captures auditd logs from the nodes — system-level audit events forwarded through Promtail into Loki. Combined with Tetragon's kernel-level telemetry, the security observability picture is genuinely comprehensive.
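Shipping auditd logs this way needs nothing exotic: Promtail just tails the audit log file on each node. A sketch of the relevant scrape config (the label names are illustrative; the log path is the auditd default):

```yaml
# Illustrative Promtail scrape_config: tail the node's auditd log
# and label the stream so it is queryable in Loki as job="auditd".
scrape_configs:
  - job_name: auditd
    static_configs:
      - targets: [localhost]
        labels:
          job: auditd
          __path__: /var/log/audit/audit.log
```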
Layer 7 — Applications: Where It All Comes Together
After six layers of infrastructure, the application layer is almost anticlimactic — in the best way. Applications just work, because everything beneath them is solid.
Currently running: Ollama + Open WebUI — a self-hosted LLM stack. Authentication via Keycloak OIDC. Deployment via ArgoCD. Secrets via ESO. Traffic via Cilium. Metrics via Prometheus. This is the compounding return on investing in solid infrastructure.
Honest Reflections: What I'd Change
Single control plane node is a real SPOF. A three-node control plane with etcd quorum is the right architecture. It's on the roadmap.
NFS storage has performance ceilings and consistency edge cases under concurrent writes. Longhorn or Rook-Ceph would be an upgrade.
Tetragon and Cilium are underutilized relative to their capabilities. Layer 7 network policies, Hubble flow exports, and advanced enforcement policies are areas of active learning.
Why Build This?
Because cloud providers abstract away the hard parts, and those abstractions have a cost: you stop understanding what's underneath.
MireCloud forces me to understand every layer. When something breaks — and things break — I can't open a support ticket. I have to read logs, read documentation, read source code if I have to, and figure it out. That's uncomfortable. It's also irreplaceable as a learning environment.
The infrastructure is never "done." That's the point.