
MireCloud Building and Architecture - MireCloud Homelab Part 0

BUILDING MIRECLOUD
A Production-Grade Kubernetes Homelab from Scratch
BARE METAL · GITOPS · ZERO TRUST · OBSERVABILITY · LLM
Kubernetes · Cilium · Keycloak · Vault · ArgoCD · eBPF · OIDC · Prometheus
A journey through bare metal, GitOps, OIDC, and the beautiful chaos of running enterprise-grade infrastructure in your living room.

There's a specific kind of madness that grips platform engineers at some point in their career. It usually starts with an innocent thought: "I should have a homelab."

Then you buy one server. Then two. Then you're configuring etcd, arguing with NFS mount options at 1 AM, and explaining to your partner why the internet is down because you're "testing Cilium network policies."

Welcome to MireCloud — my bare-metal Kubernetes homelab, built to mirror what real production infrastructure looks like in 2026, without the comfort of a managed cloud provider to bail you out. This isn't a "spin up a K3s cluster with Docker Desktop" guide. This is the full thing: bare-metal nodes, zero-trust networking, GitOps, SSO, secrets management, observability, and LLM workloads — all running on hardware I own, in a stack I control completely.


The Philosophy: Treat It Like Production

The easiest trap with homelabs is treating them like sandboxes. That approach teaches you sandbox skills. MireCloud is designed around one rule:

If I can't justify it in production, I don't do it here.
  • No kubectl apply -f random-internet-yaml.yaml without understanding it
  • No hardcoded secrets, ever
  • No bypassing RBAC "just to get it working"
  • GitOps for everything — if it's not in Git, it doesn't exist

It's a higher bar. It's also why I've learned more from MireCloud than from any course I've ever taken.


Layer 1 — The Foundation: Bare-Metal Kubernetes

[Cluster topology diagram: node-01 (control plane, 16 GB); node-02, node-03, node-04 (workers, 16 GB each); LAN switch; 64 GB laptop as an isolated VM host running Kali and lab VMs]
MIRECLOUD · 4 MINI PCS + 1 LAPTOP · BARE METAL

The cluster runs on five physical machines — four mini PCs and one laptop, all sitting on my desk rather than in a data center rack. The four mini PCs (16 GB RAM each) form the Kubernetes cluster itself: one dedicated control plane node and three workers. The laptop (64 GB RAM) sits outside the cluster, running VMs for security experiments and isolated lab work.

Why mini PCs? They're power-efficient, quiet, and surprisingly capable. 16 GB per node is tight — which means resource management becomes a real practice, not an afterthought. You learn to write proper requests and limits when ignoring them actually breaks things.
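Those requests and limits end up looking like this in practice. A minimal sketch; the deployment name, image, and numbers below are illustrative placeholders, not MireCloud's actual values:

```yaml
# Hypothetical deployment fragment: explicit requests/limits so the
# scheduler can bin-pack 16 GB nodes without silently overcommitting.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app          # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: example/app:1.0   # placeholder image
          resources:
            requests:              # what the scheduler reserves
              cpu: 100m
              memory: 128Mi
            limits:                # hard ceiling; OOM-kill above this
              cpu: 500m
              memory: 256Mi
```

On small nodes, the memory request is the number that matters most: the sum of requests per node is what decides whether the next pod schedules at all.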

KEY DECISIONS
kubeadm bootstrap · version-pinned upgrades · control plane tainted · kube-bench CIS hardening · NFS-backed PersistentVolumes
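The control-plane taint mentioned above is the standard one kubeadm applies (`node-role.kubernetes.io/control-plane:NoSchedule`); a workload that genuinely belongs on that node has to opt in with a toleration. A sketch:

```yaml
# Pod-spec fragment: without this toleration, the scheduler will
# never place the pod on the tainted control-plane node.
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
```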

Layer 2 — Network & Security: Cilium and eBPF

[Diagram: Cilium eBPF mesh: kernel-layer hooks (XDP, TC, socket filter, trace probes) feeding Tetragon audit via BPF maps; NetworkPolicy allowing frontend, backend, keycloak, and vault pod traffic while blocking denied flows]
CILIUM CNI · EBPF NETWORK POLICIES · TETRAGON RUNTIME SECURITY

Cilium is the CNI, and it does far more than route pods. At its core, Cilium uses eBPF, a Linux kernel technology that runs sandboxed programs inside the kernel without modifying kernel source or loading kernel modules. Pod networking, kube-proxy replacement, L3/L4 network policies, Gateway API, and Hubble observability all run through it.
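A minimal L3/L4 policy in Cilium's own CRD might look like the sketch below; the namespace and labels are hypothetical stand-ins for the frontend/backend pods in the diagram:

```yaml
# Only pods labeled app=frontend may reach app=backend, and only
# on TCP 8080; everything else is dropped by default-deny.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-backend   # hypothetical name
  namespace: demo                   # hypothetical namespace
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
```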

The jewel on top is Tetragon: runtime security at the kernel level. If a pod tries to write to /etc, spawn unexpected processes, or make suspicious network calls, Tetragon sees it — no sidecar required.
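As a sketch of what such a policy looks like, here is a TracingPolicy modeled on Tetragon's documented fd_install file-monitoring example (the policy name is made up), which flags any file opened under /etc:

```yaml
# Kprobe on fd_install: fires whenever a process receives a file
# descriptor whose path starts with /etc — no sidecar, no agent in-pod.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: detect-etc-access           # hypothetical name
spec:
  kprobes:
    - call: "fd_install"
      syscall: false
      args:
        - index: 0
          type: int
        - index: 1
          type: "file"
      selectors:
        - matchArgs:
            - index: 1
              operator: "Prefix"
              values:
                - "/etc"
```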

This is the kind of defense-in-depth that large organizations pay significant money to implement. Here, it runs on my hardware, configured in YAML.

Layer 3 — GitOps: ArgoCD App-of-Apps

[Diagram: GitOps App-of-Apps flow: Git repo (source of truth) watched by ArgoCD, which continuously reconciles synced applications in the cluster so that desired state equals actual state]
ARGOCD · APP-OF-APPS · GIT AS SOURCE OF TRUTH

Cluster state is managed entirely through Git — specifically, ArgoCD watching a repository using the App-of-Apps pattern. The repository is the source of truth: every application, namespace, Helm release, and ConfigMap is declared in Git. ArgoCD continuously reconciles the cluster's actual state against that desired state; drift is detected and self-healed automatically.
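The App-of-Apps pattern boils down to one parent Application whose source path contains the child Application manifests. A sketch; the repo URL, path, and name are placeholders:

```yaml
# Parent "root" app: ArgoCD syncs the apps/ directory, and every
# Application manifest found there becomes a managed child app.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root                        # hypothetical parent app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/mirecloud-gitops   # placeholder repo
    targetRevision: main
    path: apps                      # directory of child Application manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true                   # delete what Git no longer declares
      selfHeal: true                # revert manual drift automatically
```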

Helm is the templating engine of choice for applications with multiple configuration surfaces. For simpler manifests, raw Kubernetes YAML committed directly is perfectly fine and often clearer.


Layer 4 — Secrets & Config: Vault + External Secrets Operator

[Diagram: secrets pipeline: HashiCorp Vault (db-password, oidc-secret, tls-cert) fetched by the External Secrets Operator via ServiceAccount token auth and synced into native, auto-rotated Kubernetes Secret objects]
SECRETS NEVER IN GIT · AUTO-ROTATION · AUDIT TRAIL

MireCloud uses HashiCorp Vault as the secrets backend, with the External Secrets Operator (ESO) bridging Vault to Kubernetes. Secrets live in Vault by path, ESO fetches and materializes them as native Kubernetes Secret objects. Applications never know Vault exists.
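An ExternalSecret tying a Vault KV path to a native Secret might look like this sketch; the store name, namespace, and Vault path are assumptions, not MireCloud's actual layout:

```yaml
# ESO reads the Vault path via the referenced store and materializes
# a plain Kubernetes Secret the application consumes like any other.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials              # hypothetical name
  namespace: demo                   # hypothetical namespace
spec:
  refreshInterval: 1h               # re-fetch cadence; picks up rotation
  secretStoreRef:
    name: vault-backend             # assumed ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: db-credentials            # resulting native Secret object
  data:
    - secretKey: password
      remoteRef:
        key: secret/data/db         # assumed Vault KV v2 path
        property: password
```

Rotation in Vault then propagates on the next refresh without any change to Git or to the application.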

The result: secrets are never hardcoded, never in Git, and centrally managed. Rotation happens in Vault; ESO propagates the change automatically.


Layer 5 — Identity & SSO: Keycloak, OIDC, Zero Trust

[Diagram: OIDC auth flow: developer runs oidc-login, the browser performs the authorization-code exchange with Keycloak, the RS256-signed JWT ID token is presented as a bearer token to the kube-apiserver, verified via JWKS, and authorized through RBAC]
KUBELOGIN · PKCE · RS256 · JWKS · RBAC ENFORCEMENT

Keycloak is the Identity Provider for the entire homelab. Every service — Grafana, ArgoCD, Open WebUI, and the Kubernetes API itself — authenticates through Keycloak via OIDC. When I run kubectl, kubelogin opens a browser, performs a full PKCE auth flow, and gets back a signed JWT. That JWT is verified by the kube-apiserver via JWKS, and the groups claim is enforced through RBAC.
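The kubectl side of that flow lives in the kubeconfig as an exec-plugin entry for kubelogin. A sketch, with a placeholder issuer URL and client ID:

```yaml
# kubeconfig user entry: kubectl shells out to the oidc-login plugin,
# which opens the browser, runs the PKCE flow, and caches the ID token.
users:
  - name: oidc-user
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1beta1
        command: kubectl
        args:
          - oidc-login
          - get-token
          - --oidc-issuer-url=https://keycloak.example.lab/realms/mirecloud  # placeholder issuer
          - --oidc-client-id=kubernetes                                      # placeholder client
          - --oidc-use-pkce
```

The kube-apiserver is configured with the matching issuer, client ID, and claim mappings, so the token's groups claim flows straight into RBAC bindings.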

No static credentials. No shared service account tokens. No long-lived secrets for human access. This is what Zero Trust authentication actually looks like at the Kubernetes API level.

Layer 6 — Observability: Prometheus, Grafana, Loki, Promtail

[Diagram: observability stack: Prometheus scrapes metrics; Grafana (OIDC) renders dashboards; Loki aggregates logs shipped by Promtail; host audit events flow auditd → Promtail → Loki → Grafana]
PROMETHEUS · GRAFANA · LOKI · PROMTAIL · AUDITD

You can't operate what you can't see. Prometheus scrapes metrics from every component. Grafana visualizes cluster health, Hubble network flows, and OIDC events — and itself authenticates via Keycloak. Loki aggregates logs; Promtail ships them from nodes and pods.

MireCloud also captures auditd logs from the nodes — system-level audit events forwarded through Promtail into Loki. Combined with Tetragon's kernel-level telemetry, the security observability picture is genuinely comprehensive.
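Shipping those auditd logs amounts to one extra Promtail scrape job tailing the audit log file. A sketch, with an assumed host label:

```yaml
# Promtail config fragment: tail the node's auditd log and label it
# so Grafana/Loki queries can filter on job="auditd" per host.
scrape_configs:
  - job_name: auditd
    static_configs:
      - targets: [localhost]
        labels:
          job: auditd
          host: node-01                      # assumed node label
          __path__: /var/log/audit/audit.log # default auditd log location
```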

Alerting is not optional, even in a homelab. Prometheus rules for disk pressure, pod crash loops, and Vault seal status have saved me from discovering problems hours too late.
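Such rules are plain Prometheus alerting YAML. This sketch assumes kube-state-metrics for the node condition and Vault's Prometheus telemetry for the seal gauge; the group name and thresholds are made up:

```yaml
# Two of the alerts mentioned above, expressed as a rule group.
groups:
  - name: homelab-alerts                     # hypothetical group name
    rules:
      - alert: NodeDiskPressure
        expr: kube_node_status_condition{condition="DiskPressure",status="true"} == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.node }} is under disk pressure"
      - alert: VaultSealed
        # vault_core_unsealed is 1 when unsealed (assumed metric name
        # from Vault's Prometheus telemetry).
        expr: vault_core_unsealed == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Vault is sealed — ESO secret syncs will fail"
```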

Layer 7 — Applications: Where It All Comes Together

After six layers of infrastructure, the application layer is almost anticlimactic — in the best way. Applications just work, because everything beneath them is solid.

Currently running: Ollama + Open WebUI — a self-hosted LLM stack. Authentication via Keycloak OIDC. Deployment via ArgoCD. Secrets via ESO. Traffic via Cilium. Metrics via Prometheus. This is the compounding return on investing in solid infrastructure.

FULL STACK FLOW PER WORKLOAD
ESO injects secrets → Cilium enforces network policy → Keycloak gates access → ArgoCD manages deployment → Prometheus scrapes metrics → Loki aggregates logs

Honest Reflections: What I'd Change

A single control-plane node is a real single point of failure. A three-node control plane with etcd quorum is the right architecture; it's on the roadmap.

NFS storage has performance ceilings and consistency edge cases under concurrent writes. Longhorn or Rook-Ceph would be an upgrade.

Tetragon and Cilium are underutilized relative to their capabilities. Layer 7 network policies, Hubble flow exports, and advanced enforcement policies are areas of active learning.


Why Build This?

Because cloud providers abstract away the hard parts, and those abstractions have a cost: you stop understanding what's underneath.

MireCloud forces me to understand every layer. When something breaks — and things break — I can't open a support ticket. I have to read logs, read documentation, read source code if I have to, and figure it out. That's uncomfortable. It's also irreplaceable as a learning environment.

If you're a platform engineer or SRE who wants to develop genuine depth — not just familiarity with managed services — building a homelab like this is one of the highest-leverage investments you can make in your own skills.

The infrastructure is never "done." That's the point.
