
MireCloud Building and Architecture - MireCloud Homelab Part 0

BUILDING MIRECLOUD
A Production-Grade Kubernetes Homelab from Scratch
BARE METAL · GITOPS · ZERO TRUST · OBSERVABILITY · LLM
Kubernetes · Cilium · Keycloak · Vault · ArgoCD · eBPF · OIDC · Prometheus
A journey through bare metal, GitOps, OIDC, and the beautiful chaos of running enterprise-grade infrastructure in your living room.

There's a specific kind of madness that grips platform engineers at some point in their career. It usually starts with an innocent thought: "I should have a homelab."

Then you buy one server. Then two. Then you're configuring etcd, arguing with NFS mount options at 1 AM, and explaining to your partner why the internet is down because you're "testing Cilium network policies."

Welcome to MireCloud — my bare-metal Kubernetes homelab, built to mirror what real production infrastructure looks like in 2026, without the comfort of a managed cloud provider to bail you out. This isn't a "spin up a K3s cluster with Docker Desktop" guide. This is the full thing: bare-metal nodes, zero-trust networking, GitOps, SSO, secrets management, observability, and LLM workloads — all running on hardware I own, in a stack I control completely.


The Philosophy: Treat It Like Production

The easiest trap with homelabs is treating them like sandboxes. That approach teaches you sandbox skills. MireCloud is designed around one rule:

If I can't justify it in production, I don't do it here.
  • No kubectl apply -f random-internet-yaml.yaml without understanding it
  • No hardcoded secrets, ever
  • No bypassing RBAC "just to get it working"
  • GitOps for everything — if it's not in Git, it doesn't exist

It's a higher bar. It's also why I've learned more from MireCloud than from any course I've ever taken.


Layer 1 — The Foundation: Bare-Metal Kubernetes

[Cluster topology diagram: node-01 (control plane, 16 GB); node-02, node-03, node-04 (workers, 16 GB each); LAN switch; 64 GB laptop as an isolated VM host running Kali and lab VMs]
MIRECLOUD · 4 MINI PCS + 1 LAPTOP · BARE METAL

The cluster runs on five physical machines — four mini PCs and one laptop, all sitting on my desk rather than in a data center rack. The four mini PCs (16 GB RAM each) form the Kubernetes cluster itself: one dedicated control plane node and three workers. The laptop (64 GB RAM) sits outside the cluster, running VMs for security experiments and isolated lab work.

Why mini PCs? They're power-efficient, quiet, and surprisingly capable. 16 GB per node is tight — which means resource management becomes a real practice, not an afterthought. You learn to write proper requests and limits when ignoring them actually breaks things.
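Those requests and limits end up looking like this in practice. A minimal sketch; the deployment name, image, and numbers below are illustrative placeholders, not MireCloud's actual values:

```yaml
# Hypothetical deployment fragment: explicit requests/limits so the
# scheduler can bin-pack 16 GB nodes without silently overcommitting.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app          # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: example/app:1.0   # placeholder image
          resources:
            requests:              # what the scheduler reserves
              cpu: 100m
              memory: 128Mi
            limits:                # hard ceiling; OOM-kill above this
              cpu: 500m
              memory: 256Mi
```

On small nodes, the memory request is the number that matters most: the sum of requests per node is what decides whether the next pod schedules at all.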

KEY DECISIONS
kubeadm bootstrap · version-pinned upgrades · control plane tainted · kube-bench CIS hardening · NFS-backed PersistentVolumes
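The control-plane taint mentioned above is the standard one kubeadm applies (`node-role.kubernetes.io/control-plane:NoSchedule`); a workload that genuinely belongs on that node has to opt in with a toleration. A sketch:

```yaml
# Pod-spec fragment: without this toleration, the scheduler will
# never place the pod on the tainted control-plane node.
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
```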

Layer 2 — Network & Security: Cilium and eBPF

[Diagram: Cilium eBPF mesh: kernel-layer hooks (XDP, TC, socket filter, trace probes) feeding Tetragon audit via BPF maps; NetworkPolicy allowing frontend, backend, keycloak, and vault pod traffic while blocking denied flows]
CILIUM CNI · EBPF NETWORK POLICIES · TETRAGON RUNTIME SECURITY

Cilium is the CNI, and it does far more than route pods. At its core, Cilium uses eBPF, a Linux kernel technology that runs sandboxed programs inside the kernel without modifying kernel source or loading kernel modules. Pod networking, kube-proxy replacement, L3/L4 network policies, Gateway API, and Hubble observability all run through it.
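A minimal L3/L4 policy in Cilium's own CRD might look like the sketch below; the namespace and labels are hypothetical stand-ins for the frontend/backend pods in the diagram:

```yaml
# Only pods labeled app=frontend may reach app=backend, and only
# on TCP 8080; everything else is dropped by default-deny.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-backend   # hypothetical name
  namespace: demo                   # hypothetical namespace
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
```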

The jewel on top is Tetragon: runtime security at the kernel level. If a pod tries to write to /etc, spawn unexpected processes, or make suspicious network calls, Tetragon sees it — no sidecar required.
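As a sketch of what such a policy looks like, here is a TracingPolicy modeled on Tetragon's documented fd_install file-monitoring example (the policy name is made up), which flags any file opened under /etc:

```yaml
# Kprobe on fd_install: fires whenever a process receives a file
# descriptor whose path starts with /etc — no sidecar, no agent in-pod.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: detect-etc-access           # hypothetical name
spec:
  kprobes:
    - call: "fd_install"
      syscall: false
      args:
        - index: 0
          type: int
        - index: 1
          type: "file"
      selectors:
        - matchArgs:
            - index: 1
              operator: "Prefix"
              values:
                - "/etc"
```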

This is the kind of defense-in-depth that large organizations pay significant money to implement. Here, it runs on my hardware, configured in YAML.

Layer 3 — GitOps: ArgoCD App-of-Apps

[Diagram: GitOps App-of-Apps flow: Git repo (source of truth) watched by ArgoCD, which continuously reconciles synced applications in the cluster so that desired state equals actual state]
ARGOCD · APP-OF-APPS · GIT AS SOURCE OF TRUTH

Cluster state is managed entirely through Git — specifically, ArgoCD watching a repository using the App-of-Apps pattern. The repository is the source of truth: every application, namespace, Helm release, and ConfigMap is declared in Git. ArgoCD continuously reconciles the cluster's actual state against that desired state; drift is detected and self-healed automatically.
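The App-of-Apps pattern boils down to one parent Application whose source path contains the child Application manifests. A sketch; the repo URL, path, and name are placeholders:

```yaml
# Parent "root" app: ArgoCD syncs the apps/ directory, and every
# Application manifest found there becomes a managed child app.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root                        # hypothetical parent app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/mirecloud-gitops   # placeholder repo
    targetRevision: main
    path: apps                      # directory of child Application manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true                   # delete what Git no longer declares
      selfHeal: true                # revert manual drift automatically
```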

Helm is the templating engine of choice for applications with multiple configuration surfaces. For simpler manifests, raw Kubernetes YAML committed directly is perfectly fine and often clearer.


Layer 4 — Secrets & Config: Vault + External Secrets Operator

[Diagram: secrets pipeline: HashiCorp Vault (db-password, oidc-secret, tls-cert) fetched by the External Secrets Operator via ServiceAccount token auth and synced into native, auto-rotated Kubernetes Secret objects]
SECRETS NEVER IN GIT · AUTO-ROTATION · AUDIT TRAIL

MireCloud uses HashiCorp Vault as the secrets backend, with the External Secrets Operator (ESO) bridging Vault to Kubernetes. Secrets live in Vault by path, ESO fetches and materializes them as native Kubernetes Secret objects. Applications never know Vault exists.
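An ExternalSecret tying a Vault KV path to a native Secret might look like this sketch; the store name, namespace, and Vault path are assumptions, not MireCloud's actual layout:

```yaml
# ESO reads the Vault path via the referenced store and materializes
# a plain Kubernetes Secret the application consumes like any other.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials              # hypothetical name
  namespace: demo                   # hypothetical namespace
spec:
  refreshInterval: 1h               # re-fetch cadence; picks up rotation
  secretStoreRef:
    name: vault-backend             # assumed ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: db-credentials            # resulting native Secret object
  data:
    - secretKey: password
      remoteRef:
        key: secret/data/db         # assumed Vault KV v2 path
        property: password
```

Rotation in Vault then propagates on the next refresh without any change to Git or to the application.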

The result: secrets are never hardcoded, never in Git, and centrally managed. Rotation happens in Vault; ESO propagates the change automatically.


Layer 5 — Identity & SSO: Keycloak, OIDC, Zero Trust

[Diagram: OIDC auth flow: developer runs oidc-login, the browser performs the authorization-code exchange with Keycloak, the RS256-signed JWT ID token is presented as a bearer token to the kube-apiserver, verified via JWKS, and authorized through RBAC]
KUBELOGIN · PKCE · RS256 · JWKS · RBAC ENFORCEMENT

Keycloak is the Identity Provider for the entire homelab. Every service — Grafana, ArgoCD, Open WebUI, and the Kubernetes API itself — authenticates through Keycloak via OIDC. When I run kubectl, kubelogin opens a browser, performs a full PKCE auth flow, and gets back a signed JWT. That JWT is verified by the kube-apiserver via JWKS, and the groups claim is enforced through RBAC.
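The kubectl side of that flow lives in the kubeconfig as an exec-plugin entry for kubelogin. A sketch, with a placeholder issuer URL and client ID:

```yaml
# kubeconfig user entry: kubectl shells out to the oidc-login plugin,
# which opens the browser, runs the PKCE flow, and caches the ID token.
users:
  - name: oidc-user
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1beta1
        command: kubectl
        args:
          - oidc-login
          - get-token
          - --oidc-issuer-url=https://keycloak.example.lab/realms/mirecloud  # placeholder issuer
          - --oidc-client-id=kubernetes                                      # placeholder client
          - --oidc-use-pkce
```

The kube-apiserver is configured with the matching issuer, client ID, and claim mappings, so the token's groups claim flows straight into RBAC bindings.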

No static credentials. No shared service account tokens. No long-lived secrets for human access. This is what Zero Trust authentication actually looks like at the Kubernetes API level.

Layer 6 — Observability: Prometheus, Grafana, Loki, Promtail

[Diagram: observability stack: Prometheus scrapes metrics; Grafana (OIDC) renders dashboards; Loki aggregates logs shipped by Promtail; host audit events flow auditd → Promtail → Loki → Grafana]
PROMETHEUS · GRAFANA · LOKI · PROMTAIL · AUDITD

You can't operate what you can't see. Prometheus scrapes metrics from every component. Grafana visualizes cluster health, Hubble network flows, and OIDC events — and itself authenticates via Keycloak. Loki aggregates logs; Promtail ships them from nodes and pods.

MireCloud also captures auditd logs from the nodes — system-level audit events forwarded through Promtail into Loki. Combined with Tetragon's kernel-level telemetry, the security observability picture is genuinely comprehensive.
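Shipping those auditd logs amounts to one extra Promtail scrape job tailing the audit log file. A sketch, with an assumed host label:

```yaml
# Promtail config fragment: tail the node's auditd log and label it
# so Grafana/Loki queries can filter on job="auditd" per host.
scrape_configs:
  - job_name: auditd
    static_configs:
      - targets: [localhost]
        labels:
          job: auditd
          host: node-01                      # assumed node label
          __path__: /var/log/audit/audit.log # default auditd log location
```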

Alerting is not optional, even in a homelab. Prometheus rules for disk pressure, pod crash loops, and Vault seal status have saved me from discovering problems hours too late.
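Such rules are plain Prometheus alerting YAML. This sketch assumes kube-state-metrics for the node condition and Vault's Prometheus telemetry for the seal gauge; the group name and thresholds are made up:

```yaml
# Two of the alerts mentioned above, expressed as a rule group.
groups:
  - name: homelab-alerts                     # hypothetical group name
    rules:
      - alert: NodeDiskPressure
        expr: kube_node_status_condition{condition="DiskPressure",status="true"} == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.node }} is under disk pressure"
      - alert: VaultSealed
        # vault_core_unsealed is 1 when unsealed (assumed metric name
        # from Vault's Prometheus telemetry).
        expr: vault_core_unsealed == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Vault is sealed — ESO secret syncs will fail"
```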

Layer 7 — Applications: Where It All Comes Together

After six layers of infrastructure, the application layer is almost anticlimactic — in the best way. Applications just work, because everything beneath them is solid.

Currently running: Ollama + Open WebUI — a self-hosted LLM stack. Authentication via Keycloak OIDC. Deployment via ArgoCD. Secrets via ESO. Traffic via Cilium. Metrics via Prometheus. This is the compounding return on investing in solid infrastructure.

FULL STACK FLOW PER WORKLOAD
ESO injects secrets → Cilium enforces network policy → Keycloak gates access → ArgoCD manages deployment → Prometheus scrapes metrics → Loki aggregates logs

Honest Reflections: What I'd Change

A single control-plane node is a real single point of failure. A three-node control plane with etcd quorum is the right architecture; it's on the roadmap.

NFS storage has performance ceilings and consistency edge cases under concurrent writes. Longhorn or Rook-Ceph would be an upgrade.

Tetragon and Cilium are underutilized relative to their capabilities. Layer 7 network policies, Hubble flow exports, and advanced enforcement policies are areas of active learning.


Why Build This?

Because cloud providers abstract away the hard parts, and those abstractions have a cost: you stop understanding what's underneath.

MireCloud forces me to understand every layer. When something breaks — and things break — I can't open a support ticket. I have to read logs, read documentation, read source code if I have to, and figure it out. That's uncomfortable. It's also irreplaceable as a learning environment.

If you're a platform engineer or SRE who wants to develop genuine depth — not just familiarity with managed services — building a homelab like this is one of the highest-leverage investments you can make in your own skills.

The infrastructure is never "done." That's the point.
