Design Decisions
The short version of why the lab is built the way it is.
Infrastructure
Proxmox instead of bare-metal Kubernetes
I run Kubernetes on VMs rather than bare metal so one box can host both the Talos nodes and the TrueNAS VM, and so I have a layer to experiment on. I can spin nodes up and tear them down without touching the bare-metal install underneath.
Talos Linux for the nodes
Talos has no SSH, no shell, and no package manager. The root filesystem is immutable and everything goes through an API. That removes a lot of attack surface and forces the entire node config to be declarative, which is exactly what I want from a node OS.
TrueNAS for storage
Bulk data and media sit on a dedicated TrueNAS VM. I run it for ZFS, snapshots, and a web UI I actually like, and because it can serve those shares over NFS and SMB to things outside Kubernetes too. The cluster’s own application state lives on replicated Longhorn volumes instead.
Kubernetes platform
Cilium for networking
Cilium is an eBPF CNI with L3/L4/L7 network policy, a kube-proxy replacement, L2 announcements for LoadBalancer IPs, and Hubble for visibility. The rest of the networking design leans on those features, the L2-announced gateway IP especially.
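To make the L2 announcement piece concrete, here is a minimal sketch of the two Cilium resources usually involved: an IP pool to allocate LoadBalancer addresses from, and a policy that announces them on the LAN. The resource names, the CIDR, and the interface pattern are placeholders I made up for illustration, not the lab's real values, and the API version and field names have shifted between Cilium releases, so check them against the version you run.

```yaml
# Illustrative only: names, CIDR, and interface regex are hypothetical.
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: lab-pool
spec:
  blocks:                     # older Cilium releases call this field "cidrs"
    - cidr: 192.168.1.240/28  # placeholder range on the LAN
---
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: lab-l2
spec:
  loadBalancerIPs: true       # announce IPs assigned to LoadBalancer Services
  interfaces:
    - ^eth[0-9]+              # regex of NICs to answer ARP on (assumed naming)
```

With something like this in place, a Service of type LoadBalancer, such as the one in front of the gateway, gets an address from the pool, and one node answers ARP for it.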
Gateway API, and the road to Envoy Gateway
Ingress here has been through three generations. It started on ingress-nginx, then moved to the Gateway API fronted by Istio, which I wrote up on my blog. It now runs Envoy Gateway, which I switched to for its built-in OIDC support, and which is also where the Coraza WAF plugs in. The constant across all three is the Gateway API: it is the Kubernetes standard, more expressive than Ingress, and it cleanly separates cluster-level routing from per-app routing. The old Istio setup is kept under kubernetes/cluster/inactive for reference.
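The split between cluster-level and per-app routing is easiest to see in the resources themselves. The sketch below is a generic Gateway API example, not the lab's actual configuration: the gateway class name, namespaces, hostnames, and the TLS secret name are all hypothetical.

```yaml
# Cluster-level: one shared Gateway, owned by the platform side.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: lab-gateway           # hypothetical
  namespace: gateway          # hypothetical
spec:
  gatewayClassName: envoy     # assumed class name for Envoy Gateway
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      hostname: "*.example.com"
      tls:
        mode: Terminate
        certificateRefs:
          - name: wildcard-tls  # hypothetical TLS Secret
      allowedRoutes:
        namespaces:
          from: All           # let app namespaces attach their own routes
---
# Per-app: each application ships its own HTTPRoute next to its workload.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: demo
  namespace: demo
spec:
  parentRefs:
    - name: lab-gateway
      namespace: gateway
  hostnames:
    - demo.example.com
  rules:
    - backendRefs:
        - name: demo          # Service in the app's namespace
          port: 80
```

Because the HTTPRoute only refers to the Gateway by name, swapping the implementation behind it (Istio then, Envoy Gateway now) mostly means changing the gateway class, not every app's route.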
Argo CD for GitOps
I chose Argo CD over Flux mostly for the UI. Seeing sync status and diffs at a glance makes learning and debugging much easier. App-of-Apps handles bootstrapping and ApplicationSets handle templating.
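For readers new to the App-of-Apps pattern, it boils down to one root Application whose source directory contains more Application (or ApplicationSet) manifests. A minimal sketch follows; the repository URL and path are placeholders, not the lab's real layout.

```yaml
# Root "app of apps": Argo CD syncs this, which in turn creates the child apps.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/homelab.git  # placeholder repository
    targetRevision: HEAD
    path: kubernetes/apps                     # hypothetical directory of child Applications
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true      # remove children that are deleted from Git
      selfHeal: true   # revert manual drift
```

Applying this single manifest by hand is the only imperative step; everything else is pulled from Git.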
External Secrets Operator over Sealed Secrets
External Secrets Operator with a Bitwarden Secrets Manager backend keeps secrets out of Git completely, even in encrypted form, and gives me a UI to manage them. Sealed Secrets keeps encrypted secrets in Git and needs the kubeseal CLI, which I wanted to avoid.
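What ends up in Git with this approach is only a reference to the secret, never its value. A rough sketch of such a reference is below. The store name, namespace, and key names are hypothetical, the API version depends on the operator release (newer ones also serve external-secrets.io/v1), and the ClusterSecretStore that actually talks to Bitwarden is assumed to exist already.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: demo-credentials
  namespace: demo
spec:
  refreshInterval: 1h          # how often to re-read the backend
  secretStoreRef:
    kind: ClusterSecretStore
    name: bitwarden            # hypothetical store backed by Bitwarden Secrets Manager
  target:
    name: demo-credentials     # Kubernetes Secret the operator creates
  data:
    - secretKey: password      # key inside the generated Secret
      remoteRef:
        key: demo-password     # hypothetical secret identifier in the backend
```

The manifest is safe to commit as-is, which is the practical difference from committing a SealedSecret ciphertext.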
Automation
Terragrunt over plain OpenTofu
Terragrunt keeps the OpenTofu configuration DRY across stacks and wires up the remote state backend on its own, so I am not repeating backend and provider config in every stack.
Ansible for Proxmox
Ansible is agentless and idempotent, which suits physical hosts I want to be able to re-run playbooks against safely. The core cluster build uses the lae.proxmox role, wrapped by custom roles for the lab-specific pieces.
Taskfile over Make
Task uses readable YAML, and task --list-all makes every workflow discoverable. For multi-step flows like cluster bootstrap and teardown it reads far better than Makefile syntax. On top of the CLI, task-ui gives a browser view of the same tasks; I ship it as a second runner image variant for a point-and-click way to run them.
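As a small illustration of why multi-step flows read well in Task, here is a hypothetical Taskfile.yml. The task names and the echo commands are stand-ins I invented, not the lab's real workflows.

```yaml
version: '3'

tasks:
  bootstrap:
    desc: Build the cluster from scratch (hypothetical flow)
    cmds:
      - task: provision          # call other tasks in order
      - task: install-platform

  provision:
    desc: Create the VMs
    cmds:
      - echo "create VMs here"   # placeholder command

  install-platform:
    # No desc: hidden from `task --list`, still shown by `task --list-all`.
    cmds:
      - echo "install the platform here"
```

Running task bootstrap executes the two steps in sequence, and task --list-all shows all three entries, including the one without a description.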