Design Decisions

The short version of why the lab is built the way it is.

I run Kubernetes on VMs rather than bare metal so one box can host both the Talos nodes and the TrueNAS VM, and so I have a layer to experiment on. I can spin nodes up and tear them down without touching the bare-metal install underneath.

Talos has no SSH, no shell, and no package manager. The root filesystem is immutable and everything goes through an API. That removes a lot of attack surface and forces the entire node config to be declarative, which is exactly what I want from a node OS.

Bulk data and media sit on a dedicated TrueNAS VM. I run it for ZFS, snapshots, and a web UI I actually like, and because it can share that data over NFS and SMB with things outside Kubernetes too. The cluster’s own application state lives on replicated Longhorn volumes instead.

Cilium is an eBPF CNI with L3/L4/L7 network policy, a kube-proxy replacement, L2 announcements for LoadBalancer IPs, and Hubble for visibility. The rest of the networking design leans on those features, the L2-announced gateway IP especially.
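As a rough illustration of the L2 announcement piece, the pair of resources below hands out LoadBalancer IPs from a pool and answers ARP for them on the node interfaces. This is a minimal sketch, not the lab's actual config: the resource names, the address range, and the interface pattern are all placeholders, and the apiVersion and the `blocks` field depend on the Cilium release in use (older releases used `cidrs`). It also assumes L2 announcements are switched on in the Cilium install.

```yaml
# Hypothetical pool of addresses that LoadBalancer Services may receive.
# The range is a placeholder, not the lab's real subnet.
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: lab-pool
spec:
  blocks:
    - start: 192.168.1.240
      stop: 192.168.1.250
---
# Announce LoadBalancer IPs on the local L2 segment.
# The interface regex is an assumption about how the node NICs are named.
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: lab-l2
spec:
  loadBalancerIPs: true
  interfaces:
    - ^eth[0-9]+
```

With something like this in place, the gateway's LoadBalancer Service gets an address from the pool and one node answers for it on the LAN, so no external load balancer or BGP peer is needed.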

Gateway API, and the road to Envoy Gateway

Ingress here has been through three generations. It started on ingress-nginx, then moved to the Gateway API fronted by Istio, which I wrote up on my blog. It now runs Envoy Gateway, which I switched to for its built-in OIDC support, and which is also where the Coraza WAF plugs in. The constant across all three is the Gateway API: it is the Kubernetes standard, more expressive than Ingress, and it cleanly separates cluster-level routing from per-app routing. The old Istio setup is kept under kubernetes/cluster/inactive for reference.
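The split between cluster-level and per-app routing can be sketched with two resources: a shared Gateway owned by the cluster, and an HTTPRoute that lives next to the app and attaches to it. Every name, namespace, hostname, and port here is hypothetical, and the `gatewayClassName` is an assumption about what the Envoy Gateway class is called.

```yaml
# Cluster-level: one shared entry point, managed alongside the platform.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: lab-gateway        # hypothetical name
  namespace: gateway       # hypothetical namespace
spec:
  gatewayClassName: envoy  # assumed class name
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All        # let routes in any namespace attach
---
# Per-app: the route ships with the application and points at the shared Gateway.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: demo
  namespace: demo
spec:
  parentRefs:
    - name: lab-gateway
      namespace: gateway
  hostnames:
    - demo.example.com     # placeholder hostname
  rules:
    - backendRefs:
        - name: demo       # hypothetical Service
          port: 8080
```

Because both resources are plain Gateway API objects, the same pair of manifests would in principle survive a change of implementation; only the gateway class and any implementation-specific policies would differ.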

I chose Argo CD over Flux mostly for the UI. Seeing sync status and diffs at a glance makes learning and debugging much easier. App-of-Apps handles bootstrapping and ApplicationSets handle templating.
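For readers unfamiliar with App-of-Apps, the pattern boils down to one root Application whose source directory contains more Application (or ApplicationSet) manifests. The sketch below is illustrative only: the repository URL and path are placeholders, and the sync policy is an assumption rather than the lab's actual setting.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root               # hypothetical name for the parent app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/homelab.git  # placeholder repository
    targetRevision: HEAD
    path: kubernetes/apps  # hypothetical directory of child Application manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true          # assumed: remove children deleted from Git
      selfHeal: true       # assumed: revert manual drift
```

Applying this single manifest is enough for Argo CD to discover and sync everything under the path, which is what makes it useful for bootstrapping.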

External Secrets Operator over Sealed Secrets

External Secrets Operator with a Bitwarden Secrets Manager backend keeps secrets out of Git completely, even in encrypted form, and gives me a UI to manage them. Sealed Secrets keeps encrypted secrets in Git and needs the kubeseal CLI, which I wanted to avoid.
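What ends up in Git under this approach is only a pointer to the secret, never its value. A minimal sketch of such a pointer follows; the store name, secret names, and remote key are hypothetical, the apiVersion depends on the External Secrets Operator release, and the exact form of `remoteRef.key` depends on how the Bitwarden Secrets Manager store is configured.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: demo-credentials
  namespace: demo
spec:
  refreshInterval: 1h                  # how often to re-read the backend
  secretStoreRef:
    kind: ClusterSecretStore
    name: bitwarden-secrets-manager    # hypothetical store name
  target:
    name: demo-credentials             # Kubernetes Secret the operator creates
  data:
    - secretKey: password              # key inside the resulting Secret
      remoteRef:
        key: demo-password             # hypothetical identifier in the backend
```

The operator resolves the reference at runtime and writes an ordinary Secret into the namespace, so rotating a value is a change in the backend UI rather than a commit.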

Terragrunt keeps the OpenTofu configuration DRY across stacks and wires up the remote state backend on its own, so I am not repeating backend and provider config in every stack.

Ansible is agentless and idempotent, which suits physical nodes: I can re-run a playbook against them safely. The core cluster build uses the lae.proxmox role, wrapped by custom roles for the lab-specific pieces.

Task uses readable YAML, and task --list-all makes every workflow discoverable. For multi-step flows like cluster bootstrap and teardown it reads far better than Makefile syntax. On top of the CLI, task-ui gives a browser view of the same tasks; I ship it as a second runner image variant for a point-and-click way to run them.
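To give a feel for why multi-step flows read well in Task, here is a small hypothetical Taskfile. The task names and the echo commands are stand-ins, not the lab's real workflows; the point is only the shape: a described top-level task that calls other tasks in order.

```yaml
version: '3'

tasks:
  bootstrap:
    desc: Build the lab from scratch (hypothetical flow)
    cmds:
      - task: infra      # run the steps in order
      - task: cluster

  infra:
    desc: Provision the VMs
    cmds:
      - echo "provision VMs here"     # placeholder command

  cluster:
    desc: Bring up Kubernetes on the VMs
    cmds:
      - echo "bootstrap cluster here" # placeholder command
```

Running `task --list-all` against a file like this prints each task alongside its `desc`, which is the discoverability the text refers to.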