DEPLOY-EXPLORATORY documents the cluster state that shaped deployment decisions (Keycloak as template, Hetzner LB + Cloudflare pattern, no Postgres operator so sibling-Deployment pattern). FORGEJO-REGISTRY-INVESTIGATION documents that the registry was already operational in Forgejo 9.0.3 (packages enabled by default) and the storage/credential path forward. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Tyler J King <tking@guildhouse.dev>
Guildhall deploy exploratory — Talos/Hetzner cluster state
Date: 2026-04-21
Scope: Read-only audit of the Talos/Hetzner Kubernetes cluster to inform Guildhall's initial deployment.
Method: kubectl against the cluster via ~/projects/substrate-project/guildhouse-talos-bootstrap/kubeconfig. No mutations.
Takeaway (synthesis at end): Guildhall fits cleanly into the existing Keycloak/Forgejo deployment pattern: plain Deployment + Deployment-backed Postgres + Longhorn PVC + Hetzner LoadBalancer + Cloudflare-terminated TLS. No new infrastructure components required. The v1 substrate foundation (bascule / quartermaster / spire / chronicle / substrate-operator) is Flux-manifested but broken and not running; governance integration is explicitly follow-up work, not blocking.
1. Cluster basics
| Property | Value |
|---|---|
| Control plane endpoint | https://178.104.100.159:6443 |
| kubectl client | v1.32.2 |
| kubectl server | v1.32.3 |
| Nodes | 5 (3 control-plane + 2 workers), all Ready |
| OS | Talos v1.9.5 |
| Kernel | 6.12.18-talos |
| Container runtime | containerd 2.0.3 |
| Cluster age | 10 days |
gsh-cp-01 control-plane 10.0.1.10
gsh-cp-02 control-plane 10.0.1.20
gsh-cp-03 control-plane 10.0.1.21
gsh-worker-01 worker 10.0.1.22
gsh-worker-02 worker 10.0.1.30
Matches the memory-carried description (Hetzner Talos cluster 2026-04-11: Talos 1.9.5, 5 nodes, 3 CP + 2 worker). No drift.
2. Namespace inventory
| Namespace | Purpose | Workloads |
|---|---|---|
| cert-manager | cert-manager 3 controllers (cert-manager, cainjector, webhook) | 3 Deployments |
| flux-system | Flux GitOps | 4 Deployments (source / kustomize / helm / notification controllers) |
| forgejo | Forgejo git (self-hosted) | Deployment + Postgres Deployment + Runner Deployment (0/1, stuck) |
| keycloak | Keycloak OIDC IdP | Deployment + Postgres Deployment |
| longhorn-system | Longhorn CSI storage | 5 DaemonSets + 6 Deployments + UI |
| kube-system, kube-public, kube-node-lease | K8s system | - |
| default | Empty | - |
Application workloads: forgejo, keycloak. These are the reference patterns for Guildhall.
3. Ingress / gateway state
No traditional ingress controller, no Gateway API:
- kubectl get ingressclasses → No resources found
- kubectl get gatewayclasses → server doesn't have the resource type (Gateway API CRDs not installed)
- No HAProxy / nginx / traefik / istio / kong pods anywhere
Traffic reaches services via type: LoadBalancer directly, backed by the Hetzner Cloud Controller Manager (hcloud-cloud-controller-manager running in kube-system). Each LoadBalancer Service provisions a real Hetzner Cloud Load Balancer via annotations:
annotations:
  load-balancer.hetzner.cloud/location: nbg1
  load-balancer.hetzner.cloud/name: keycloak-lb-v2
  load-balancer.hetzner.cloud/type: lb11
  load-balancer.hetzner.cloud/use-private-ip: "false"
Cilium Envoy DaemonSet pods exist on every node (cilium-envoy for L7 proxy features — CiliumNetworkPolicy L7 filtering, not Gateway API). enable-l7-proxy: true is set in the Cilium config.
Existing LoadBalancer services and their public addresses:
| Service | Namespace | IPv4 | IPv6 | Ports |
|---|---|---|---|---|
| forgejo-http | forgejo | 46.225.47.75 | 2a01:4f8:1c1f:65bb::1 | 80 → 30000, 22 → 30022 |
| keycloak | keycloak | 162.55.157.168 | 2a01:4f8:1c1d:1109::1 | 80 → 30080 |
Both expose HTTP on port 80 only; forgejo-http additionally forwards port 22, presumably for Git over SSH. No in-cluster TLS termination. TLS is terminated upstream at Cloudflare (git.guildhouse.dev and auth.guildhouse.dev resolve to Cloudflare IPs; Cloudflare proxies to the Hetzner LB IPs).
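Applied to Guildhall, the same LoadBalancer pattern might look like the sketch below. It reuses the annotation keys observed on the keycloak Service; the LB name `guildhall-lb`, the label values, and the selector are assumptions rather than audited values, and the 80 → 4000 mapping follows the deployment shape proposed in the synthesis.

```yaml
# Sketch only: LB name, labels, and selector are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: guildhall
  namespace: guildhall
  labels:
    app.kubernetes.io/name: guildhall
    app.kubernetes.io/part-of: guildhouse
  annotations:
    load-balancer.hetzner.cloud/location: nbg1
    load-balancer.hetzner.cloud/name: guildhall-lb   # hypothetical LB name
    load-balancer.hetzner.cloud/type: lb11
    load-balancer.hetzner.cloud/use-private-ip: "false"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: guildhall
  ports:
    - name: http
      protocol: TCP
      port: 80          # port Cloudflare connects to on the Hetzner LB
      targetPort: 4000  # Phoenix listener (PORT=4000)
```

Once applied, the Hetzner CCM should populate the Service's external IP, which is the address the Cloudflare DNS record would point at.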
4. cert-manager / TLS
cert-manager is installed and healthy but no Certificate resources exist anywhere on the cluster.
| ClusterIssuer | Status |
|---|---|
| letsencrypt-prod | Ready |
| letsencrypt-staging | Ready |
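Both ClusterIssuers are provisioned and ready to issue, but nothing uses them yet: TLS is currently terminated at the Cloudflare edge (Universal SSL), with plain HTTP between Cloudflare and the Hetzner LBs. An HTTP-only origin corresponds to Cloudflare's Flexible SSL mode rather than Full, since Full expects HTTPS at the origin; this audit used kubectl only, so the zone's actual SSL mode setting is unverified.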
Implication for Guildhall: can choose between the existing Cloudflare-termination pattern (simplest, matches forgejo/keycloak) or start using cert-manager (more work, cluster-integrated certs). The former is tonight's path; the latter is a hygiene follow-up.
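For the follow-up path, a cert-manager Certificate would be the starting point. The sketch below is a minimal example against the staging issuer; the resource and Secret names are hypothetical. Whether issuance succeeds depends on the solver configured in the ClusterIssuer, which this audit did not record, and something in-cluster would still have to terminate TLS using the resulting Secret.

```yaml
# Sketch only: resource and Secret names are hypothetical.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: guildhall-tls
  namespace: guildhall
spec:
  secretName: guildhall-tls      # Secret that cert-manager writes the key pair into
  dnsNames:
    - guildhall.guildhouse.dev
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: letsencrypt-staging    # switch to letsencrypt-prod after a successful test issue
```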
5. Database patterns
No Postgres operator installed. CRDs checked (all absent):
- CloudNativePG (clusters.postgresql.cnpg.io)
- Zalando postgres-operator (postgresqls.acid.zalan.do)
- Crunchy PGO (postgresclusters.postgres-operator.crunchydata.com)
Existing pattern is plain Deployment + PVC:
forgejo-postgres Deployment postgres:16 PVC: forgejo-db 10Gi longhorn
keycloak-postgres Deployment postgres:16 PVC: keycloak-db 5Gi longhorn
Storage: Longhorn 1.x, single StorageClass longhorn (default). All PVCs use it. 5 DaemonSet replicas of longhorn-manager confirm storage is healthy across all nodes.
Current PVCs:
| PVC | Namespace | Size | StorageClass |
|---|---|---|---|
| forgejo-data | forgejo | 20Gi | longhorn |
| forgejo-db | forgejo | 10Gi | longhorn |
| runner-cache | forgejo | 5Gi | longhorn |
| keycloak-db | keycloak | 5Gi | longhorn |
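The sibling-Deployment Postgres pattern could be reproduced for Guildhall roughly as follows. This is a sketch modeled on the forgejo-postgres and keycloak-postgres shape described in this report, not a copy of their manifests: the label values, the database user and name, the `ReadWriteOnce` access mode, and the `Recreate` strategy are assumptions.

```yaml
# Sketch only: labels, DB user/name, access mode, and strategy are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: guildhall-db
  namespace: guildhall
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: guildhall-postgres
  namespace: guildhall
spec:
  replicas: 1
  strategy:
    type: Recreate   # avoids two pods contending for one RWO volume during rollout
  selector:
    matchLabels:
      app.kubernetes.io/name: guildhall-postgres
  template:
    metadata:
      labels:
        app.kubernetes.io/name: guildhall-postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_USER
              value: guildhall
            - name: POSTGRES_DB
              value: guildhall
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: guildhall-secrets
                  key: DB_PASSWORD
            - name: PGDATA   # subdirectory keeps initdb clear of lost+found
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: guildhall-db
---
apiVersion: v1
kind: Service
metadata:
  name: guildhall-postgres
  namespace: guildhall
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: guildhall-postgres
  ports:
    - port: 5432
      targetPort: 5432
```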
6. Secrets management
None of the common secret managers are installed:
- External Secrets Operator: absent
- Sealed Secrets: absent
- SPIRE/SPIFFE: absent (Flux has a Kustomization for it but the spire-system namespace doesn't exist; see §10 Flux state)
- Vault: absent
Secrets are plain Opaque Secret resources. Examples:
- forgejo/forgejo-secrets (3 keys)
- keycloak/keycloak-secrets (2 keys)
Managed out-of-band (likely committed to the private Flux source repo or applied via kubectl during bootstrap). No rotation mechanism visible.
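Following the same plain-Opaque-Secret convention, a Guildhall Secret might be shaped like the sketch below. The two keys match the ones proposed in the synthesis; the values are placeholders that must be replaced before applying, and a file like this should not be committed with real values.

```yaml
# Sketch only: placeholder values, to be replaced before applying.
apiVersion: v1
kind: Secret
metadata:
  name: guildhall-secrets
  namespace: guildhall
type: Opaque
stringData:   # stringData accepts plain text; the API server stores it base64-encoded
  SECRET_KEY_BASE: "REPLACE_WITH_OUTPUT_OF_mix_phx.gen.secret"
  DB_PASSWORD: "REPLACE_WITH_GENERATED_PASSWORD"
```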
7. Existing workload patterns
Reference: keycloak Deployment (cleanest example — the only Flux Kustomization that's Ready):
- Image: quay.io/keycloak/keycloak:26.0 (public registry)
- Env composition: mix of literal value: (DB host, DB port, realm name) and valueFrom.secretKeyRef (admin password, DB password)
- Labels: app.kubernetes.io/name=keycloak, app.kubernetes.io/part-of=guildhouse
- Config files: ConfigMap-mounted realm import (keycloak-realm-import)
- Resources: resource requests/limits not aggressively set (defaults mostly)
- Service: type: LoadBalancer with Hetzner annotations, exposes port 80 only
- TLS: none in-cluster; Cloudflare upstream
Reference: forgejo-postgres Deployment:
- Image: postgres:16 (public Docker Hub)
- Env: POSTGRES_USER, POSTGRES_DB literal; POSTGRES_PASSWORD from Secret
- PGDATA: /var/lib/postgresql/data/pgdata (standard subdirectory to avoid lost+found issues)
- Volume: PVC mounted at /var/lib/postgresql/data
No existing Elixir/Phoenix deployment to reference. Guildhall will be the first. The pattern will follow the Keycloak/Forgejo shape applied to Phoenix's runtime requirements.
8. Guildhouse-specific components (v1 foundation)
Currently running: none.
- No pod matching substrate, bascule, chronicle, quartermaster, or spire across all namespaces. The v1 substrate foundation is absent from the cluster's running state.
- Flux has Kustomizations for bascule, quartermaster, spire, automation, governance-talos, gitops-controller, all failing on a dependency chain:
spire → fails: namespace "spire-system" does not exist
quartermaster → fails: dependency flux-system/spire is not ready
bascule → fails: dependency flux-system/quartermaster is not ready
automation → fails: dependency flux-system/quartermaster is not ready
gitops-controller → fails: dependency flux-system/quartermaster is not ready
governance-talos → fails: dependency flux-system/cluster-infra is not ready
cluster-infra → SUSPENDED + YAML decode error on 10-cilium-values.yaml
This chain needs to be unblocked for the v1 substrate foundation to reach the cluster, but this is explicitly NOT Guildhall's blocker. Guildhall is the standalone orchestration/presentation layer; it composes with substrate via CRD watches once substrate is running, but doesn't require substrate present to stand up and serve its web UI.
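The cascade above is what Flux's `dependsOn` field produces: a Kustomization is not reconciled until everything it depends on is Ready. The sketch below shows the general shape using the quartermaster → spire edge. The `path` and `interval` are hypothetical, and the API version assumes a Flux release that serves `kustomize.toolkit.fluxcd.io/v1`; the actual manifests in the source repo were not inspected for this example.

```yaml
# Sketch only: path and interval are hypothetical; API version is an assumption.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: quartermaster
  namespace: flux-system
spec:
  interval: 10m
  path: ./quartermaster        # hypothetical path in the source repo
  prune: true
  sourceRef:
    kind: GitRepository
    name: guildhouse-deploy
  dependsOn:
    - name: spire              # quartermaster waits until spire reports Ready
```

A Guildhall Kustomization added later without a `dependsOn` entry would reconcile independently of this chain.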
9. Networking specifics
Cilium version: v1.16.5 (Cilium 1.16 series, recent but not 1.17-cutting-edge)
Key Cilium config (from kube-system/cilium-config):
| Flag | Value | Notes |
|---|---|---|
| kube-proxy-replacement | true | Cilium replaces kube-proxy (full eBPF mode) |
| enable-ipv4 | true | IPv4 on pod network |
| enable-ipv6 | false | IPv6 NOT enabled at pod network (LBs get Hetzner-assigned v6 externally) |
| enable-l7-proxy | true | Envoy DaemonSet for L7 filtering |
| enable-hubble | true | Hubble observability |
| ipam | kubernetes | Host-IPAM, not cluster-pool |
Not enabled / not present:
- BGP control plane (ciliumbgppeeringpolicies CRD absent)
- L2 announcements (ciliuml2announcementpolicies CRD present but zero resources)
- LoadBalancerIPPool (CRD present but zero resources; Hetzner CCM handles LB IPs instead)
- Gateway API (gatewayclasses CRD absent)
- ClusterMesh (single-cluster)
NetworkPolicies in place (only 3, all in flux-system):
- allow-egress
- allow-scraping
- allow-webhooks (scoped to app=notification-controller)
CiliumNetworkPolicies: none. Workloads rely on default-allow between pods. Guildhall deployment can proceed without adding policies; adding them is hardening follow-up.
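As an illustration of what the hardening follow-up could look like, the sketch below uses a standard Kubernetes NetworkPolicy to restrict Postgres ingress to the Guildhall application pods. The policy name and label values are assumptions that would need to match whatever labels the Guildhall manifests end up using.

```yaml
# Sketch only: policy name and label values are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: guildhall-postgres-ingress
  namespace: guildhall
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: guildhall-postgres   # pods this policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: guildhall    # only the app pods may connect
      ports:
        - protocol: TCP
          port: 5432
```

Because the policy selects only the Postgres pods, all other traffic in the namespace stays default-allow.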
10. Deployment automation
GitOps: Flux is the sole mechanism. Running components:
- source-controller, kustomize-controller, helm-controller, notification-controller: all 1/1 Ready
Sources: one GitRepository registered:
flux-system / guildhouse-deploy
URL: https://github.com/gh-tking/guildhouse-deploy-talos-mirror
STATUS: Ready (artifact stored for main@169e077f)
Kustomizations: 9 total, summary:
| Name | Status |
|---|---|
| keycloak | ✅ Ready (applied revision 169e077f) |
| forgejo | ❌ health check failed (forgejo-runner Deployment stuck InProgress) |
| cluster-infra | ❌ SUSPENDED + YAML decode error |
| spire | ❌ spire-system namespace not found |
| quartermaster | ❌ depends on spire (not ready) |
| bascule | ❌ depends on quartermaster (not ready) |
| automation | ❌ depends on quartermaster (not ready) |
| gitops-controller | ❌ depends on quartermaster (not ready) |
| governance-talos | ❌ depends on cluster-infra (not ready) |
Key observation: only keycloak flows through Flux successfully. Everything else is either suspended, blocked on missing upstream dependencies, or has a YAML error in the source repo.
HelmRepositories and HelmReleases: none.
Changes land on the cluster: currently via Flux against the GitHub-hosted source repo for the one working Kustomization (keycloak), otherwise via direct kubectl apply (given the broken Flux chain).
Synthesis
What Guildhall can leverage
- Longhorn StorageClass: works out of the box for Postgres PVC. 5Gi is ample for initial Guildhall DB (matches keycloak-db sizing).
- Hetzner CCM LoadBalancer: a LoadBalancer Service with load-balancer.hetzner.cloud/* annotations provisions a new Hetzner LB automatically. Cost is ~€5/mo for an lb11 tier. Matches forgejo / keycloak exactly.
- Cloudflare-at-the-edge TLS: DNS at guildhall.guildhouse.dev points at the Hetzner LB IPv4, Cloudflare terminates TLS, origin is plain HTTP on port 80. Zero cert-manager work required for v1.
- Keycloak as OIDC IdP: already running at auth.guildhouse.dev. When Guildhall wires its OIDC config (currently commented out in config/runtime.exs), the endpoint is ready. Not blocking tonight.
- cert-manager ClusterIssuers: letsencrypt-prod and letsencrypt-staging are ready, available as an upgrade path from Cloudflare-edge TLS to cluster-terminated TLS if/when that hygiene pass happens.
- Reference deployment pattern: keycloak's Deployment shape (public image, env-from-secret, ConfigMap for data, Service type=LoadBalancer, Postgres sibling Deployment + PVC) maps directly to Guildhall. Apply the same template.
- Flux GitOps pipeline exists (if desired): a new Kustomization in guildhouse-deploy-talos-mirror for Guildhall would auto-deploy. BUT the Flux state is currently messy (most Kustomizations are broken), so a direct kubectl apply path is cleaner for the v1 Guildhall deploy, with a follow-up Flux migration once the broader chain is healed.
What Guildhall needs that the cluster doesn't have yet
- Guildhall container image. Must be built locally via mix release + Dockerfile and pushed to a registry the cluster can pull from. Registry options:
  - ghcr.io/gh-tking/guildhall:<tag>: public GitHub Container Registry (requires packaging via GitHub Actions or a manual docker push)
  - Docker Hub under a personal account
  - Forgejo container registry at git.guildhouse.dev/tking/guildhall:<tag>: Forgejo 1.19+ supports OCI registry; this is the most consistent choice with the rest of the Guildhouse tooling
  - A private Hetzner-region ghcr mirror
- Secrets: guildhall-secrets Opaque Secret with at minimum SECRET_KEY_BASE (64-character Phoenix secret, generated with mix phx.gen.secret) and DATABASE_URL (or a discrete DB_PASSWORD, with the URL constructed at runtime).
- Namespace: guildhall (new).
- DNS record: guildhall.guildhouse.dev → Hetzner LB IPv4 (via Cloudflare). Can be created after the LB is provisioned, once the LB IP is known.
Likely shape of the deployment
Based on the keycloak/forgejo pattern:
Namespace: guildhall
├── Deployment: guildhall-postgres (postgres:16, env POSTGRES_* from guildhall-secrets)
├── PVC: guildhall-db (longhorn, 5-10Gi)
├── Service: guildhall-postgres (ClusterIP, 5432)
├── Secret: guildhall-secrets (SECRET_KEY_BASE, DB_PASSWORD)
├── Deployment: guildhall (image from ghcr / forgejo registry / etc, envs DATABASE_URL + SECRET_KEY_BASE + PHX_HOST=guildhall.guildhouse.dev + PHX_SERVER=true + PORT=4000)
└── Service: guildhall (type=LoadBalancer, Hetzner annotations, port 80 → 4000)
Release build discipline:
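- mix release in Docker multi-stage build (Elixir 1.17.3 / OTP 27 builder stage, debian-slim runtime stage)
- mix ecto.migrate on container start (or a Job, or mix release custom step). Mix itself is not shipped inside a release, so in practice this means a release eval command such as the migrate helper that mix phx.gen.release generates.
- PHX_SERVER=true to start the HTTP server (per config/runtime.exs)
- Health check endpoint (Phoenix default or custom /health)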
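Putting the shape and the runtime settings above together, the application Deployment might look like the following sketch. The image tag is the one recommended later in this report; the label values, the database user and name, the `/health` probe path, and the use of Kubernetes `$(VAR)` expansion to build `DATABASE_URL` are assumptions. A private Forgejo registry would also need an `imagePullSecrets` entry, which is not shown.

```yaml
# Sketch only: labels, DB user/name, and the probe path are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: guildhall
  namespace: guildhall
  labels:
    app.kubernetes.io/name: guildhall
    app.kubernetes.io/part-of: guildhouse
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: guildhall
  template:
    metadata:
      labels:
        app.kubernetes.io/name: guildhall
        app.kubernetes.io/part-of: guildhouse
    spec:
      containers:
        - name: guildhall
          image: git.guildhouse.dev/tking/guildhall:v0.1.0
          ports:
            - name: http
              containerPort: 4000
          env:
            - name: PHX_SERVER
              value: "true"
            - name: PHX_HOST
              value: guildhall.guildhouse.dev
            - name: PORT
              value: "4000"
            - name: SECRET_KEY_BASE
              valueFrom:
                secretKeyRef:
                  name: guildhall-secrets
                  key: SECRET_KEY_BASE
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: guildhall-secrets
                  key: DB_PASSWORD
            # $(DB_PASSWORD) is expanded by Kubernetes because the variable is
            # defined earlier in this list; the password must be URL-safe.
            - name: DATABASE_URL
              value: ecto://guildhall:$(DB_PASSWORD)@guildhall-postgres:5432/guildhall
          readinessProbe:
            httpGet:
              path: /health   # hypothetical endpoint; adjust to what the app serves
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
```

Building the URL from a discrete `DB_PASSWORD` keeps a single password key shared between the Postgres and application Deployments, at the cost of requiring a password without URL-reserved characters.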
Surprises
What's present that wasn't expected:
- Keycloak is already serving at auth.guildhouse.dev. The OIDC substrate Guildhall will eventually integrate with is live. Zero setup needed for that dependency when the time comes.
- cert-manager is installed but unused. Suggests a deliberate deferral in favor of Cloudflare-edge TLS; the ClusterIssuers are staged and ready for when in-cluster TLS is adopted.
- Cilium Envoy DaemonSet is running on every node but with no Gateway API / CiliumEnvoyConfig / L7 policies currently in play. Present for future L7 use, not actively load-bearing yet.
What's expected but absent:
- No HAProxy. Previous K3s-era cluster used HAProxy as ingress; this cluster doesn't. Hetzner LBs took its role.
- v1 substrate foundation is entirely absent from the running cluster. bascule, substrate-operator, chronicle, quartermaster, SPIRE: none running. Flux manifests exist (in the guildhouse-deploy-talos-mirror repo) but are blocked on a dependency chain rooted at the missing spire-system namespace and a YAML decode error in cluster-infra/10-cilium-values.yaml. Unblocking this is real work that is NOT on the Red Hat path; governance integration is follow-up.
- No existing Elixir/Phoenix deployment to copy. Guildhall will be the first Phoenix app on this cluster.
- Flux source is on GitHub (guildhouse-deploy-talos-mirror), not Forgejo. Follows the same pattern as the substrate-project umbrella migration just completed; another GitHub→Forgejo item on the cleanup list, not blocking.
Minimum path to Guildhall running at guildhall.guildhouse.dev
- Dockerfile in ~/projects/substrate-project/guildhall/: multi-stage with OTP 27, mix release
- Build and push image to a registry (Forgejo container registry at git.guildhouse.dev/tking/guildhall:v0.1.0 recommended for consistency)
- Generate SECRET_KEY_BASE via mix phx.gen.secret
- Create guildhall namespace; create guildhall-secrets Secret
- Apply Deployment + Service + Postgres + PVC manifest (template from keycloak)
- Wait for Hetzner LB to provision; note IPv4
- Create Cloudflare DNS record guildhall.guildhouse.dev → LB IPv4 (proxied, so Cloudflare handles TLS)
- Verify; run any first-time ecto migration
No cluster infrastructure changes. No cert-manager Certificates. No Flux reconfiguration. No governance-stack dependency. Just the same Deployment-shaped pattern that Keycloak and Forgejo already use, applied to Guildhall.
Governance integration (CRD watchers, SPIFFE identity, Chronicle wiring, Accord enforcement) is explicitly follow-up work for after Guildhall is reachable and the Red Hat submission is in.