# Guildhall deploy runbook

**Target:** `guildhall.guildhouse.dev` on the Hetzner Talos cluster, via Forgejo container registry at `git.guildhouse.dev/tking/guildhall`.

**Pattern:** direct `kubectl apply` against the cluster; Flux integration deferred. TLS terminates at Cloudflare (orange cloud); origin is plain HTTP on the Hetzner LB.

**Required reference docs:** `DEPLOY-EXPLORATORY-2026-04-21.md` (cluster state), `FORGEJO-REGISTRY-INVESTIGATION-2026-04-21.md` (registry state).

Tag referenced throughout this runbook: **`v0.1.0`**. When deploying a subsequent tag, substitute throughout OR use the sed helper under "Iterating on subsequent tags".

---

## Prerequisites

- `kubectl` configured against the Talos cluster (`KUBECONFIG=~/projects/substrate-project/guildhouse-talos-bootstrap/kubeconfig`)
- `docker` available on the build host with enough disk for an Elixir build image (~2 GB)
- Cloudflare account access for `guildhouse.dev` DNS
- Forgejo account `tking` at `git.guildhouse.dev`

---

## Phase 1 — Build and push the image

### 1.1 Create a Forgejo Personal Access Token

Navigate to `https://git.guildhouse.dev/-/user/settings/applications`. Generate a new token:

- **Token name:** `guildhall-registry-push` (or similar)
- **Scopes:** `package:write` (this token will both push and pull; scope down to `package:read` for a separate in-cluster-pull token if splitting)
- **Expiry:** operator's choice; 30-90 days is reasonable for the push token

Copy the token value immediately (Forgejo won't show it again). Save it in your password manager.

### 1.2 Docker login

```bash
docker login git.guildhouse.dev -u tking
# paste PAT when prompted
```

Verify with `cat ~/.docker/config.json | jq '.auths | keys'` — `git.guildhouse.dev` should appear.

### 1.3 Build the image

```bash
cd /home/tking/projects/substrate-project/guildhall
docker build -t git.guildhouse.dev/tking/guildhall:v0.1.0 .
```

Cold build takes ~5-10 minutes (mix deps + erlang compile + tailwind + esbuild + phx.digest + mix release).
Subsequent builds hit Docker layer cache and are much faster.

Verify the image runs before pushing:

```bash
docker run --rm -it --entrypoint /bin/sh \
  git.guildhouse.dev/tking/guildhall:v0.1.0 \
  -c 'ls -la /app/bin && /app/bin/guildhall version'
```

Expected: the `guildhall` release binary is present and `version` returns the release version without error.

### 1.4 Push to Forgejo registry

```bash
docker push git.guildhouse.dev/tking/guildhall:v0.1.0
```

### 1.5 Verify image is in the registry

Via Forgejo UI: `https://git.guildhouse.dev/tking/-/packages` → should list `guildhall` with a `v0.1.0` tag.

Via registry API (authenticated):

```bash
# The password after "tking:" is blank here; supply the PAT, or drop the colon to be prompted.
curl -sS -u tking: https://git.guildhouse.dev/v2/tking/guildhall/tags/list
# → {"name":"tking/guildhall","tags":["v0.1.0"]}
```

### 1.6 Decide package visibility

In the Forgejo UI, for the new `guildhall` container package:

- **Private** (default, recommended for tonight): cluster needs `guildhall-registry` pull secret (Phase 2.2 below creates it)
- **Public:** anonymous pulls work; skip Phase 2.2 and remove `imagePullSecrets` from `k8s/60-migration-job.yaml` and `k8s/70-guildhall-deployment.yaml` before applying

---

## Phase 2 — Cluster-side preparation

### 2.1 Create the namespace

```bash
kubectl apply -f k8s/00-namespace.yaml
```

Verify: `kubectl get ns guildhall` → `Active`.

### 2.2 Create the registry pull secret (if package is private)

```bash
# The password value is blank here; put the PAT between the quotes before running.
kubectl create secret docker-registry guildhall-registry \
  --docker-server=git.guildhouse.dev \
  --docker-username=tking \
  --docker-password='' \
  --namespace=guildhall
```

Optionally use a read-only PAT here instead of the push PAT from Phase 1.1. Skip this step entirely if the package is public.
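If a private pull later fails with `ImagePullBackOff`, it helps to know what the pull secret is expected to hold. The sketch below builds the JSON payload that a `kubernetes.io/dockerconfigjson` secret stores under its `.dockerconfigjson` key, so it can be compared with the decoded secret. The function name `dockerconfig_json` is hypothetical, and the sketch assumes the password contains no characters that need JSON escaping.

```bash
# Hypothetical helper: print the payload a docker-registry pull secret
# holds under .dockerconfigjson for a single registry server.
# Assumes none of the arguments need JSON escaping.
dockerconfig_json() {
  local server="$1" user="$2" password="$3" auth
  # "auth" is base64("user:password"); strip any newline base64 emits.
  auth="$(printf '%s:%s' "$user" "$password" | base64 | tr -d '\n')"
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
    "$server" "$user" "$password" "$auth"
}
```

To compare against the cluster, decode the live secret with `kubectl get secret guildhall-registry -n guildhall -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d`. Field order in the live secret may differ; the server key and the decoded `auth` value are what matter.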
### 2.3 Create the database credentials secret

Generate a strong password and save it to your password manager before running:

```bash
DB_PASSWORD="$(openssl rand -base64 32 | tr -d '/+=' | head -c 32)"
echo "Save this: $DB_PASSWORD"
kubectl create secret generic guildhall-db-credentials \
  --from-literal=POSTGRES_DB=guildhall \
  --from-literal=POSTGRES_USER=guildhall \
  --from-literal=POSTGRES_PASSWORD="$DB_PASSWORD" \
  --namespace=guildhall
```

### 2.4 Create the application secrets

```bash
# Run in the same shell session as 2.3 so that $DB_PASSWORD is still set.
SECRET_KEY_BASE="$(cd /home/tking/projects/substrate-project/guildhall && mix phx.gen.secret)"
kubectl create secret generic guildhall-app-secrets \
  --from-literal=SECRET_KEY_BASE="$SECRET_KEY_BASE" \
  --from-literal=DATABASE_URL="ecto://guildhall:$DB_PASSWORD@guildhall-postgres:5432/guildhall" \
  --namespace=guildhall
```

Verify secrets exist:

```bash
kubectl get secrets -n guildhall
# expect: guildhall-registry, guildhall-db-credentials, guildhall-app-secrets
```

---

## Phase 3 — Database provisioning

### 3.1 Apply Postgres PVC, Deployment, Service

```bash
kubectl apply -f k8s/20-postgres-pvc.yaml
kubectl apply -f k8s/30-postgres-deployment.yaml
kubectl apply -f k8s/40-postgres-service.yaml
```

### 3.2 Wait for Postgres Ready

```bash
kubectl rollout status deployment/guildhall-postgres -n guildhall --timeout=5m
kubectl wait --for=condition=Ready pod \
  -l app=guildhall-postgres -n guildhall --timeout=3m
```

Verify it accepts connections:

```bash
kubectl exec -n guildhall deployment/guildhall-postgres -- \
  pg_isready -U guildhall
# → /var/run/postgresql:5432 - accepting connections
```

---

## Phase 4 — Schema migration

### 4.1 Run the migration Job

```bash
kubectl apply -f k8s/60-migration-job.yaml
```

### 4.2 Wait for Job completion

```bash
kubectl wait --for=condition=complete job/guildhall-migrate-v0-1-0 \
  -n guildhall --timeout=3m
```

### 4.3 Verify migration output

```bash
kubectl logs job/guildhall-migrate-v0-1-0 -n guildhall
```

Look for `Migrations already up` (no-op if
Guildhall has no migrations yet) or a list of `== Running 20xx...` / `== Migrated` entries.

If the Job fails, inspect events + logs:

```bash
kubectl describe job guildhall-migrate-v0-1-0 -n guildhall
kubectl logs job/guildhall-migrate-v0-1-0 -n guildhall
```

Common failures and remediation:

- `DATABASE_URL` pointing at a wrong host: check `guildhall-app-secrets`
- Postgres not yet accepting auth: wait longer
- Migration SQL error: fix in source, rebuild image, re-push, delete the failed Job, then re-apply it (a Job's pod template cannot be changed in place)

---

## Phase 5 — Application deployment

### 5.1 Apply Guildhall Deployment + Service

```bash
kubectl apply -f k8s/70-guildhall-deployment.yaml
kubectl apply -f k8s/80-guildhall-service.yaml
```

### 5.2 Wait for Deployment rollout

```bash
kubectl rollout status deployment/guildhall -n guildhall --timeout=5m
```

If this hangs, check pod events + logs:

```bash
kubectl get pods -n guildhall
kubectl describe pod -n guildhall -l app=guildhall
kubectl logs -n guildhall -l app=guildhall --tail=100
```

### 5.3 Obtain the LoadBalancer IP

Hetzner CCM provisions a new LB; allow 30-90 seconds after the Service is applied.

```bash
kubectl get svc guildhall -n guildhall -w
# ^C once EXTERNAL-IP transitions from pending to a public address
```

Record the IPv4 in `EXTERNAL-IP`. IPv6 will also be assigned; note both.

---

## Phase 6 — DNS + end-to-end verification

### 6.1 Create Cloudflare DNS records

In the Cloudflare dashboard for `guildhouse.dev` (or via `flarectl` / `terraform` if automated), create:

- **A record:** `guildhall` → the LB IPv4 recorded in Phase 5.3 — **proxied (orange cloud)**
- **AAAA record** (optional, recommended): `guildhall` → the LB IPv6 recorded in Phase 5.3 — proxied

Proxied is load-bearing: it's what provides TLS. Do NOT grey-cloud this record.

### 6.2 Smoke test

Allow Cloudflare's edge to pick up the record (1-2 minutes).
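Instead of waiting a fixed interval, the health check can be retried until it passes. A minimal sketch of such a loop follows; `retry_until` is a hypothetical helper that is not part of this repo, and the attempt count and delay in the example line are arbitrary choices.

```bash
# Hypothetical helper: run a command until it succeeds, up to a fixed
# number of attempts, sleeping between tries.
# Usage: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Do not sleep after the final failed attempt.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (commented out so that sourcing this file has no side effects):
# poll the health endpoint every 10 s, giving up after 12 attempts.
# retry_until 12 10 curl -fsS -o /dev/null https://guildhall.guildhouse.dev/health
```

`curl -f` exits non-zero on HTTP errors, so the loop keeps retrying until the endpoint answers successfully. Once it succeeds, run the individual checks: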
```bash
# Health endpoint — unauthenticated, should return 200
curl -sS -w '\n-- HTTP %{http_code} --\n' https://guildhall.guildhouse.dev/health

# Root — should return 200 (-I fetches headers only; drop it to see the LiveView-rendered HTML)
curl -sS -w '\n-- HTTP %{http_code} --\n' -I https://guildhall.guildhouse.dev/
```

Expected: `/health` returns `200` with `{"status":"ok","checks":{"db":"ok"}}`; `/` returns `200` (with Phoenix's rendered HTML when fetched without `-I`).

### 6.3 Manual walkthrough

In a browser, visit `https://guildhall.guildhouse.dev/`:

- Dashboard LiveView should render
- `/ceremonies` and `/artifacts` should render (will be empty — no data yet)
- No certificate warnings (Cloudflare-terminated TLS)

---

## Iterating on subsequent tags

For v0.1.1, v0.1.2, etc.:

1. Build + push the new image
2. Update the `image:` tag in `k8s/60-migration-job.yaml` and `k8s/70-guildhall-deployment.yaml`
3. Update the Job name in `k8s/60-migration-job.yaml` (e.g. `guildhall-migrate-v0-1-1`)
4. `kubectl apply -f k8s/60-migration-job.yaml` — run the new migration Job
5. `kubectl apply -f k8s/70-guildhall-deployment.yaml` — rolling update of Guildhall

A sed helper to bump everything at once:

```bash
OLD=v0.1.0; NEW=v0.1.1
sed -i "s|guildhall:${OLD}|guildhall:${NEW}|g" \
  k8s/60-migration-job.yaml k8s/70-guildhall-deployment.yaml
sed -i "s|guildhall-migrate-${OLD//./-}|guildhall-migrate-${NEW//./-}|g" \
  k8s/60-migration-job.yaml
```

---

## Rollback

### Back out the current deployment

Rolling back to a prior image tag (assuming the prior tag is still in the registry):

```bash
# The image tag after the colon is blank here; append the prior tag before running.
kubectl set image -n guildhall deployment/guildhall \
  guildhall=git.guildhouse.dev/tking/guildhall:
kubectl rollout status -n guildhall deployment/guildhall
```

Schema rollback (only if the current deploy introduced migrations that need to be reverted):

```bash
# The image tag and the second argument to rollback are blank here; fill in both before running.
kubectl run guildhall-rollback --rm -it \
  --image=git.guildhouse.dev/tking/guildhall: \
  --overrides='{"spec":{"imagePullSecrets":[{"name":"guildhall-registry"}]}}' \
  -n guildhall -- \
  /app/bin/guildhall eval "Guildhall.OpsDb.Release.rollback(Guildhall.OpsDb.Repo, )"
```

### Tear down the whole deployment

```bash
# Delete in reverse order. Namespace deletion cascades to everything
# attached to it (Deployments, Services, Pods, PVC). Note that
# deleting the namespace ALSO deletes the PVC, which destroys the
# database. For non-destructive teardown, preserve the PVC first.
kubectl delete svc guildhall -n guildhall    # triggers Hetzner LB deprovision
kubectl delete deployment guildhall -n guildhall
kubectl delete job -l app.kubernetes.io/name=guildhall,app.kubernetes.io/component=migration -n guildhall
kubectl delete deployment guildhall-postgres -n guildhall
kubectl delete svc guildhall-postgres -n guildhall

# PVC delete is destructive (Longhorn reclaim policy is Delete).
# Uncomment only if the database state should be destroyed:
# kubectl delete pvc guildhall-db -n guildhall

kubectl delete secret guildhall-registry guildhall-db-credentials guildhall-app-secrets -n guildhall

# Finally the namespace itself (leave it in place if you want to keep the PVC):
# kubectl delete namespace guildhall
```

Remove the Cloudflare DNS record for `guildhall.guildhouse.dev` if fully tearing down.

---

## Known v0.1 limitations

- **Cloudflare-edge TLS, not cluster-terminated.** Upgrading to cert-manager Certificate + in-cluster TLS is hygiene follow-up once the first deploy stabilizes. The `letsencrypt-prod` ClusterIssuer is already ready.
- **No Flux integration.** Direct `kubectl apply` is the deploy mechanism for v0.1. Flux Kustomization for Guildhall is follow-up — especially once the broader Flux chain (cluster-infra, spire, quartermaster) is healed.
- **No OIDC / Keycloak integration.** Guildhall's `config/runtime.exs` has commented-out OIDC env vars; wiring them to the existing `auth.guildhouse.dev` Keycloak is follow-up.
- **No substrate CRD integration.** The CeremonyOrchestrator and ChronicleConsumer stubs are not yet watching real substrate CRDs — those integrations land after the substrate foundation is reconciling on this cluster.
- **Single replica.** Safe for LiveView (no cluster sticky-session concerns at replicas=1). Scale once DNS cluster / horizontal-pod-autoscaler is configured.
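
---

## Appendix: naming helpers (sketch)

The naming convention from "Iterating on subsequent tags" (the migration Job is named `guildhall-migrate-` plus the tag with dots replaced by hyphens) can be expressed as small shell functions, which may reduce hand-editing mistakes when scripting a bump. This is a sketch only: `valid_tag`, `migration_job_name`, and `image_ref` are hypothetical names, and the check assumes tags always look like `vMAJOR.MINOR.PATCH`.

```bash
# Hypothetical helpers mirroring this runbook's naming convention.

# Succeeds only for tags shaped like v0.1.0 (assumed tag format).
valid_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# v0.1.1 -> guildhall-migrate-v0-1-1 (dots become hyphens, as in the sed helper)
migration_job_name() {
  valid_tag "$1" || { echo "unexpected tag: $1" >&2; return 1; }
  printf 'guildhall-migrate-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

# v0.1.1 -> git.guildhouse.dev/tking/guildhall:v0.1.1
image_ref() {
  valid_tag "$1" || { echo "unexpected tag: $1" >&2; return 1; }
  printf 'git.guildhouse.dev/tking/guildhall:%s\n' "$1"
}
```

For example, `kubectl wait --for=condition=complete "job/$(migration_job_name v0.1.1)" -n guildhall --timeout=3m` waits on the Job for the new tag without retyping its name.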