feat(deploy): Dockerfile + k8s manifests for Talos deployment

Multi-stage Elixir/OTP Dockerfile, Kubernetes manifests following the Keycloak pattern, a mix release migration module, and a deploy runbook. Target: guildhall.guildhouse.dev via Hetzner LB + Cloudflare (orange cloud). Forgejo container registry at git.guildhouse.dev/tking/guildhall. Not yet deployed; artifacts only. See DEPLOY-RUNBOOK.md for execution.

Artifacts produced:

- Dockerfile: multi-stage, Elixir 1.17.3 / OTP 27.1.2, debian-bookworm-slim builder + debian-bookworm-slim runtime. Dep-layer caching via explicit apps/*/mix.exs copies before source. Asset pipeline runs mix assets.setup + mix assets.deploy (tailwind + esbuild + phx.digest). Non-root uid 1000, tini as pid-1, HEALTHCHECK against /health.
- .dockerignore: excludes _build/, deps/, k8s/, .git/, test artifacts, and apps/guildhall_web/priv/static/assets/ (regenerated by phx.digest inside the builder).
- apps/guildhall_web/.../router.ex: adds a `/health` route under the :api pipeline. Unauthenticated by design (Kubernetes probes + LB target).
- apps/guildhall_web/.../controllers/health_controller.ex: shallow health check (Phoenix is up + the Ecto pool can `SELECT 1`). Returns 200 "ok", or 503 "degraded" with the reason.
- apps/guildhall_ops_db/lib/guildhall/ops_db/release.ex: Release module for migrations, exposing `Guildhall.OpsDb.Release.migrate/0` and `rollback/2`. Called from the migration Job via `bin/guildhall eval`. The module path reflects the actual repo location (the repo is `Guildhall.OpsDb.Repo` in `:guildhall_ops_db`, not the prompt's suggested `Guildhall.Repo`).
- Kubernetes manifests in k8s/ (numbered for apply order):
  - 00-namespace.yaml: guildhall namespace with guildhouse labels
  - 10-registry-secret-template.yaml: doc-only template for the dockerconfigjson pull secret
  - 20-postgres-pvc.yaml: 5Gi longhorn RWO
  - 30-postgres-deployment.yaml: postgres:16, Keycloak-matched resources + pg_isready probes, PGDATA subpath
  - 40-postgres-service.yaml: ClusterIP :5432
  - 50-guildhall-secrets-template.yaml: doc-only template for app + DB secrets
  - 60-migration-job.yaml: ecto migration Job; name includes the tag for per-deploy uniqueness; TTL 24h
  - 70-guildhall-deployment.yaml: RollingUpdate maxSurge 1 / maxUnavailable 0, /health probes, requests 200m CPU / 256Mi, limits 1 CPU / 1Gi, 5s preStop sleep
  - 80-guildhall-service.yaml: LoadBalancer with the exact Keycloak-matched Hetzner annotations (location nbg1, type lb11, name guildhall, use-private-ip false), port 80 origin (Cloudflare TLS)
- DEPLOY-RUNBOOK.md: 6-phase deploy sequence (build + push, cluster prep, DB, migrate, app rollout, DNS + smoke), iteration helper with a sed-based tag bump, rollback procedure (image rollback, schema rollback via Release.rollback, full teardown), and v0.1 limitations (Cloudflare-edge TLS, not cluster-terminated; no Flux integration; no OIDC wiring; no substrate CRD integration; single replica).

Decisions made during artifact production that weren't explicit in the prompt:

- Release module name is `Guildhall.OpsDb.Release` (not `Guildhall.Release`), matching the actual repo namespace. Migration Job command adjusted to `Guildhall.OpsDb.Release.migrate()`.
- The Dockerfile uses the `-slim` builder variant (not the full bookworm builder) to keep the builder base image small. The final image size is set by the runtime stage either way, since only the release is copied across.
- Asset compilation runs `mix assets.setup` before `mix assets.deploy` so the tailwind + esbuild binaries install cleanly inside the container (the dev-only :runtime flag on those deps means they need an explicit install in a prod builder).
- tini added as pid-1 in the runtime stage. Not in the prompt, but standard practice for OTP containers to ensure signal propagation and zombie reaping under Kubernetes.
- Rolling update strategy: maxSurge 1 / maxUnavailable 0 (zero-downtime rollout at replicas=1: the new pod comes up alongside the old one, passes health checks, then the old pod is terminated). Matches the typical single-replica LiveView pattern.
- preStop `sleep 5`: gives in-flight HTTP + LiveView connections a grace window before termination.
- Hetzner LB annotations: the annotation set was verified against the cluster's keycloak Service and mirrored here (location=nbg1, name=guildhall, type=lb11, use-private-ip=false). The prompt asked about uses-proxyprotocol and algorithm-type; neither is set on Keycloak's Service, so both are omitted here for consistency.
- Migration Job name includes the tag (`guildhall-migrate-v0-1-0`) so successive deploys don't collide on Job name reuse (a Job's pod template is immutable, so re-applying the same name with a new image would be rejected). The runbook documents the sed helper that bumps both the image tag and the Job name for subsequent deploys.
- Both exploratory docs (`DEPLOY-EXPLORATORY-2026-04-21.md`, `FORGEJO-REGISTRY-INVESTIGATION-2026-04-21.md`) are currently untracked in the repo. They're left out of this commit per the prompt's explicit `git add` list and can be committed separately (or ignored) at Tyler's discretion.

Not done tonight (per the prompt's NOT PERMITTED list):

- docker build / docker push
- kubectl apply of any manifest
- Forgejo PAT creation
- Cloudflare DNS changes
- git push (this commit is local-only pending review)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Tyler J King <tking@guildhouse.dev>
2026-04-22 04:00:40 -04:00 · c6f1d07ed9
commit c6f1d07ed9
parent 0a6dd03e91
15 changed files with 1028 additions and 0 deletions
--- a/.dockerignore
+++ b/.dockerignore
@ -0,0 +1,53 @@
+# Build artifacts
+_build/
+deps/
+apps/*/_build/
+apps/*/deps/
+
+# Editor / dev state
+.elixir_ls/
+.idea/
+.vscode/
+*.swp
+*.swo
+.DS_Store
+
+# VCS
+.git/
+.gitignore
+.gitattributes
+
+# CI / deploy manifests (not needed in image)
+k8s/
+.forgejo/
+.github/
+
+# Generated web assets — phx.digest regenerates these in the builder
+# stage. The source assets in apps/guildhall_web/assets/ are what the
+# builder needs.
+apps/guildhall_web/priv/static/assets/
+apps/guildhall_web/priv/static/cache_manifest.json
+
+# Test + cover
+apps/*/test/
+cover/
+.coverdata
+
+# Docs + tmp + logs
+doc/
+tmp/
+*.log
+
+# Env (never ship into images)
+.env
+.env.*
+!.env.example
+
+# Locally built releases land under _build/ (excluded above). rel/ is
+# deliberately NOT excluded: it holds templates/overlays `mix release` reads.
+releases/
+
+# Docs + design files at repo root
+*.md
+!README.md
+AGENTS.md
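Because `mix release` can read templates and overlays from a root `rel/` directory, ignore rules that strip build inputs are worth a quick pre-build check. A minimal sketch, assuming it runs from the umbrella root; `lint_dockerignore` and its list of required patterns are hypothetical, and it only compares literal lines rather than emulating Docker's full pattern matching:

```bash
#!/usr/bin/env bash
# Hypothetical pre-build lint for .dockerignore. Literal-line checks only;
# this does NOT reimplement Docker's pattern matching or `!` exceptions.
set -euo pipefail

lint_dockerignore() {
  local root="${1:-.}" file problems=0 pat
  file="$root/.dockerignore"
  [ -f "$file" ] || { echo "no .dockerignore in $root" >&2; return 1; }

  # Build outputs that should never be sent as build context.
  for pat in "_build/" "deps/"; do
    if ! grep -qxF "$pat" "$file"; then
      echo "missing exclude: $pat" >&2
      problems=$((problems + 1))
    fi
  done

  # rel/ holds release templates/overlays read by `mix release`;
  # excluding it only matters when the directory actually exists.
  if [ -d "$root/rel" ] && grep -qxF "rel/" "$file"; then
    echo "rel/ exists but is excluded from the build context" >&2
    problems=$((problems + 1))
  fi

  [ "$problems" -eq 0 ]
}
```

Usage: `lint_dockerignore . && docker build -t git.guildhouse.dev/tking/guildhall:v0.1.0 .`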
--- a/DEPLOY-RUNBOOK.md
+++ b/DEPLOY-RUNBOOK.md
@ -0,0 +1,354 @@
+# Guildhall deploy runbook
+
+**Target:** `guildhall.guildhouse.dev` on the Hetzner Talos cluster, via Forgejo container registry at `git.guildhouse.dev/tking/guildhall`.
+**Pattern:** direct `kubectl apply` against the cluster; Flux integration deferred. TLS terminates at Cloudflare (orange cloud); origin is plain HTTP on the Hetzner LB.
+**Required reference docs:** `DEPLOY-EXPLORATORY-2026-04-21.md` (cluster state), `FORGEJO-REGISTRY-INVESTIGATION-2026-04-21.md` (registry state). Both are local, currently untracked files that are not part of this commit.
+
+Tag referenced throughout this runbook: **`v0.1.0`**. When deploying a subsequent tag, substitute it throughout, or use the sed helper under "Iterating on subsequent tags".
+
+---
+
+## Prerequisites
+
+- `kubectl` configured against the Talos cluster (`KUBECONFIG=~/projects/substrate-project/guildhouse-talos-bootstrap/kubeconfig`)
+- `docker` available on the build host with enough disk for an Elixir build image (~2 GB)
+- Cloudflare account access for `guildhouse.dev` DNS
+- Forgejo account `tking` at `git.guildhouse.dev`
+
+---
+
+## Phase 1 — Build and push the image
+
+### 1.1 Create a Forgejo Personal Access Token
+
+Navigate to `https://git.guildhouse.dev/user/settings/applications`. Generate a new token:
+
+- **Token name:** `guildhall-registry-push` (or similar)
+- **Scopes:** `write:package` (this token both pushes and pulls; if you split credentials, create a separate `read:package` token for in-cluster pulls)
+- **Expiry:** operator's choice; 30-90 days is reasonable for the push token
+
+Copy the token value immediately (Forgejo won't show it again). Save it in your password manager.
+
+### 1.2 Docker login
+
+```bash
+docker login git.guildhouse.dev -u tking
+# paste PAT when prompted
+```
+
+Verify with `cat ~/.docker/config.json | jq '.auths | keys'` — `git.guildhouse.dev` should appear.
+
+### 1.3 Build the image
+
+```bash
+cd /home/tking/projects/substrate-project/guildhall
+docker build -t git.guildhouse.dev/tking/guildhall:v0.1.0 .
+```
+
+Cold build takes ~5-10 minutes (mix deps + erlang compile + tailwind + esbuild + phx.digest + mix release). Subsequent builds hit Docker layer cache and are much faster.
+
+Verify the image runs before pushing:
+
+```bash
+docker run --rm -it --entrypoint /bin/sh \
+  git.guildhouse.dev/tking/guildhall:v0.1.0 \
+  -c 'ls -la /app/bin && /app/bin/guildhall version'
+```
+
+Expected: the `guildhall` release binary is present and `version` returns the release version without error.
+
+### 1.4 Push to Forgejo registry
+
+```bash
+docker push git.guildhouse.dev/tking/guildhall:v0.1.0
+```
+
+### 1.5 Verify image is in the registry
+
+Via Forgejo UI: `https://git.guildhouse.dev/tking/-/packages` → should list `guildhall` with a `v0.1.0` tag.
+
+Via registry API (authenticated):
+
+```bash
+curl -sS -u tking:<PAT> https://git.guildhouse.dev/v2/tking/guildhall/tags/list
+# → {"name":"tking/guildhall","tags":["v0.1.0"]}
+```
+
+### 1.6 Decide package visibility
+
+In the Forgejo UI, for the new `guildhall` container package:
+
+- **Private** (default, recommended for tonight): cluster needs `guildhall-registry` pull secret (Phase 2.2 below creates it)
+- **Public:** anonymous pulls work; skip Phase 2.2 and remove `imagePullSecrets` from `k8s/60-migration-job.yaml` and `k8s/70-guildhall-deployment.yaml` before applying
+
+---
+
+## Phase 2 — Cluster-side preparation
+
+### 2.1 Create the namespace
+
+```bash
+kubectl apply -f k8s/00-namespace.yaml
+```
+
+Verify: `kubectl get ns guildhall` → `Active`.
+
+### 2.2 Create the registry pull secret (if package is private)
+
+```bash
+kubectl create secret docker-registry guildhall-registry \
+  --docker-server=git.guildhouse.dev \
+  --docker-username=tking \
+  --docker-password='<PAT-with-read:package>' \
+  --namespace=guildhall
+```
+
+Optionally use a read-only PAT here instead of the push PAT from Phase 1.1. Skip this step entirely if the package is public.
+
+### 2.3 Create the database credentials secret
+
+Generate a strong password and save it to your password manager before running:
+
+```bash
+DB_PASSWORD="$(openssl rand -base64 32 | tr -d '/+=' | head -c 32)"
+echo "Save this: $DB_PASSWORD"
+
+kubectl create secret generic guildhall-db-credentials \
+  --from-literal=POSTGRES_DB=guildhall \
+  --from-literal=POSTGRES_USER=guildhall \
+  --from-literal=POSTGRES_PASSWORD="$DB_PASSWORD" \
+  --namespace=guildhall
+```
+
+### 2.4 Create the application secrets
+
+Run this in the same shell session as 2.3 so `$DB_PASSWORD` is still set.
+
+```bash
+SECRET_KEY_BASE="$(cd /home/tking/projects/substrate-project/guildhall && mix phx.gen.secret)"
+
+kubectl create secret generic guildhall-app-secrets \
+  --from-literal=SECRET_KEY_BASE="$SECRET_KEY_BASE" \
+  --from-literal=DATABASE_URL="ecto://guildhall:$DB_PASSWORD@guildhall-postgres:5432/guildhall" \
+  --namespace=guildhall
+```
+
+Verify secrets exist:
+
+```bash
+kubectl get secrets -n guildhall
+# expect: guildhall-registry, guildhall-db-credentials, guildhall-app-secrets
+```
+
+---
+
+## Phase 3 — Database provisioning
+
+### 3.1 Apply Postgres PVC, Deployment, Service
+
+```bash
+kubectl apply -f k8s/20-postgres-pvc.yaml
+kubectl apply -f k8s/30-postgres-deployment.yaml
+kubectl apply -f k8s/40-postgres-service.yaml
+```
+
+### 3.2 Wait for Postgres Ready
+
+```bash
+kubectl rollout status deployment/guildhall-postgres -n guildhall --timeout=5m
+kubectl wait --for=condition=Ready pod \
+  -l app=guildhall-postgres -n guildhall --timeout=3m
+```
+
+Verify it accepts connections:
+
+```bash
+kubectl exec -n guildhall deployment/guildhall-postgres -- \
+  pg_isready -U guildhall
+# → /var/run/postgresql:5432 - accepting connections
+```
+
+---
+
+## Phase 4 — Schema migration
+
+### 4.1 Run the migration Job
+
+```bash
+kubectl apply -f k8s/60-migration-job.yaml
+```
+
+### 4.2 Wait for Job completion
+
+```bash
+kubectl wait --for=condition=complete job/guildhall-migrate-v0-1-0 \
+  -n guildhall --timeout=3m
+```
+
+### 4.3 Verify migration output
+
+```bash
+kubectl logs job/guildhall-migrate-v0-1-0 -n guildhall
+```
+
+Look for `Migrations already up` (no-op if Guildhall has no migrations yet) or a list of `== Running 20xx...` / `== Migrated` entries.
+
+If the Job fails, inspect events + logs:
+
+```bash
+kubectl describe job guildhall-migrate-v0-1-0 -n guildhall
+kubectl logs job/guildhall-migrate-v0-1-0 -n guildhall
+```
+
+Common failures and remediation: DATABASE_URL pointing at a wrong host (check `guildhall-app-secrets`); Postgres not yet accepting auth (wait longer); migration SQL error (fix in source, rebuild image, re-push, re-apply Job).
+
+---
+
+## Phase 5 — Application deployment
+
+### 5.1 Apply Guildhall Deployment + Service
+
+```bash
+kubectl apply -f k8s/70-guildhall-deployment.yaml
+kubectl apply -f k8s/80-guildhall-service.yaml
+```
+
+### 5.2 Wait for Deployment rollout
+
+```bash
+kubectl rollout status deployment/guildhall -n guildhall --timeout=5m
+```
+
+If this hangs, check pod events + logs:
+
+```bash
+kubectl get pods -n guildhall
+kubectl describe pod -n guildhall -l app=guildhall
+kubectl logs -n guildhall -l app=guildhall --tail=100
+```
+
+### 5.3 Obtain the LoadBalancer IP
+
+Hetzner CCM provisions a new LB; allow 30-90 seconds after the Service is applied.
+
+```bash
+kubectl get svc guildhall -n guildhall -w
+# ^C once EXTERNAL-IP transitions from <pending> to a public address
+```
+
+Record the IPv4 in `EXTERNAL-IP`. IPv6 will also be assigned; note both.
+
+---
+
+## Phase 6 — DNS + end-to-end verification
+
+### 6.1 Create Cloudflare DNS records
+
+In the Cloudflare dashboard for `guildhouse.dev` (or via `flarectl` / `terraform` if automated), create:
+
+- **A record:** `guildhall` → `<Hetzner-LB-IPv4>` — **proxied (orange cloud)**
+- **AAAA record** (optional, recommended): `guildhall` → `<Hetzner-LB-IPv6>` — proxied
+
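+Proxied is load-bearing: it's what provides TLS. Do NOT grey-cloud this record. Because the origin serves plain HTTP on port 80, Cloudflare's SSL/TLS encryption mode for this hostname must be **Flexible**; Full or Full (strict) would make Cloudflare connect to the origin over HTTPS on port 443, which the LB does not serve.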
+
+### 6.2 Smoke test
+
+Allow Cloudflare's edge to pick up the record (1-2 minutes).
+
+```bash
+# Health endpoint — unauthenticated, should return 200
+curl -sS -w '\n-- HTTP %{http_code} --\n' https://guildhall.guildhouse.dev/health
+
+# Root: -I sends HEAD, so this prints headers only; expect 200
+curl -sS -w '\n-- HTTP %{http_code} --\n' -I https://guildhall.guildhouse.dev/
+```
+
+Expected: `/health` returns `200` with a JSON body reporting `"status":"ok"` and `"checks":{"db":"ok"}` (JSON key order is not guaranteed); `/` returns `200` with Phoenix's response headers.
+
+### 6.3 Manual walkthrough
+
+In a browser, visit `https://guildhall.guildhouse.dev/`:
+
+- Dashboard LiveView should render
+- `/ceremonies` and `/artifacts` should render (will be empty — no data yet)
+- No certificate warnings (Cloudflare-terminated TLS)
+
+---
+
+## Iterating on subsequent tags
+
+For v0.1.1, v0.1.2, etc.:
+
+1. Build + push the new image
+2. Update the `image:` tag in `k8s/60-migration-job.yaml` and `k8s/70-guildhall-deployment.yaml`
+3. Update the Job name in `k8s/60-migration-job.yaml` (e.g. `guildhall-migrate-v0-1-1`)
+4. `kubectl apply -f k8s/60-migration-job.yaml`, then wait for the new Job to complete (Phase 4.2, using the new Job name) before continuing
+5. `kubectl apply -f k8s/70-guildhall-deployment.yaml` — rolling update of Guildhall
+
+A sed helper to bump everything at once:
+
+```bash
+OLD=v0.1.0; NEW=v0.1.1
+sed -i "s|guildhall:${OLD}|guildhall:${NEW}|g" \
+    k8s/60-migration-job.yaml k8s/70-guildhall-deployment.yaml
+sed -i "s|guildhall-migrate-${OLD//./-}|guildhall-migrate-${NEW//./-}|g" \
+    k8s/60-migration-job.yaml
+```
+
+---
+
+## Rollback
+
+### Back out the current deployment
+
+Rolling back to a prior image tag (assuming the prior tag is still in the registry):
+
+```bash
+kubectl set image -n guildhall deployment/guildhall \
+  guildhall=git.guildhouse.dev/tking/guildhall:<prior-tag>
+kubectl rollout status -n guildhall deployment/guildhall
+```
+
+Schema rollback (only if the current deploy introduced migrations that need to be reverted):
+
+```bash
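+# envFrom supplies DATABASE_URL and SECRET_KEY_BASE, which the release's
+# runtime config reads even under `eval`; --restart=Never keeps the pod
+# from re-running the rollback after it exits.
+kubectl run guildhall-rollback --rm -it --restart=Never \
+  --image=git.guildhouse.dev/tking/guildhall:<current-tag> \
+  --override-type=strategic \
+  --overrides='{"spec":{"imagePullSecrets":[{"name":"guildhall-registry"}],"containers":[{"name":"guildhall-rollback","envFrom":[{"secretRef":{"name":"guildhall-app-secrets"}}]}]}}' \
+  -n guildhall -- \
+  /app/bin/guildhall eval "Guildhall.OpsDb.Release.rollback(Guildhall.OpsDb.Repo, <migration_version>)"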
+```
+
+### Tear down the whole deployment
+
+```bash
+# Delete in reverse order; namespace deletion cascades everything
+# attached to it (Deployments, Services, Pods, PVC... note that
+# deleting the namespace ALSO deletes the PVC, which destroys the
+# database. For non-destructive teardown, preserve the PVC first.)
+
+kubectl delete svc guildhall -n guildhall                # triggers Hetzner LB deprovision
+kubectl delete deployment guildhall -n guildhall
+kubectl delete job -l app.kubernetes.io/name=guildhall,app.kubernetes.io/component=migration -n guildhall
+kubectl delete deployment guildhall-postgres -n guildhall
+kubectl delete svc guildhall-postgres -n guildhall
+
+# PVC delete is destructive (Longhorn reclaim policy is Delete).
+# Uncomment only if the database state should be destroyed:
+# kubectl delete pvc guildhall-db -n guildhall
+
+kubectl delete secret guildhall-registry guildhall-db-credentials guildhall-app-secrets -n guildhall
+
+# Finally, the namespace itself. Skip this if you preserved the PVC:
+# deleting the namespace deletes every PVC in it.
+# kubectl delete namespace guildhall
+```
+
+Remove the Cloudflare DNS record for `guildhall.guildhouse.dev` if fully tearing down.
+
+---
+
+## Known v0.1 limitations
+
+- **Cloudflare-edge TLS, not cluster-terminated.** Upgrading to a cert-manager Certificate + in-cluster TLS is a hygiene follow-up once the first deploy stabilizes. The `letsencrypt-prod` ClusterIssuer already exists and is Ready.
+- **No Flux integration.** Direct `kubectl apply` is the deploy mechanism for v0.1. Flux Kustomization for Guildhall is follow-up — especially once the broader Flux chain (cluster-infra, spire, quartermaster) is healed.
+- **No OIDC / Keycloak integration.** Guildhall's `config/runtime.exs` has commented-out OIDC env vars; wiring them to the existing `auth.guildhouse.dev` Keycloak is follow-up.
+- **No substrate CRD integration.** The CeremonyOrchestrator and ChronicleConsumer stubs are not yet watching real substrate CRDs — those integrations land after the substrate foundation is reconciling on this cluster.
+- **Single replica.** Safe for LiveView (no sticky-session concerns at replicas=1). Scale out once DNS-based BEAM node clustering and a HorizontalPodAutoscaler are configured.
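The runbook's sed helper for subsequent tags can be wrapped so it rejects malformed tags and fails if anything still references the old tag afterwards. A minimal sketch, assuming GNU sed (for `-i` without a backup suffix and for `\b`) and tags shaped like `v0.1.0`; `bump_tag` and the `K8S_DIR` override are hypothetical names, not part of the runbook:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the runbook's tag-bump sed helper.
# Assumes GNU sed and tags like v0.1.0; runs against ./k8s unless
# K8S_DIR points elsewhere.
set -euo pipefail

bump_tag() {
  local old="$1" new="$2" dir="${K8S_DIR:-k8s}"
  local job="$dir/60-migration-job.yaml" deploy="$dir/70-guildhall-deployment.yaml"
  local re='^v[0-9]+\.[0-9]+\.[0-9]+$' old_re stale

  [[ "$old" =~ $re && "$new" =~ $re ]] || { echo "tags must look like v1.2.3" >&2; return 1; }

  # Escape dots so the old tag matches literally in sed/grep regexes.
  old_re="$(printf '%s' "$old" | sed 's/\./\\./g')"

  # Image tag in both manifests (\b stops v0.1.1 matching inside v0.1.10).
  sed -i "s|guildhall:${old_re}\\b|guildhall:${new}|g" "$job" "$deploy"
  # Job name: dots become dashes (v0.1.1 -> v0-1-1).
  sed -i "s|guildhall-migrate-${old//./-}\\b|guildhall-migrate-${new//./-}|g" "$job"

  # Fail loudly if anything still points at the old tag.
  stale="(guildhall:${old_re}|guildhall-migrate-${old//./-})([^0-9]|\$)"
  if grep -nE "$stale" "$job" "$deploy"; then
    echo "stale references to $old remain" >&2
    return 1
  fi
}
```

Usage: `bump_tag v0.1.0 v0.1.1 && git diff k8s/` to review the change before applying anything.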
--- a/Dockerfile
+++ b/Dockerfile
@ -0,0 +1,110 @@
+# syntax=docker/dockerfile:1.7
+#
+# Guildhall production image — Elixir/Phoenix umbrella release.
+# Multi-stage: builder produces a mix release; runtime is a slim debian
+# carrying only the OTP release + runtime libs.
+#
+# Build context: the guildhall umbrella root.
+# Target registry: git.guildhouse.dev/tking/guildhall:<tag>
+
+# ---------- Stage 1: builder ---------------------------------------------
+FROM hexpm/elixir:1.17.3-erlang-27.1.2-debian-bookworm-20241202-slim AS builder
+
+ENV MIX_ENV=prod \
+    LANG=C.UTF-8 \
+    LC_ALL=C.UTF-8
+
+RUN apt-get update -qq && \
+    apt-get install -y --no-install-recommends \
+        build-essential \
+        git \
+        ca-certificates \
+        curl \
+    && rm -rf /var/lib/apt/lists/*
+
+RUN mix local.hex --force && \
+    mix local.rebar --force
+
+WORKDIR /app
+
+# Dep resolution needs every apps/*/mix.exs in an umbrella. Copy them
+# before any source so dep-layer cache survives source-only edits.
+COPY mix.exs mix.lock ./
+COPY config/config.exs config/prod.exs config/runtime.exs config/
+COPY apps/guildhall_chronicle/mix.exs apps/guildhall_chronicle/
+COPY apps/guildhall_graph_bridge/mix.exs apps/guildhall_graph_bridge/
+COPY apps/guildhall_ops_db/mix.exs apps/guildhall_ops_db/
+COPY apps/guildhall_orchestrator/mix.exs apps/guildhall_orchestrator/
+COPY apps/guildhall_web/mix.exs apps/guildhall_web/
+
+RUN mix deps.get --only prod && \
+    mix deps.compile
+
+# Source — copied after dep layers so app-source changes don't bust
+# the dep cache.
+COPY apps/ apps/
+
+# Asset pipeline for guildhall_web — tailwind + esbuild + phx.digest.
+# The aliases in apps/guildhall_web/mix.exs define `assets.deploy` as
+# `tailwind guildhall_web --minify` + `esbuild guildhall_web --minify`
+# + `phx.digest`. `tailwind.install` and `esbuild.install` pull the
+# binaries on first use. The assets/ sources already arrived with
+# `COPY apps/ apps/` above, so no separate COPY is needed here.
+RUN cd apps/guildhall_web && \
+    mix assets.setup && \
+    mix assets.deploy
+
+# Compile the full umbrella + cut the release. Mix refuses to build an
+# umbrella release without an explicit `releases:` entry, so this relies
+# on the root mix.exs defining a `guildhall` release (with `applications:`).
+RUN mix compile --warnings-as-errors && \
+    mix release --overwrite
+
+# ---------- Stage 2: runtime --------------------------------------------
+FROM debian:bookworm-slim AS runtime
+
+ENV LANG=en_US.UTF-8 \
+    LC_ALL=en_US.UTF-8 \
+    LANGUAGE=en_US:en
+
+# Runtime deps the compiled release needs. locales for the en_US.UTF-8
+# generation; libstdc++6 for the erlang ports; libncurses6 for the
+# beam; openssl for tls; libsystemd0 for the logger integration some
+# releases use; tini as a minimal pid-1 for signal forwarding + zombie reaping.
+RUN apt-get update -qq && \
+    apt-get install -y --no-install-recommends \
+        openssl \
+        libncurses6 \
+        libstdc++6 \
+        libsystemd0 \
+        locales \
+        ca-certificates \
+        curl \
+        tini \
+    && sed -i '/^# en_US.UTF-8 UTF-8/s/^# //' /etc/locale.gen \
+    && locale-gen en_US.UTF-8 \
+    && rm -rf /var/lib/apt/lists/*
+
+# Non-root user. uid 1000 matches the rest of the cluster's convention.
+RUN groupadd --system --gid 1000 guildhall && \
+    useradd --system --uid 1000 --gid guildhall --shell /usr/sbin/nologin \
+        --home /app --create-home guildhall
+
+WORKDIR /app
+
+# Release name `guildhall` produces the release at this path.
+COPY --from=builder --chown=guildhall:guildhall /app/_build/prod/rel/guildhall /app
+
+USER guildhall
+
+ENV HOME=/app \
+    PHX_SERVER=true \
+    PORT=4000
+
+EXPOSE 4000
+
+HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
+    CMD curl -fsS http://localhost:4000/health || exit 1
+
+ENTRYPOINT ["/usr/bin/tini", "--"]
+CMD ["/app/bin/guildhall", "start"]
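The runtime stage's `USER guildhall` (uid 1000) and the release location under `/app` can be confirmed from the build host before pushing, in the same spirit as the runbook's Phase 1.3 check. A sketch assuming the `v0.1.0` tag used in the runbook; `check_image` is a hypothetical helper, and it overrides the entrypoint so neither tini nor the release starts (no database or network needed):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical post-build check: the image should run as uid 1000 and
# contain an executable release start script at /app/bin/guildhall.
check_image() {
  local image="${1:-git.guildhouse.dev/tking/guildhall:v0.1.0}" out
  out="$(docker run --rm --entrypoint /bin/sh "$image" \
    -c 'id -u; test -x /app/bin/guildhall && echo release-ok')"
  # Expect exactly two lines: the uid, then the marker.
  [ "$out" = "$(printf '1000\nrelease-ok')" ]
}
```

Usage: `check_image && docker push git.guildhouse.dev/tking/guildhall:v0.1.0`.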
--- a/apps/guildhall_ops_db/lib/guildhall/ops_db/release.ex
+++ b/apps/guildhall_ops_db/lib/guildhall/ops_db/release.ex
@ -0,0 +1,45 @@
+defmodule Guildhall.OpsDb.Release do
+  @moduledoc """
+  Release-time DB tasks for the `guildhall` OTP release.
+
+  Mix is not available inside the compiled release, so ecto tasks can't
+  be invoked via `mix ecto.migrate` in production. This module wraps
+  the same operations so they can be called via:
+
+      bin/guildhall eval 'Guildhall.OpsDb.Release.migrate()'
+
+  Intended consumers:
+
+    - The `k8s/60-migration-job.yaml` Kubernetes Job, which runs this
+      module before `guildhall` Deployment rollout so schema changes
+      land exactly once per release rather than racing across N pods.
+    - Operators doing targeted rollback: `rollback(Guildhall.OpsDb.Repo, 20240101120000)`.
+
+  The @app is the OTP app whose `ecto_repos` key configures which Repos
+  to migrate. Guildhall's single Repo (`Guildhall.OpsDb.Repo`) is
+  registered under `:guildhall_ops_db`.
+  """
+
+  @app :guildhall_ops_db
+
+  def migrate do
+    load_app()
+
+    for repo <- repos() do
+      {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :up, all: true))
+    end
+  end
+
+  def rollback(repo, version) do
+    load_app()
+    {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :down, to: version))
+  end
+
+  defp repos do
+    Application.fetch_env!(@app, :ecto_repos)
+  end
+
+  defp load_app do
+    Application.load(@app)
+  end
+end
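The migration Job calls `migrate/0`, and the runbook waits on it with `kubectl wait --for=condition=complete`, which keeps blocking until `--timeout` if the Job ends in `Failed` instead. A polling sketch that returns as soon as either terminal condition appears; `wait_job` is hypothetical, and its attempt/interval defaults (36 × 5s, roughly the runbook's 3 minutes) are arbitrary:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: poll a Job's status conditions and stop on the
# first terminal one. Returns 0 on Complete, 1 on Failed, 2 on timeout.
wait_job() {
  local job="$1" ns="${2:-guildhall}" attempts="${3:-36}" interval="${4:-5}"
  local conds i
  for ((i = 0; i < attempts; i++)); do
    # Space-separated types of all conditions currently True.
    conds="$(kubectl get job "$job" -n "$ns" \
      -o 'jsonpath={.status.conditions[?(@.status=="True")].type}')"
    case " $conds " in
      *" Complete "*) echo "job/$job complete"; return 0 ;;
      *" Failed "*)
        echo "job/$job failed" >&2
        kubectl logs "job/$job" -n "$ns" --tail=50 >&2 || true
        return 1 ;;
    esac
    sleep "$interval"
  done
  echo "timed out waiting for job/$job" >&2
  return 2
}
```

Usage: `wait_job guildhall-migrate-v0-1-0 && kubectl apply -f k8s/70-guildhall-deployment.yaml`.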
--- a/apps/guildhall_web/lib/guildhall_web_web/controllers/health_controller.ex
+++ b/apps/guildhall_web/lib/guildhall_web_web/controllers/health_controller.ex
@ -0,0 +1,34 @@
+defmodule GuildhallWeb.HealthController do
+  @moduledoc """
+  Kubernetes-probe and LB-target health endpoint.
+
+  The check is deliberately shallow: it confirms Phoenix is serving
+  requests AND the Ecto pool can execute a trivial query against the
+  OpsDb Repo. The latter catches the class of failures where the app
+  is running but has lost its DB connection — which a Kubernetes
+  liveness probe should notice.
+
+  Deeper health checks (Chronicle stream liveness, orchestrator state,
+  downstream substrate CRD reachability) are NOT in scope here —
+  their failure modes warrant different remediation (degraded-mode
+  operation rather than process restart) and a separate endpoint will
+  surface them when those integrations land.
+  """
+  use GuildhallWeb, :controller
+
+  alias Guildhall.OpsDb.Repo
+
+  def check(conn, _params) do
+    case Ecto.Adapters.SQL.query(Repo, "SELECT 1", []) do
+      {:ok, _} ->
+        conn
+        |> put_status(200)
+        |> json(%{status: "ok", checks: %{db: "ok"}})
+
+      {:error, reason} ->
+        conn
+        |> put_status(503)
+        |> json(%{status: "degraded", checks: %{db: "unreachable"}, reason: inspect(reason)})
+    end
+  end
+end
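Probes and the runbook's Phase 6.2 smoke test only look at the status code, but a scripted check can also confirm the body. The controller encodes an Elixir map, and JSON object key order should not be relied on, so this sketch matches the `status` field rather than the whole string. `check_health` is a hypothetical helper, and it assumes compact JSON (no spaces), as Phoenix's `json/2` emits by default:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical smoke check for GET /health: succeeds only on HTTP 200
# with "status":"ok" somewhere in the body.
check_health() {
  local url="${1:-https://guildhall.guildhouse.dev/health}" body code
  body="$(mktemp)"
  # -o writes the body to a temp file; -w prints only the status code.
  if ! code="$(curl -sS --max-time 10 -o "$body" -w '%{http_code}' "$url")"; then
    rm -f "$body"
    echo "request to $url failed" >&2
    return 2
  fi
  if [ "$code" = "200" ] && grep -q '"status":"ok"' "$body"; then
    rm -f "$body"
    return 0
  fi
  echo "unhealthy (HTTP $code): $(cat "$body")" >&2
  rm -f "$body"
  return 1
}
```

A 503 from the controller includes the `reason` field, which this sketch prints to stderr for triage.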
--- a/apps/guildhall_web/lib/guildhall_web_web/router.ex
+++ b/apps/guildhall_web/lib/guildhall_web_web/router.ex
@ -21,4 +21,13 @@ defmodule GuildhallWeb.Router do
    live "/ceremonies", CeremonyLive.Index, :index
    live "/artifacts", ArtifactLive.Index, :index
  end
+
+  # Health check endpoint for Kubernetes probes + LB targets.
+  # GET /health — returns 200 when Phoenix is up AND the Ecto pool
+  # can query the DB; 503 otherwise. Unauthenticated (the whole point
+  # is that it's reachable without credentials).
+  scope "/health", GuildhallWeb do
+    pipe_through :api
+    get "/", HealthController, :check
+  end
 end
--- a/k8s/00-namespace.yaml
+++ b/k8s/00-namespace.yaml
@ -0,0 +1,7 @@
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: guildhall
+  labels:
+    app.kubernetes.io/managed-by: manual
+    app.kubernetes.io/part-of: guildhouse
--- a/k8s/10-registry-secret-template.yaml
+++ b/k8s/10-registry-secret-template.yaml
@ -0,0 +1,36 @@
+# Container registry pull secret — TEMPLATE.
+#
+# Do NOT apply this file directly. The actual secret is created
+# imperatively so the Forgejo PAT never lands in git. Create it with:
+#
+#   kubectl create secret docker-registry guildhall-registry \
+#     --docker-server=git.guildhouse.dev \
+#     --docker-username=tking \
+#     --docker-password='<PAT-with-read:package-scope>' \
+#     --namespace=guildhall
+#
+# The PAT is generated at:
+#   https://git.guildhouse.dev/user/settings/applications
+# Required scope: `read:package` (or `write:package` if the same PAT
+# will also be used for `docker push` from the build host; scoping
+# read-only is preferable for cluster-resident credentials).
+#
+# If the `tking/guildhall` Forgejo package is made public, this secret
+# is not required and `imagePullSecrets` can be omitted from the
+# guildhall Deployment and Job. The Deployment manifests reference
+# it unconditionally; switching to public packages means removing
+# those references and deleting this secret.
+#
+# Shape reference (what `kubectl get secret -o yaml` would show):
+#
+# apiVersion: v1
+# kind: Secret
+# metadata:
+#   name: guildhall-registry
+#   namespace: guildhall
+#   labels:
+#     app.kubernetes.io/managed-by: manual
+#     app.kubernetes.io/part-of: guildhouse
+# type: kubernetes.io/dockerconfigjson
+# data:
+#   .dockerconfigjson: <base64-encoded {"auths":{"git.guildhouse.dev":{"auth":"<b64 user:pat>"}}}>
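The `.dockerconfigjson` shape above nests `base64("user:pat")` inside JSON, which is then base64-encoded again as the Secret value. If a declarative route is ever preferred over `kubectl create secret docker-registry`, the value can be built locally. A sketch assuming GNU coreutils `base64` and a username/PAT that need no JSON escaping (Forgejo PATs are typically hex strings); `dockerconfigjson_b64` is hypothetical, and its output is a credential that must never be committed:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the base64 .dockerconfigjson value for a
# kubernetes.io/dockerconfigjson Secret, matching the shape shown above.
dockerconfigjson_b64() {
  local server="$1" user="$2" pat="$3" auth
  # Inner "auth" field is base64("user:pat"); tr strips base64's wrapping.
  auth="$(printf '%s:%s' "$user" "$pat" | base64 | tr -d '\n')"
  # Outer value is the whole JSON document, base64-encoded once more.
  printf '{"auths":{"%s":{"auth":"%s"}}}' "$server" "$auth" | base64 | tr -d '\n'
}
```

Usage: `dockerconfigjson_b64 git.guildhouse.dev tking "$PAT"`.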
--- a/k8s/20-postgres-pvc.yaml
+++ b/k8s/20-postgres-pvc.yaml
@ -0,0 +1,16 @@
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: guildhall-db
+  namespace: guildhall
+  labels:
+    app.kubernetes.io/name: guildhall-postgres
+    app.kubernetes.io/part-of: guildhouse
+    app.kubernetes.io/component: database
+    app.kubernetes.io/managed-by: manual
+spec:
+  accessModes: [ReadWriteOnce]
+  storageClassName: longhorn
+  resources:
+    requests:
+      storage: 5Gi
--- a/k8s/30-postgres-deployment.yaml
+++ b/k8s/30-postgres-deployment.yaml
@ -0,0 +1,90 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: guildhall-postgres
+  namespace: guildhall
+  labels:
+    app.kubernetes.io/name: guildhall-postgres
+    app.kubernetes.io/part-of: guildhouse
+    app.kubernetes.io/component: database
+    app.kubernetes.io/managed-by: manual
+spec:
+  replicas: 1
+  strategy:
+    type: Recreate
+  selector:
+    matchLabels:
+      app: guildhall-postgres
+  template:
+    metadata:
+      labels:
+        app: guildhall-postgres
+        app.kubernetes.io/name: guildhall-postgres
+        app.kubernetes.io/part-of: guildhouse
+        app.kubernetes.io/component: database
+    spec:
+      containers:
+        - name: postgres
+          image: postgres:16
+          imagePullPolicy: IfNotPresent
+          ports:
+            - containerPort: 5432
+              name: postgres
+          env:
+            - name: POSTGRES_DB
+              valueFrom:
+                secretKeyRef:
+                  name: guildhall-db-credentials
+                  key: POSTGRES_DB
+            - name: POSTGRES_USER
+              valueFrom:
+                secretKeyRef:
+                  name: guildhall-db-credentials
+                  key: POSTGRES_USER
+            - name: POSTGRES_PASSWORD
+              valueFrom:
+                secretKeyRef:
+                  name: guildhall-db-credentials
+                  key: POSTGRES_PASSWORD
+            # PGDATA subdir under the mount is the standard fix for the
+            # lost+found that some filesystems create at the mount root,
+            # which postgres otherwise refuses to initialise into.
+            - name: PGDATA
+              value: /var/lib/postgresql/data/pgdata
+          volumeMounts:
+            - name: data
+              mountPath: /var/lib/postgresql/data
+          # Matches the Keycloak-postgres resource shape from the
+          # cluster: memory request 256Mi / limit 512Mi, CPU request
+          # 100m, no CPU limit. Guildhall's initial DB load is light
+          # so this is over-provisioned for v0.1; can be trimmed later.
+          resources:
+            requests:
+              cpu: 100m
+              memory: 256Mi
+            limits:
+              memory: 512Mi
+          readinessProbe:
+            exec:
+              command:
+                - pg_isready
+                - -U
+                - guildhall
+            initialDelaySeconds: 5
+            periodSeconds: 10
+            timeoutSeconds: 1
+            failureThreshold: 3
+          livenessProbe:
+            exec:
+              command:
+                - pg_isready
+                - -U
+                - guildhall
+            initialDelaySeconds: 15
+            periodSeconds: 20
+            timeoutSeconds: 1
+            failureThreshold: 3
+      volumes:
+        - name: data
+          persistentVolumeClaim:
+            claimName: guildhall-db
--- a/k8s/40-postgres-service.yaml
+++ b/k8s/40-postgres-service.yaml
@ -0,0 +1,19 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: guildhall-postgres
+  namespace: guildhall
+  labels:
+    app.kubernetes.io/name: guildhall-postgres
+    app.kubernetes.io/part-of: guildhouse
+    app.kubernetes.io/component: database
+    app.kubernetes.io/managed-by: manual
+spec:
+  type: ClusterIP
+  selector:
+    app: guildhall-postgres
+  ports:
+    - port: 5432
+      targetPort: 5432
+      protocol: TCP
+      name: postgres
--- a/k8s/50-guildhall-secrets-template.yaml
+++ b/k8s/50-guildhall-secrets-template.yaml
@ -0,0 +1,59 @@
+# Application + database secrets — TEMPLATES.
+#
+# Do NOT apply these files directly. Secret values are created
+# imperatively so passwords and session keys never land in git.
+# Two Secrets are created at deploy time:
+#
+# ---------- guildhall-db-credentials ----------
+# Consumed by the guildhall-postgres Deployment (for its own env). The
+# same password is reused to build DATABASE_URL in guildhall-app-secrets,
+# which is why '/', '+' and '=' are stripped: the URL needs no escaping.
+#
+#   DB_PASSWORD="$(openssl rand -base64 32 | tr -d '/+=' | head -c 32)"
+#
+#   kubectl create secret generic guildhall-db-credentials \
+#     --from-literal=POSTGRES_DB=guildhall \
+#     --from-literal=POSTGRES_USER=guildhall \
+#     --from-literal=POSTGRES_PASSWORD="$DB_PASSWORD" \
+#     --namespace=guildhall
+#
+# Shape:
+#
+# apiVersion: v1
+# kind: Secret
+# metadata:
+#   name: guildhall-db-credentials
+#   namespace: guildhall
+# type: Opaque
+# data:
+#   POSTGRES_DB:       <b64 "guildhall">
+#   POSTGRES_USER:     <b64 "guildhall">
+#   POSTGRES_PASSWORD: <b64 "<generated-strong-password>">
+#
+# ---------- guildhall-app-secrets ----------
+# Consumed by the guildhall Deployment and migration Job. Contains the
+# Phoenix session signing key and the DATABASE_URL used by Ecto at
+# runtime.
+#
+#   SECRET_KEY_BASE="$(cd /home/tking/projects/substrate-project/guildhall && mix phx.gen.secret)"
+#
+#   kubectl create secret generic guildhall-app-secrets \
+#     --from-literal=SECRET_KEY_BASE="$SECRET_KEY_BASE" \
+#     --from-literal=DATABASE_URL="ecto://guildhall:$DB_PASSWORD@guildhall-postgres:5432/guildhall" \
+#     --namespace=guildhall
+#
+# Note: `ecto://` scheme, not `postgres://`. `config/runtime.exs`
+# hands the URL to Ecto's built-in URL parser, which accepts either,
+# but `ecto://` is the canonical form in Phoenix-generated config.
+#
+# Shape:
+#
+# apiVersion: v1
+# kind: Secret
+# metadata:
+#   name: guildhall-app-secrets
+#   namespace: guildhall
+# type: Opaque
+# data:
+#   SECRET_KEY_BASE: <b64 "<64-byte-base64-session-key>">
+#   DATABASE_URL:    <b64 "ecto://guildhall:<pw>@guildhall-postgres:5432/guildhall">
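The two Secrets share one database password, so it helps to generate it once and build `DATABASE_URL` from that same value. The following is a minimal sketch assuming a POSIX shell with `openssl` on PATH. The function names are hypothetical, and the password pipeline is the one from the template comments above.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the secret-creation comments above.

# 32 characters taken from base64 output with '/', '+' and '=' removed,
# so the result is alphanumeric and can be embedded in a URL as-is.
gen_db_password() {
  openssl rand -base64 32 | tr -d '/+=' | head -c 32
}

# Assemble the Ecto URL stored in guildhall-app-secrets.
# Args: user password host port database
build_database_url() {
  printf 'ecto://%s:%s@%s:%s/%s' "$1" "$2" "$3" "$4" "$5"
}

# Usage (not executed here): derive both Secrets from one password.
#   DB_PASSWORD="$(gen_db_password)"
#   DATABASE_URL="$(build_database_url guildhall "$DB_PASSWORD" guildhall-postgres 5432 guildhall)"
#   kubectl create secret generic guildhall-app-secrets \
#     --from-literal=SECRET_KEY_BASE="$SECRET_KEY_BASE" \
#     --from-literal=DATABASE_URL="$DATABASE_URL" \
#     --namespace=guildhall
```

Generating the password once in the shell session avoids the two Secrets drifting apart. A mismatch would only surface later, when the migration Job fails to authenticate.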
--- a/k8s/60-migration-job.yaml
+++ b/k8s/60-migration-job.yaml
@@ -0,0 +1,66 @@
+# Guildhall DB migration Job.
+#
+# Runs `Guildhall.OpsDb.Release.migrate/0` via the compiled release's
+# `bin/guildhall eval` entry point. Intended to be applied ONCE per
+# image-tag deploy, BEFORE the guildhall Deployment is created or
+# rolled. Running migrations from within the app pods on startup
+# would race across replicas and is explicitly avoided.
+#
+# The Job name includes the image tag so migrations for different tags
+# can coexist in history (a Job with the same name cannot be re-run
+# without deletion). For deploy N+1, update the name, version label and
+# image to the new tag, then `kubectl apply -f 60-migration-job.yaml`.
+# To re-run the SAME tag, delete the old Job first:
+#   kubectl delete job guildhall-migrate-v0-1-0 -n guildhall
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: guildhall-migrate-v0-1-0
+  namespace: guildhall
+  labels:
+    app.kubernetes.io/name: guildhall
+    app.kubernetes.io/part-of: guildhouse
+    app.kubernetes.io/component: migration
+    app.kubernetes.io/managed-by: manual
+    app.kubernetes.io/version: v0.1.0
+spec:
+  backoffLimit: 3
+  ttlSecondsAfterFinished: 86400
+  template:
+    metadata:
+      labels:
+        app: guildhall-migrate
+        app.kubernetes.io/name: guildhall
+        app.kubernetes.io/part-of: guildhouse
+        app.kubernetes.io/component: migration
+    spec:
+      restartPolicy: OnFailure
+      imagePullSecrets:
+        - name: guildhall-registry
+      containers:
+        - name: migrate
+          image: git.guildhouse.dev/tking/guildhall:v0.1.0
+          imagePullPolicy: IfNotPresent
+          command:
+            - /app/bin/guildhall
+            - eval
+            - "Guildhall.OpsDb.Release.migrate()"
+          env:
+            - name: DATABASE_URL
+              valueFrom:
+                secretKeyRef:
+                  name: guildhall-app-secrets
+                  key: DATABASE_URL
+            - name: SECRET_KEY_BASE
+              valueFrom:
+                secretKeyRef:
+                  name: guildhall-app-secrets
+                  key: SECRET_KEY_BASE
+            - name: POOL_SIZE
+              value: "2"
+          resources:
+            requests:
+              cpu: 100m
+              memory: 256Mi
+            limits:
+              memory: 512Mi
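The tag appears in three places in this Job: the image reference, the `app.kubernetes.io/version` label, and the Job name. Each deploy therefore has to rewrite all three. The runbook in this commit describes a sed-based tag bump. The sketch below is an independent, hypothetical version of that idea, not the runbook's helper. It derives the Job name from a tag by lowercasing it and turning dots into dashes, which keeps the name a valid DNS-1123 label. It also writes the bumped manifest to stdout instead of editing in place, because GNU and BSD `sed -i` differ.

```shell
#!/bin/sh
# Hypothetical: derive the per-tag migration Job name.
#   v0.1.0 -> guildhall-migrate-v0-1-0
migration_job_name() {
  printf 'guildhall-migrate-%s' "$(printf '%s' "$1" | tr 'A-Z.' 'a-z-')"
}

# Hypothetical: print FILE with OLD tag rewritten to NEW tag.
# Touches the image reference, the version label and the Job name.
bump_manifest_tag() {
  file="$1"; old="$2"; new="$3"
  old_job="$(migration_job_name "$old")"
  new_job="$(migration_job_name "$new")"
  # Escape dots in the old tag so sed matches them literally.
  old_re="$(printf '%s' "$old" | sed 's/\./\\./g')"
  sed -e "s|guildhall:${old_re}|guildhall:${new}|g" \
      -e "s|version: ${old_re}|version: ${new}|g" \
      -e "s|${old_job}|${new_job}|g" \
      "$file"
}
```

Hypothetical usage: `bump_manifest_tag k8s/60-migration-job.yaml v0.1.0 v0.1.1 | kubectl apply -f -`, followed by `kubectl -n guildhall wait --for=condition=complete job/guildhall-migrate-v0-1-1 --timeout=300s` so the Deployment rollout only starts after the migration has finished. The same function also rewrites the image and version label in `70-guildhall-deployment.yaml`.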
--- a/k8s/70-guildhall-deployment.yaml
+++ b/k8s/70-guildhall-deployment.yaml
@@ -0,0 +1,98 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: guildhall
+  namespace: guildhall
+  labels:
+    app.kubernetes.io/name: guildhall
+    app.kubernetes.io/part-of: guildhouse
+    app.kubernetes.io/component: web
+    app.kubernetes.io/managed-by: manual
+    app.kubernetes.io/version: v0.1.0
+spec:
+  replicas: 1
+  strategy:
+    type: RollingUpdate
+    rollingUpdate:
+      maxSurge: 1
+      maxUnavailable: 0
+  selector:
+    matchLabels:
+      app: guildhall
+  template:
+    metadata:
+      labels:
+        app: guildhall
+        app.kubernetes.io/name: guildhall
+        app.kubernetes.io/part-of: guildhouse
+        app.kubernetes.io/component: web
+        app.kubernetes.io/version: v0.1.0
+    spec:
+      imagePullSecrets:
+        - name: guildhall-registry
+      containers:
+        - name: guildhall
+          image: git.guildhouse.dev/tking/guildhall:v0.1.0
+          imagePullPolicy: IfNotPresent
+          ports:
+            - containerPort: 4000
+              name: http
+              protocol: TCP
+          env:
+            # Phoenix / endpoint
+            - name: PHX_SERVER
+              value: "true"
+            - name: PHX_HOST
+              value: guildhall.guildhouse.dev
+            - name: PORT
+              value: "4000"
+            - name: POOL_SIZE
+              value: "10"
+            # Session signing key
+            - name: SECRET_KEY_BASE
+              valueFrom:
+                secretKeyRef:
+                  name: guildhall-app-secrets
+                  key: SECRET_KEY_BASE
+            # Ecto
+            - name: DATABASE_URL
+              valueFrom:
+                secretKeyRef:
+                  name: guildhall-app-secrets
+                  key: DATABASE_URL
+          # Starting envelope. Tune after observing real usage under
+          # LiveView fan-out; Phoenix's memory footprint grows with
+          # connected clients.
+          resources:
+            requests:
+              cpu: 200m
+              memory: 256Mi
+            limits:
+              cpu: "1"
+              memory: 1Gi
+          # Probes hit /health, which queries the Ecto pool. See
+          # apps/guildhall_web/lib/guildhall_web_web/controllers/health_controller.ex
+          # for semantics.
+          readinessProbe:
+            httpGet:
+              path: /health
+              port: http
+            initialDelaySeconds: 10
+            periodSeconds: 5
+            timeoutSeconds: 3
+            failureThreshold: 3
+          livenessProbe:
+            httpGet:
+              path: /health
+              port: http
+            initialDelaySeconds: 60
+            periodSeconds: 30
+            timeoutSeconds: 5
+            failureThreshold: 3
+          # Graceful shutdown: the preStop sleep lets endpoint removal reach
+          # the Service/LB before SIGTERM; the release then stops within 30s.
+          lifecycle:
+            preStop:
+              exec:
+                command: ["/bin/sh", "-c", "sleep 5"]
+      terminationGracePeriodSeconds: 30
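Both probes go through `/health`, which touches the database, so the probe and shutdown settings translate into a few time budgets worth knowing. As a rough rule, Kubernetes acts after `failureThreshold` consecutive failures spaced `periodSeconds` apart. `terminationGracePeriodSeconds` counts from pod deletion, and the preStop hook runs inside that window. The arithmetic is sketched below in shell under those simplifications. It ignores per-probe `timeoutSeconds` and scheduling jitter.

```shell
#!/bin/sh
# Approximate seconds of consecutive failures before Kubernetes acts on a
# probe: periodSeconds * failureThreshold (ignores timeouts and jitter).
probe_window() { echo $(( $1 * $2 )); }

# Seconds left for the release to stop after SIGTERM: the grace period
# starts at deletion and the preStop hook runs inside it.
shutdown_budget() { echo $(( $1 - $2 )); }

readiness=$(probe_window 5 3)    # pod is removed from Service endpoints
liveness=$(probe_window 30 3)    # container is restarted
budget=$(shutdown_budget 30 5)   # terminationGracePeriodSeconds - preStop sleep

echo "readiness ${readiness}s, liveness ${liveness}s, shutdown ${budget}s"
```

With these values, a database outage of roughly 15 s takes the pod out of rotation. The container is only restarted if the outage lasts around 90 s. After SIGTERM, the release has about 25 s to stop before SIGKILL.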
--- a/k8s/80-guildhall-service.yaml
+++ b/k8s/80-guildhall-service.yaml
@@ -0,0 +1,32 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: guildhall
+  namespace: guildhall
+  labels:
+    app.kubernetes.io/name: guildhall
+    app.kubernetes.io/part-of: guildhouse
+    app.kubernetes.io/component: web
+    app.kubernetes.io/managed-by: manual
+  # Hetzner Cloud Controller Manager annotations. Matches the exact
+  # annotation set used by keycloak/keycloak (verified from the cluster
+  # on 2026-04-21): location / name / type / use-private-ip. No
+  # algorithm-type, no uses-proxyprotocol — the cluster's convention
+  # is the minimal set.
+  annotations:
+    load-balancer.hetzner.cloud/location: nbg1
+    load-balancer.hetzner.cloud/name: guildhall
+    load-balancer.hetzner.cloud/type: lb11
+    load-balancer.hetzner.cloud/use-private-ip: "false"
+spec:
+  type: LoadBalancer
+  # TLS terminates at Cloudflare (orange cloud); the origin is plain HTTP
+  # on port 80 → app port 4000, matching forgejo/keycloak. In-cluster TLS
+  # via cert-manager is a hygiene follow-up, not v0.1.
+  ports:
+    - port: 80
+      targetPort: http
+      protocol: TCP
+      name: http
+  selector:
+    app: guildhall