diff --git a/.dockerignore b/.dockerignore new file mode 100644 index 0000000..87e0c49 --- /dev/null +++ b/.dockerignore @@ -0,0 +1,53 @@ +# Build artifacts +_build/ +deps/ +apps/*/_build/ +apps/*/deps/ + +# Editor / dev state +.elixir_ls/ +.idea/ +.vscode/ +*.swp +*.swo +.DS_Store + +# VCS +.git/ +.gitignore +.gitattributes + +# CI / deploy manifests (not needed in image) +k8s/ +.forgejo/ +.github/ + +# Generated web assets — phx.digest regenerates these in the builder +# stage. The source assets in apps/guildhall_web/assets/ are what the +# builder needs. +apps/guildhall_web/priv/static/assets/ +apps/guildhall_web/priv/static/cache_manifest.json + +# Test + cover +apps/*/test/ +cover/ +.coverdata + +# Docs + tmp + logs +doc/ +tmp/ +*.log + +# Env (never ship into images) +.env +.env.* +!.env.example + +# Release artifacts if any were built locally +rel/ +releases/ + +# Docs + design files at repo root +*.md +!README.md +AGENTS.md diff --git a/DEPLOY-RUNBOOK.md b/DEPLOY-RUNBOOK.md new file mode 100644 index 0000000..29a92f4 --- /dev/null +++ b/DEPLOY-RUNBOOK.md @@ -0,0 +1,354 @@ +# Guildhall deploy runbook + +**Target:** `guildhall.guildhouse.dev` on the Hetzner Talos cluster, via Forgejo container registry at `git.guildhouse.dev/tking/guildhall`. +**Pattern:** direct `kubectl apply` against the cluster; Flux integration deferred. TLS terminates at Cloudflare (orange cloud); origin is plain HTTP on the Hetzner LB. +**Required reference docs:** `DEPLOY-EXPLORATORY-2026-04-21.md` (cluster state), `FORGEJO-REGISTRY-INVESTIGATION-2026-04-21.md` (registry state). + +Tag referenced throughout this runbook: **`v0.1.0`**. When deploying a subsequent tag, substitute throughout OR use the sed helper at the bottom. 
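Substituting a new tag changes two derived names: the image reference and the migration Job name, where the dots in the tag become dashes (as in `guildhall-migrate-v0-1-0`). A minimal sketch of that mapping, assuming hypothetical helper names `image_ref` and `job_name` that are not part of the repo:

```bash
#!/bin/sh
# Hypothetical helpers (not part of the repo): derive the two names
# that change with every release tag.

# image_ref TAG -> full registry reference used in the manifests
image_ref() {
  printf 'git.guildhouse.dev/tking/guildhall:%s\n' "$1"
}

# job_name TAG -> migration Job name; dots become dashes, mirroring
# the existing `guildhall-migrate-v0-1-0` naming
job_name() {
  printf 'guildhall-migrate-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

TAG=v0.1.1
image_ref "$TAG"
job_name "$TAG"
```

Keeping both derivations in one place makes it harder to bump the image tag while forgetting to rename the Job.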
+ +--- + +## Prerequisites + +- `kubectl` configured against the Talos cluster (`KUBECONFIG=~/projects/substrate-project/guildhouse-talos-bootstrap/kubeconfig`) +- `docker` available on the build host with enough disk for an Elixir build image (~2 GB) +- Cloudflare account access for `guildhouse.dev` DNS +- Forgejo account `tking` at `git.guildhouse.dev` + +--- + +## Phase 1 — Build and push the image + +### 1.1 Create a Forgejo Personal Access Token + +Navigate to `https://git.guildhouse.dev/-/user/settings/applications`. Generate a new token: + +- **Token name:** `guildhall-registry-push` (or similar) +- **Scopes:** `package:write` (this token will both push and pull; scope down to `package:read` for a separate in-cluster-pull token if splitting) +- **Expiry:** operator's choice; 30-90 days is reasonable for the push token + +Copy the token value immediately (Forgejo won't show it again). Save it in your password manager. + +### 1.2 Docker login + +```bash +docker login git.guildhouse.dev -u tking +# paste PAT when prompted +``` + +Verify with `cat ~/.docker/config.json | jq '.auths | keys'` — `git.guildhouse.dev` should appear. + +### 1.3 Build the image + +```bash +cd /home/tking/projects/substrate-project/guildhall +docker build -t git.guildhouse.dev/tking/guildhall:v0.1.0 . +``` + +Cold build takes ~5-10 minutes (mix deps + erlang compile + tailwind + esbuild + phx.digest + mix release). Subsequent builds hit Docker layer cache and are much faster. + +Verify the image runs before pushing: + +```bash +docker run --rm -it --entrypoint /bin/sh \ + git.guildhouse.dev/tking/guildhall:v0.1.0 \ + -c 'ls -la /app/bin && /app/bin/guildhall version' +``` + +Expected: the `guildhall` release binary is present and `version` returns the release version without error. 
+ +### 1.4 Push to Forgejo registry + +```bash +docker push git.guildhouse.dev/tking/guildhall:v0.1.0 +``` + +### 1.5 Verify image is in the registry + +Via Forgejo UI: `https://git.guildhouse.dev/tking/-/packages` → should list `guildhall` with a `v0.1.0` tag. + +Via registry API (authenticated): + +```bash +curl -sS -u 'tking:<PAT>' https://git.guildhouse.dev/v2/tking/guildhall/tags/list +# → {"name":"tking/guildhall","tags":["v0.1.0"]} +``` + +### 1.6 Decide package visibility + +In the Forgejo UI, for the new `guildhall` container package: + +- **Private** (default, recommended for the first deploy): cluster needs `guildhall-registry` pull secret (Phase 2.2 below creates it) +- **Public:** anonymous pulls work; skip Phase 2.2 and remove `imagePullSecrets` from `k8s/60-migration-job.yaml` and `k8s/70-guildhall-deployment.yaml` before applying + +--- + +## Phase 2 — Cluster-side preparation + +### 2.1 Create the namespace + +```bash +kubectl apply -f k8s/00-namespace.yaml +``` + +Verify: `kubectl get ns guildhall` → `Active`. + +### 2.2 Create the registry pull secret (if package is private) + +```bash +kubectl create secret docker-registry guildhall-registry \ + --docker-server=git.guildhouse.dev \ + --docker-username=tking \ + --docker-password='<PAT>' \ + --namespace=guildhall +``` + +Optionally use a read-only PAT here instead of the push PAT from Phase 1.1. Skip this step entirely if the package is public. 
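For orientation, a docker-registry pull secret carries a Docker config JSON whose `auth` field is the base64 encoding of `username:password`. The sketch below reproduces that payload locally with a dummy token, assuming a hypothetical helper name `dockerconfig_json`; it illustrates the format only and is not a replacement for the `kubectl create secret` command.

```bash
#!/bin/sh
# Hypothetical helper (not part of the repo): print the JSON payload
# that a docker-registry pull secret carries for one registry.
dockerconfig_json() {
  server=$1 user=$2 token=$3
  # `auth` is base64("user:token"); strip any newlines that a base64
  # implementation adds when wrapping or terminating its output.
  auth=$(printf '%s:%s' "$user" "$token" | base64 | tr -d '\n')
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
    "$server" "$user" "$token" "$auth"
}

# Dummy token for illustration only; never paste a real PAT into a script.
dockerconfig_json git.guildhouse.dev tking example-token
```

Kubernetes stores this JSON base64-encoded once more under the `.dockerconfigjson` key of the Secret.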
+ +### 2.3 Create the database credentials secret + +Generate a strong password and save it to your password manager before running: + +```bash +DB_PASSWORD="$(openssl rand -base64 32 | tr -d '/+=' | head -c 32)" +echo "Save this: $DB_PASSWORD" + +kubectl create secret generic guildhall-db-credentials \ + --from-literal=POSTGRES_DB=guildhall \ + --from-literal=POSTGRES_USER=guildhall \ + --from-literal=POSTGRES_PASSWORD="$DB_PASSWORD" \ + --namespace=guildhall +``` + +### 2.4 Create the application secrets + +```bash +SECRET_KEY_BASE="$(cd /home/tking/projects/substrate-project/guildhall && mix phx.gen.secret)" + +kubectl create secret generic guildhall-app-secrets \ + --from-literal=SECRET_KEY_BASE="$SECRET_KEY_BASE" \ + --from-literal=DATABASE_URL="ecto://guildhall:$DB_PASSWORD@guildhall-postgres:5432/guildhall" \ + --namespace=guildhall +``` + +Verify secrets exist: + +```bash +kubectl get secrets -n guildhall +# expect: guildhall-registry, guildhall-db-credentials, guildhall-app-secrets +``` + +--- + +## Phase 3 — Database provisioning + +### 3.1 Apply Postgres PVC, Deployment, Service + +```bash +kubectl apply -f k8s/20-postgres-pvc.yaml +kubectl apply -f k8s/30-postgres-deployment.yaml +kubectl apply -f k8s/40-postgres-service.yaml +``` + +### 3.2 Wait for Postgres Ready + +```bash +kubectl rollout status deployment/guildhall-postgres -n guildhall --timeout=5m +kubectl wait --for=condition=Ready pod \ + -l app=guildhall-postgres -n guildhall --timeout=3m +``` + +Verify it accepts connections: + +```bash +kubectl exec -n guildhall deployment/guildhall-postgres -- \ + pg_isready -U guildhall +# → /var/run/postgresql:5432 - accepting connections +``` + +--- + +## Phase 4 — Schema migration + +### 4.1 Run the migration Job + +```bash +kubectl apply -f k8s/60-migration-job.yaml +``` + +### 4.2 Wait for Job completion + +```bash +kubectl wait --for=condition=complete job/guildhall-migrate-v0-1-0 \ + -n guildhall --timeout=3m +``` + +### 4.3 Verify migration output 
+ +```bash +kubectl logs job/guildhall-migrate-v0-1-0 -n guildhall +``` + +Look for `Migrations already up` (no-op if Guildhall has no migrations yet) or a list of `== Running 20xx...` / `== Migrated` entries. + +If the Job fails, inspect events + logs: + +```bash +kubectl describe job guildhall-migrate-v0-1-0 -n guildhall +kubectl logs job/guildhall-migrate-v0-1-0 -n guildhall +``` + +Common failures and remediation: DATABASE_URL pointing at a wrong host (check `guildhall-app-secrets`); Postgres not yet accepting auth (wait longer); migration SQL error (fix in source, rebuild and push the image, delete the failed Job, then re-apply it; prefer a new tag, because `imagePullPolicy: IfNotPresent` lets a node reuse its cached copy of a re-pushed tag). + +--- + +## Phase 5 — Application deployment + +### 5.1 Apply Guildhall Deployment + Service + +```bash +kubectl apply -f k8s/70-guildhall-deployment.yaml +kubectl apply -f k8s/80-guildhall-service.yaml +``` + +### 5.2 Wait for Deployment rollout + +```bash +kubectl rollout status deployment/guildhall -n guildhall --timeout=5m +``` + +If this hangs, check pod events + logs: + +```bash +kubectl get pods -n guildhall +kubectl describe pod -n guildhall -l app=guildhall +kubectl logs -n guildhall -l app=guildhall --tail=100 +``` + +### 5.3 Obtain the LoadBalancer IP + +Hetzner CCM provisions a new LB; allow 30-90 seconds after the Service is applied. + +```bash +kubectl get svc guildhall -n guildhall -w +# ^C once EXTERNAL-IP transitions from <pending> to a public address +``` + +Record the IPv4 in `EXTERNAL-IP`. IPv6 will also be assigned; note both. + +--- + +## Phase 6 — DNS + end-to-end verification + +### 6.1 Create Cloudflare DNS records + +In the Cloudflare dashboard for `guildhouse.dev` (or via `flarectl` / `terraform` if automated), create: + +- **A record:** `guildhall` → `<LB IPv4>` — **proxied (orange cloud)** +- **AAAA record** (optional, recommended): `guildhall` → `<LB IPv6>` — proxied + +Proxied is load-bearing: it's what provides TLS. Do NOT grey-cloud these records. + +### 6.2 Smoke test + +Allow Cloudflare's edge to pick up the record (1-2 minutes). 
+ +```bash +# Health endpoint — unauthenticated, should return 200 +curl -sS -w '\n-- HTTP %{http_code} --\n' https://guildhall.guildhouse.dev/health + +# Root — should return 200 (-I sends a HEAD request, so only headers are printed) +curl -sS -w '\n-- HTTP %{http_code} --\n' -I https://guildhall.guildhouse.dev/ +``` + +Expected: `/health` returns `200` with `{"status":"ok","checks":{"db":"ok"}}`; `/` returns `200` (drop `-I` to see Phoenix's rendered HTML). + +### 6.3 Manual walkthrough + +In a browser, visit `https://guildhall.guildhouse.dev/`: + +- Dashboard LiveView should render +- `/ceremonies` and `/artifacts` should render (will be empty — no data yet) +- No certificate warnings (Cloudflare-terminated TLS) + +--- + +## Iterating on subsequent tags + +For v0.1.1, v0.1.2, etc.: + +1. Build + push the new image +2. Update the `image:` tag and the `app.kubernetes.io/version` labels in `k8s/60-migration-job.yaml` and `k8s/70-guildhall-deployment.yaml` +3. Update the Job name in `k8s/60-migration-job.yaml` (e.g. `guildhall-migrate-v0-1-1`) +4. `kubectl apply -f k8s/60-migration-job.yaml` — run the new migration Job +5. 
`kubectl apply -f k8s/70-guildhall-deployment.yaml` — rolling update of Guildhall + +A sed helper to bump everything at once: + +```bash +OLD=v0.1.0; NEW=v0.1.1 +# image tag +sed -i "s|guildhall:${OLD}|guildhall:${NEW}|g" \ + k8s/60-migration-job.yaml k8s/70-guildhall-deployment.yaml +# app.kubernetes.io/version labels +sed -i "s|version: ${OLD}|version: ${NEW}|g" \ + k8s/60-migration-job.yaml k8s/70-guildhall-deployment.yaml +# migration Job name (dots become dashes) +sed -i "s|guildhall-migrate-${OLD//./-}|guildhall-migrate-${NEW//./-}|g" \ + k8s/60-migration-job.yaml +``` + +--- + +## Rollback + +### Back out the current deployment + +Rolling back to a prior image tag (assuming the prior tag is still in the registry): + +```bash +kubectl set image -n guildhall deployment/guildhall \ + guildhall=git.guildhouse.dev/tking/guildhall:<prior-tag> +kubectl rollout status -n guildhall deployment/guildhall +``` + +Schema rollback (only if the current deploy introduced migrations that need to be reverted): + +```bash +kubectl run guildhall-rollback --rm -it --restart=Never \ + --image=git.guildhouse.dev/tking/guildhall:<current-tag> \ + --overrides='{"spec":{"imagePullSecrets":[{"name":"guildhall-registry"}]}}' \ + -n guildhall -- \ + /app/bin/guildhall eval "Guildhall.OpsDb.Release.rollback(Guildhall.OpsDb.Repo, <version>)" +``` + +The rollback pod above receives no environment from `guildhall-app-secrets`. The migration Job sets `DATABASE_URL` and `SECRET_KEY_BASE` explicitly, so if `config/runtime.exs` requires them, the `eval` fails at boot until they are added to the pod spec in `--overrides`. + +### Tear down the whole deployment + +```bash +# Delete in reverse order; namespace deletion cascades everything +# attached to it (Deployments, Services, Pods, PVC... note that +# deleting the namespace ALSO deletes the PVC, which destroys the +# database. For non-destructive teardown, preserve the PVC first.) + +kubectl delete svc guildhall -n guildhall # triggers Hetzner LB deprovision +kubectl delete deployment guildhall -n guildhall +kubectl delete job -l app.kubernetes.io/name=guildhall,app.kubernetes.io/component=migration -n guildhall +kubectl delete deployment guildhall-postgres -n guildhall +kubectl delete svc guildhall-postgres -n guildhall + +# PVC delete is destructive (Longhorn reclaim policy is Delete). 
+# Uncomment only if the database state should be destroyed: +# kubectl delete pvc guildhall-db -n guildhall + +kubectl delete secret guildhall-registry guildhall-db-credentials guildhall-app-secrets -n guildhall + +# Finally the namespace itself (retained if you want to keep PVC): +# kubectl delete namespace guildhall +``` + +Remove the Cloudflare DNS record for `guildhall.guildhouse.dev` if fully tearing down. + +--- + +## Known v0.1 limitations + +- **Cloudflare-edge TLS, not cluster-terminated.** Upgrading to cert-manager Certificate + in-cluster TLS is hygiene follow-up once the first deploy stabilizes. The `letsencrypt-prod` ClusterIssuer is already ready. +- **No Flux integration.** Direct `kubectl apply` is the deploy mechanism for v0.1. Flux Kustomization for Guildhall is follow-up — especially once the broader Flux chain (cluster-infra, spire, quartermaster) is healed. +- **No OIDC / Keycloak integration.** Guildhall's `config/runtime.exs` has commented-out OIDC env vars; wiring them to the existing `auth.guildhouse.dev` Keycloak is follow-up. +- **No substrate CRD integration.** The CeremonyOrchestrator and ChronicleConsumer stubs are not yet watching real substrate CRDs — those integrations land after the substrate foundation is reconciling on this cluster. +- **Single replica.** Safe for LiveView (no cluster sticky-session concerns at replicas=1). Scale once DNS cluster / horizontal-pod-autoscaler is configured. diff --git a/Dockerfile b/Dockerfile new file mode 100644 index 0000000..ea9345b --- /dev/null +++ b/Dockerfile @@ -0,0 +1,110 @@ +# syntax=docker/dockerfile:1.7 +# +# Guildhall production image — Elixir/Phoenix umbrella release. +# Multi-stage: builder produces a mix release; runtime is a slim debian +# carrying only the OTP release + runtime libs. +# +# Build context: the guildhall umbrella root. 
+# Target registry: git.guildhouse.dev/tking/guildhall:<tag> + +# ---------- Stage 1: builder --------------------------------------------- +FROM hexpm/elixir:1.17.3-erlang-27.1.2-debian-bookworm-20241202-slim AS builder + +ENV MIX_ENV=prod \ + LANG=C.UTF-8 \ + LC_ALL=C.UTF-8 + +RUN apt-get update -qq && \ + apt-get install -y --no-install-recommends \ + build-essential \ + git \ + ca-certificates \ + curl \ + && rm -rf /var/lib/apt/lists/* + +RUN mix local.hex --force && \ + mix local.rebar --force + +WORKDIR /app + +# Dep resolution needs every apps/*/mix.exs in an umbrella. Copy them +# before any source so dep-layer cache survives source-only edits. +COPY mix.exs mix.lock ./ +COPY config/config.exs config/prod.exs config/runtime.exs config/ +COPY apps/guildhall_chronicle/mix.exs apps/guildhall_chronicle/ +COPY apps/guildhall_graph_bridge/mix.exs apps/guildhall_graph_bridge/ +COPY apps/guildhall_ops_db/mix.exs apps/guildhall_ops_db/ +COPY apps/guildhall_orchestrator/mix.exs apps/guildhall_orchestrator/ +COPY apps/guildhall_web/mix.exs apps/guildhall_web/ + +RUN mix deps.get --only prod && \ + mix deps.compile + +# Source — copied after dep layers so app-source changes don't bust +# the dep cache. +COPY apps/ apps/ + +# Asset pipeline for guildhall_web — tailwind + esbuild + phx.digest. +# The aliases in apps/guildhall_web/mix.exs define `assets.deploy` as +# `tailwind guildhall_web --minify` + `esbuild guildhall_web --minify` +# + `phx.digest`. `tailwind.install` and `esbuild.install` pull the +# binaries on first use. +COPY apps/guildhall_web/assets apps/guildhall_web/assets +RUN cd apps/guildhall_web && \ + mix assets.setup && \ + mix assets.deploy + +# Compile the full umbrella + cut the release. The `guildhall` release +# must be declared under `releases:` in the umbrella mix.exs: Mix +# refuses to build an umbrella release that is not explicitly defined +# with a non-empty `applications:` list. 
+RUN mix compile --warnings-as-errors && \ + mix release --overwrite + +# ---------- Stage 2: runtime -------------------------------------------- +FROM debian:bookworm-slim AS runtime + +ENV LANG=en_US.UTF-8 \ + LC_ALL=en_US.UTF-8 \ + LANGUAGE=en_US:en + +# Runtime deps the compiled release needs. locales for the en_US.UTF-8 +# generation; libstdc++6 for the erlang ports; libncurses6 for the +# beam; openssl for tls; libsystemd0 for the logger integration some +# releases use; tini as a minimal pid-1 for reapability. +RUN apt-get update -qq && \ + apt-get install -y --no-install-recommends \ + openssl \ + libncurses6 \ + libstdc++6 \ + libsystemd0 \ + locales \ + ca-certificates \ + curl \ + tini \ + && sed -i '/^# en_US.UTF-8 UTF-8/s/^# //' /etc/locale.gen \ + && locale-gen en_US.UTF-8 \ + && rm -rf /var/lib/apt/lists/* + +# Non-root user. uid 1000 matches the rest of the cluster's convention. +RUN groupadd --system --gid 1000 guildhall && \ + useradd --system --uid 1000 --gid guildhall --shell /usr/sbin/nologin \ + --home /app --create-home guildhall + +WORKDIR /app + +# Release name `guildhall` produces the release at this path. +COPY --from=builder --chown=guildhall:guildhall /app/_build/prod/rel/guildhall /app + +USER guildhall + +ENV HOME=/app \ + PHX_SERVER=true \ + PORT=4000 + +EXPOSE 4000 + +HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \ + CMD curl -fsS http://localhost:4000/health || exit 1 + +ENTRYPOINT ["/usr/bin/tini", "--"] +CMD ["/app/bin/guildhall", "start"] diff --git a/apps/guildhall_ops_db/lib/guildhall/ops_db/release.ex b/apps/guildhall_ops_db/lib/guildhall/ops_db/release.ex new file mode 100644 index 0000000..cec5a26 --- /dev/null +++ b/apps/guildhall_ops_db/lib/guildhall/ops_db/release.ex @@ -0,0 +1,45 @@ +defmodule Guildhall.OpsDb.Release do + @moduledoc """ + Release-time DB tasks for the `guildhall` OTP release. 
+ + Mix is not available inside the compiled release, so ecto tasks can't + be invoked via `mix ecto.migrate` in production. This module wraps + the same operations so they can be called via: + + bin/guildhall eval 'Guildhall.OpsDb.Release.migrate()' + + Intended consumers: + + - The `k8s/60-migration-job.yaml` Kubernetes Job, which runs this + module before `guildhall` Deployment rollout so schema changes + land exactly once per release rather than racing across N pods. + - Operators doing targeted rollback: `rollback(Guildhall.OpsDb.Repo, 20240101120000)`. + + The @app is the OTP app whose `ecto_repos` key configures which Repos + to migrate. Guildhall's single Repo (`Guildhall.OpsDb.Repo`) is + registered under `:guildhall_ops_db`. + """ + + @app :guildhall_ops_db + + def migrate do + load_app() + + for repo <- repos() do + {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :up, all: true)) + end + end + + def rollback(repo, version) do + load_app() + {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :down, to: version)) + end + + defp repos do + Application.fetch_env!(@app, :ecto_repos) + end + + defp load_app do + Application.load(@app) + end +end diff --git a/apps/guildhall_web/lib/guildhall_web_web/controllers/health_controller.ex b/apps/guildhall_web/lib/guildhall_web_web/controllers/health_controller.ex new file mode 100644 index 0000000..26431ae --- /dev/null +++ b/apps/guildhall_web/lib/guildhall_web_web/controllers/health_controller.ex @@ -0,0 +1,34 @@ +defmodule GuildhallWeb.HealthController do + @moduledoc """ + Kubernetes-probe and LB-target health endpoint. + + The check is deliberately shallow: it confirms Phoenix is serving + requests AND the Ecto pool can execute a trivial query against the + OpsDb Repo. The latter catches the class of failures where the app + is running but has lost its DB connection — which a Kubernetes + liveness probe should notice. 
+ + Deeper health checks (Chronicle stream liveness, orchestrator state, + downstream substrate CRD reachability) are NOT in scope here — + their failure modes warrant different remediation (degraded-mode + operation rather than process restart) and a separate endpoint will + surface them when those integrations land. + """ + use GuildhallWeb, :controller + + alias Guildhall.OpsDb.Repo + + def check(conn, _params) do + case Ecto.Adapters.SQL.query(Repo, "SELECT 1", []) do + {:ok, _} -> + conn + |> put_status(200) + |> json(%{status: "ok", checks: %{db: "ok"}}) + + {:error, reason} -> + conn + |> put_status(503) + |> json(%{status: "degraded", checks: %{db: "unreachable"}, reason: inspect(reason)}) + end + end +end diff --git a/apps/guildhall_web/lib/guildhall_web_web/router.ex b/apps/guildhall_web/lib/guildhall_web_web/router.ex index e73bb6f..1ae5060 100644 --- a/apps/guildhall_web/lib/guildhall_web_web/router.ex +++ b/apps/guildhall_web/lib/guildhall_web_web/router.ex @@ -21,4 +21,13 @@ defmodule GuildhallWeb.Router do live "/ceremonies", CeremonyLive.Index, :index live "/artifacts", ArtifactLive.Index, :index end + + # Health check endpoint for Kubernetes probes + LB targets. + # GET /health — returns 200 when Phoenix is up AND the Ecto pool + # can query the DB; 503 otherwise. Unauthenticated (the whole point + # is that it's reachable without credentials). 
+ scope "/health", GuildhallWeb do + pipe_through :api + get "/", HealthController, :check + end end diff --git a/k8s/00-namespace.yaml b/k8s/00-namespace.yaml new file mode 100644 index 0000000..2dd7069 --- /dev/null +++ b/k8s/00-namespace.yaml @@ -0,0 +1,7 @@ +apiVersion: v1 +kind: Namespace +metadata: + name: guildhall + labels: + app.kubernetes.io/managed-by: manual + app.kubernetes.io/part-of: guildhouse diff --git a/k8s/10-registry-secret-template.yaml b/k8s/10-registry-secret-template.yaml new file mode 100644 index 0000000..6199639 --- /dev/null +++ b/k8s/10-registry-secret-template.yaml @@ -0,0 +1,36 @@ +# Container registry pull secret — TEMPLATE. +# +# Do NOT apply this file directly. The actual secret is created +# imperatively so the Forgejo PAT never lands in git. Create it with: +# +# kubectl create secret docker-registry guildhall-registry \ +# --docker-server=git.guildhouse.dev \ +# --docker-username=tking \ +# --docker-password='<PAT>' \ +# --namespace=guildhall +# +# The PAT is generated at: +# https://git.guildhouse.dev/-/user/settings/applications +# Required scope: `package:read` (or `package:write` if the same PAT +# will also be used for `docker push` from the build host — scoping +# read-only is preferable for cluster-resident credentials). +# +# If the `tking/guildhall` Forgejo package is made public, this secret +# is not required and `imagePullSecrets` can be omitted from the +# guildhall Deployment and Job. Both manifests reference it +# unconditionally; switching to public packages means removing +# those references and deleting this secret. 
+# +# Shape reference (what `kubectl get secret -o yaml` would show): +# +# apiVersion: v1 +# kind: Secret +# metadata: +# name: guildhall-registry +# namespace: guildhall +# labels: +# app.kubernetes.io/managed-by: manual +# app.kubernetes.io/part-of: guildhouse +# type: kubernetes.io/dockerconfigjson +# data: +# .dockerconfigjson: <base64 of the Docker config JSON> diff --git a/k8s/20-postgres-pvc.yaml b/k8s/20-postgres-pvc.yaml new file mode 100644 index 0000000..dc3e4b3 --- /dev/null +++ b/k8s/20-postgres-pvc.yaml @@ -0,0 +1,16 @@ +apiVersion: v1 +kind: PersistentVolumeClaim +metadata: + name: guildhall-db + namespace: guildhall + labels: + app.kubernetes.io/name: guildhall-postgres + app.kubernetes.io/part-of: guildhouse + app.kubernetes.io/component: database + app.kubernetes.io/managed-by: manual +spec: + accessModes: [ReadWriteOnce] + storageClassName: longhorn + resources: + requests: + storage: 5Gi diff --git a/k8s/30-postgres-deployment.yaml b/k8s/30-postgres-deployment.yaml new file mode 100644 index 0000000..54abf57 --- /dev/null +++ b/k8s/30-postgres-deployment.yaml @@ -0,0 +1,90 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: guildhall-postgres + namespace: guildhall + labels: + app.kubernetes.io/name: guildhall-postgres + app.kubernetes.io/part-of: guildhouse + app.kubernetes.io/component: database + app.kubernetes.io/managed-by: manual +spec: + replicas: 1 + strategy: + type: Recreate + selector: + matchLabels: + app: guildhall-postgres + template: + metadata: + labels: + app: guildhall-postgres + app.kubernetes.io/name: guildhall-postgres + app.kubernetes.io/part-of: guildhouse + app.kubernetes.io/component: database + spec: + containers: + - name: postgres + image: postgres:16 + imagePullPolicy: IfNotPresent + ports: + - containerPort: 5432 + name: postgres + env: + - name: POSTGRES_DB + valueFrom: + secretKeyRef: + name: guildhall-db-credentials + key: POSTGRES_DB + - name: POSTGRES_USER + valueFrom: + secretKeyRef: + name: guildhall-db-credentials + key: 
POSTGRES_USER + - name: POSTGRES_PASSWORD + valueFrom: + secretKeyRef: + name: guildhall-db-credentials + key: POSTGRES_PASSWORD + # PGDATA subdir under the mount is the standard fix for the + # lost+found that some filesystems create at the mount root, + # which postgres otherwise refuses to initialise into. + - name: PGDATA + value: /var/lib/postgresql/data/pgdata + volumeMounts: + - name: data + mountPath: /var/lib/postgresql/data + # Matches the Keycloak-postgres resource shape from the + # cluster: memory request 256Mi / limit 512Mi, CPU request + # 100m, no CPU limit. Guildhall's initial DB load is light + # so this is over-provisioned for v0.1; can be trimmed later. + resources: + requests: + cpu: 100m + memory: 256Mi + limits: + memory: 512Mi + readinessProbe: + exec: + command: + - pg_isready + - -U + - guildhall + initialDelaySeconds: 5 + periodSeconds: 10 + timeoutSeconds: 1 + failureThreshold: 3 + livenessProbe: + exec: + command: + - pg_isready + - -U + - guildhall + initialDelaySeconds: 15 + periodSeconds: 20 + timeoutSeconds: 1 + failureThreshold: 3 + volumes: + - name: data + persistentVolumeClaim: + claimName: guildhall-db diff --git a/k8s/40-postgres-service.yaml b/k8s/40-postgres-service.yaml new file mode 100644 index 0000000..4c98dec --- /dev/null +++ b/k8s/40-postgres-service.yaml @@ -0,0 +1,19 @@ +apiVersion: v1 +kind: Service +metadata: + name: guildhall-postgres + namespace: guildhall + labels: + app.kubernetes.io/name: guildhall-postgres + app.kubernetes.io/part-of: guildhouse + app.kubernetes.io/component: database + app.kubernetes.io/managed-by: manual +spec: + type: ClusterIP + selector: + app: guildhall-postgres + ports: + - port: 5432 + targetPort: 5432 + protocol: TCP + name: postgres diff --git a/k8s/50-guildhall-secrets-template.yaml b/k8s/50-guildhall-secrets-template.yaml new file mode 100644 index 0000000..611ad92 --- /dev/null +++ b/k8s/50-guildhall-secrets-template.yaml @@ -0,0 +1,59 @@ +# Application + database secrets — 
TEMPLATES. +# +# Do NOT apply this file directly. Secret values are created +# imperatively so passwords and session keys never land in git. +# Two Secrets are created at deploy time: +# +# ---------- guildhall-db-credentials ---------- +# Consumed by the guildhall-postgres Deployment (for its own env). The +# same password is also embedded in the DATABASE_URL stored in +# guildhall-app-secrets. Stripping `/+=` below leaves only +# alphanumerics, so the password needs no URL escaping there. +# +# DB_PASSWORD="$(openssl rand -base64 32 | tr -d '/+=' | head -c 32)" +# +# kubectl create secret generic guildhall-db-credentials \ +# --from-literal=POSTGRES_DB=guildhall \ +# --from-literal=POSTGRES_USER=guildhall \ +# --from-literal=POSTGRES_PASSWORD="$DB_PASSWORD" \ +# --namespace=guildhall +# +# Shape: +# +# apiVersion: v1 +# kind: Secret +# metadata: +# name: guildhall-db-credentials +# namespace: guildhall +# type: Opaque +# data: +# POSTGRES_DB: <base64-encoded value> +# POSTGRES_USER: <base64-encoded value> +# POSTGRES_PASSWORD: <base64-encoded value> +# +# ---------- guildhall-app-secrets ---------- +# Consumed by the guildhall Deployment and migration Job. Contains the +# Phoenix session signing key and the DATABASE_URL used by Ecto at +# runtime. +# +# SECRET_KEY_BASE="$(cd /home/tking/projects/substrate-project/guildhall && mix phx.gen.secret)" +# +# kubectl create secret generic guildhall-app-secrets \ +# --from-literal=SECRET_KEY_BASE="$SECRET_KEY_BASE" \ +# --from-literal=DATABASE_URL="ecto://guildhall:$DB_PASSWORD@guildhall-postgres:5432/guildhall" \ +# --namespace=guildhall +# +# Note: `ecto://` scheme, not `postgres://` — `config/runtime.exs` +# invokes Ecto.Repo's built-in URL parser which accepts either, but +# `ecto://` is the canonical form in Phoenix-generated config. 
+# +# Shape: +# +# apiVersion: v1 +# kind: Secret +# metadata: +# name: guildhall-app-secrets +# namespace: guildhall +# type: Opaque +# data: +# SECRET_KEY_BASE: <base64-encoded value> +# DATABASE_URL: <base64-encoded value> diff --git a/k8s/60-migration-job.yaml b/k8s/60-migration-job.yaml new file mode 100644 index 0000000..a50520c --- /dev/null +++ b/k8s/60-migration-job.yaml @@ -0,0 +1,66 @@ +# Guildhall DB migration Job. +# +# Runs `Guildhall.OpsDb.Release.migrate/0` via the compiled release's +# `bin/guildhall eval` entry point. Intended to be applied ONCE per +# image-tag deploy, BEFORE the guildhall Deployment is created or +# rolled. Running migrations from within the app pods on startup +# would race across replicas and is explicitly avoided. +# +# The Job name includes the image tag so multiple deploys across tags +# can coexist in history (Kubernetes Jobs with the same name cannot +# be re-run without deletion). For deploy N+1, rename the Job to +# match the new tag in this file, then apply it: +# kubectl apply -f 60-migration-job.yaml +# To re-run a Job under its existing name, delete it first: +# kubectl delete job guildhall-migrate-v0-1-0 -n guildhall +# (`kubectl create job --from=...` only accepts a CronJob as its +# source, so it cannot clone this Job.) 
+apiVersion: batch/v1 +kind: Job +metadata: + name: guildhall-migrate-v0-1-0 + namespace: guildhall + labels: + app.kubernetes.io/name: guildhall + app.kubernetes.io/part-of: guildhouse + app.kubernetes.io/component: migration + app.kubernetes.io/managed-by: manual + app.kubernetes.io/version: v0.1.0 +spec: + backoffLimit: 3 + ttlSecondsAfterFinished: 86400 + template: + metadata: + labels: + app: guildhall-migrate + app.kubernetes.io/name: guildhall + app.kubernetes.io/part-of: guildhouse + app.kubernetes.io/component: migration + spec: + restartPolicy: OnFailure + imagePullSecrets: + - name: guildhall-registry + containers: + - name: migrate + image: git.guildhouse.dev/tking/guildhall:v0.1.0 + imagePullPolicy: IfNotPresent + command: + - /app/bin/guildhall + - eval + - "Guildhall.OpsDb.Release.migrate()" + env: + - name: DATABASE_URL + valueFrom: + secretKeyRef: + name: guildhall-app-secrets + key: DATABASE_URL + - name: SECRET_KEY_BASE + valueFrom: + secretKeyRef: + name: guildhall-app-secrets + key: SECRET_KEY_BASE + - name: POOL_SIZE + value: "2" + resources: + requests: + cpu: 100m + memory: 256Mi + limits: + memory: 512Mi diff --git a/k8s/70-guildhall-deployment.yaml b/k8s/70-guildhall-deployment.yaml new file mode 100644 index 0000000..0871608 --- /dev/null +++ b/k8s/70-guildhall-deployment.yaml @@ -0,0 +1,98 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: guildhall + namespace: guildhall + labels: + app.kubernetes.io/name: guildhall + app.kubernetes.io/part-of: guildhouse + app.kubernetes.io/component: web + app.kubernetes.io/managed-by: manual + app.kubernetes.io/version: v0.1.0 +spec: + replicas: 1 + strategy: + type: RollingUpdate + rollingUpdate: + maxSurge: 1 + maxUnavailable: 0 + selector: + matchLabels: + app: guildhall + template: + metadata: + labels: + app: guildhall + app.kubernetes.io/name: guildhall + app.kubernetes.io/part-of: guildhouse + app.kubernetes.io/component: web + app.kubernetes.io/version: v0.1.0 + spec: + 
imagePullSecrets: + - name: guildhall-registry + containers: + - name: guildhall + image: git.guildhouse.dev/tking/guildhall:v0.1.0 + imagePullPolicy: IfNotPresent + ports: + - containerPort: 4000 + name: http + protocol: TCP + env: + # Phoenix / endpoint + - name: PHX_SERVER + value: "true" + - name: PHX_HOST + value: guildhall.guildhouse.dev + - name: PORT + value: "4000" + - name: POOL_SIZE + value: "10" + # Session signing key + - name: SECRET_KEY_BASE + valueFrom: + secretKeyRef: + name: guildhall-app-secrets + key: SECRET_KEY_BASE + # Ecto + - name: DATABASE_URL + valueFrom: + secretKeyRef: + name: guildhall-app-secrets + key: DATABASE_URL + # Starting envelope. Tune after observing real usage under + # LiveView fan-out; Phoenix's memory footprint grows with + # connected clients. + resources: + requests: + cpu: 200m + memory: 256Mi + limits: + cpu: "1" + memory: 1Gi + # Probes hit /health, which queries the Ecto pool. See + # apps/guildhall_web/lib/guildhall_web_web/controllers/health_controller.ex + # for semantics. + readinessProbe: + httpGet: + path: /health + port: http + initialDelaySeconds: 10 + periodSeconds: 5 + timeoutSeconds: 3 + failureThreshold: 3 + livenessProbe: + httpGet: + path: /health + port: http + initialDelaySeconds: 60 + periodSeconds: 30 + timeoutSeconds: 5 + failureThreshold: 3 + # Graceful shutdown allowance. Phoenix endpoint shuts down + # cleanly inside this window. 
+ lifecycle: + preStop: + exec: + command: ["/bin/sh", "-c", "sleep 5"] + terminationGracePeriodSeconds: 30 diff --git a/k8s/80-guildhall-service.yaml b/k8s/80-guildhall-service.yaml new file mode 100644 index 0000000..05cb2b1 --- /dev/null +++ b/k8s/80-guildhall-service.yaml @@ -0,0 +1,32 @@ +apiVersion: v1 +kind: Service +metadata: + name: guildhall + namespace: guildhall + labels: + app.kubernetes.io/name: guildhall + app.kubernetes.io/part-of: guildhouse + app.kubernetes.io/component: web + app.kubernetes.io/managed-by: manual + # Hetzner Cloud Controller Manager annotations. Matches the exact + # annotation set used by keycloak/keycloak (verified from the cluster + # on 2026-04-21): location / name / type / use-private-ip. No + # algorithm-type, no uses-proxyprotocol — the cluster's convention + # is the minimal set. + annotations: + load-balancer.hetzner.cloud/location: nbg1 + load-balancer.hetzner.cloud/name: guildhall + load-balancer.hetzner.cloud/type: lb11 + load-balancer.hetzner.cloud/use-private-ip: "false" +spec: + type: LoadBalancer + # TLS terminates at Cloudflare (orange cloud); origin is plain HTTP + # on port 80 → app's 4000. This matches forgejo/keycloak. Upgrading + # to in-cluster TLS via cert-manager is hygiene follow-up, not v0.1. + ports: + - port: 80 + targetPort: http + protocol: TCP + name: http + selector: + app: guildhall