guildhall/FORGEJO-REGISTRY-INVESTIGATION-2026-04-21.md
Tyler J King 115bd178a2 docs(deploy): capture exploratory reports for Talos + Forgejo registry
DEPLOY-EXPLORATORY documents the cluster state that shaped deployment
decisions (Keycloak as template, Hetzner LB + Cloudflare pattern, no
Postgres operator so sibling-Deployment pattern).

FORGEJO-REGISTRY-INVESTIGATION documents that the registry was already
operational in Forgejo 9.0.3 (packages enabled by default) and the
storage/credential path forward.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Tyler J King <tking@guildhouse.dev>
2026-04-22 09:01:20 -04:00


Forgejo container registry — pre-enablement investigation

Date: 2026-04-21
Scope: Read-only audit of Forgejo's running state and registry configuration, to determine what enablement work (if any) is needed before Guildhall's first image push.
Method: kubectl + curl against https://git.guildhouse.dev. No mutations.
Headline: The container registry is already enabled. /v2/ returns a standard OCI 401, storage headroom is ample (19.4 GB free on the 20 GB PVC), and no Forgejo config change is required. Enablement work collapses to credential setup plus docker push. Estimated time to an operational registry for Guildhall: ~30-45 minutes, most of it Dockerfile and release-build setup.


1. Forgejo deployment details

| Field | Value |
|---|---|
| Namespace | forgejo |
| Workload | Deployment/forgejo (1 replica, Running) |
| Image | codeberg.org/forgejo/forgejo:9 |
| Running version | 9.0.3 (Gitea 1.22.0 base), confirmed via GET /api/v1/version |
| Scheduled node | gsh-cp-01 (control-plane node, workloads permitted) |
| Companion | Deployment/forgejo-postgres (postgres:16, 1/1 Running) |
| Init container | init-config (renders /data/gitea/conf/app.ini from the ConfigMap) |
| Runner | Deployment/forgejo-runner (0/1, scaled to zero; source of the Flux health-check warning) |

Volume mounts on the forgejo container: one PVC, data: /data (the root Forgejo data path; Forgejo 9.x uses /data internally, not /var/lib/gitea as older Gitea installs did).

PVCs in the namespace:

| PVC | Size | StorageClass | Mount |
|---|---|---|---|
| forgejo-data | 20 Gi | longhorn | /data on forgejo |
| forgejo-db | 10 Gi | longhorn | Postgres data |
| runner-cache | 5 Gi | longhorn | forgejo-runner (scaled to zero) |

2. Forgejo version and config state

Version

GET https://git.guildhouse.dev/api/v1/version
{"version":"9.0.3+gitea-1.22.0"}

Forgejo 9.0.3 is a recent release. Container registry (OCI Distribution API) support arrived with the packages feature in Gitea 1.17. Forgejo has carried it since the project forked from Gitea, so this version fully supports the container package type.
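The version probe above can be checked mechanically. This is a minimal sketch. It assumes the response body has the shape shown: a JSON object with a single version field, whose +gitea- suffix names the upstream base. parse_forgejo_version and has_container_registry are hypothetical helper names. The 1.17 threshold is simply the Gitea release this report cites.

```python
from __future__ import annotations

import json


def parse_forgejo_version(body: str) -> tuple[str, str | None]:
    """Split the /api/v1/version payload into (forgejo_version, gitea_base)."""
    version = json.loads(body)["version"]
    # Semver build metadata after '+' names the Gitea base, e.g. "+gitea-1.22.0".
    forgejo, _, build = version.partition("+")
    gitea = build[len("gitea-"):] if build.startswith("gitea-") else None
    return forgejo, gitea


def has_container_registry(gitea_base: str) -> bool:
    """True if the Gitea base is 1.17 or newer, where container packages landed."""
    major, minor = (int(part) for part in gitea_base.split(".")[:2])
    return (major, minor) >= (1, 17)


forgejo, gitea = parse_forgejo_version('{"version":"9.0.3+gitea-1.22.0"}')
print(forgejo, gitea, has_container_registry(gitea))  # 9.0.3 1.22.0 True
```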

Configuration

forgejo-config ConfigMap contains the full app.ini (40 lines, managed by Flux at path ./k8s/forgejo in the guildhouse-deploy-talos-mirror source repo). Notable sections:

  • [server]: DOMAIN=git.guildhouse.dev, ROOT_URL=https://git.guildhouse.dev/, HTTP_PORT=3000, SSH_PORT=22, SSH_LISTEN_PORT=2222, LFS_START_SERVER=true
  • [service]: DISABLE_REGISTRATION=true (invite-only signup)
  • [lfs]: STORAGE_TYPE=local
  • [repository] and [actions]; the ENABLED = true here belongs to [actions], not to packages
  • No explicit [packages] section. This is normal for Forgejo 9.x because packages (including container registry) are enabled by default without requiring config-level opt-in.
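Because the default does the work, the check worth making is whether an explicit ENABLED = false appears under [packages]. Here is a sketch using Python's configparser. It assumes app.ini parses as plain INI. configparser is stricter than Forgejo's own loader (it rejects duplicate keys, for example), so treat this as an approximation. packages_enabled and the sample text are hypothetical and are not copied from the ConfigMap.

```python
import configparser


def packages_enabled(app_ini: str) -> bool:
    """Return the effective [packages] ENABLED value, defaulting to true."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep app.ini key casing (DOMAIN, ENABLED, ...)
    # app.ini may begin with global keys before any [section]; give them a home.
    parser.read_string("[__root__]\n" + app_ini)
    # A missing [packages] section or ENABLED key means the default (true) applies.
    return parser.getboolean("packages", "ENABLED", fallback=True)


# Hypothetical excerpt shaped like the forgejo-config ConfigMap (not copied from it).
SAMPLE = """\
APP_NAME = Forgejo
[server]
DOMAIN = git.guildhouse.dev
[actions]
ENABLED = true
"""

print(packages_enabled(SAMPLE))  # True: no [packages] section, default applies
print(packages_enabled(SAMPLE + "[packages]\nENABLED = false\n"))  # False
```

The [actions] ENABLED key in the sample does not affect the result, which mirrors the note above that it belongs to Actions, not Packages.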

Verification that container registry is live

The decisive probe is the OCI Distribution API endpoint root:

$ curl -sS -w '%{http_code}\n' https://git.guildhouse.dev/v2/
{"errors":[{"code":"UNAUTHORIZED","message":""}]}
401

This is a standards-compliant OCI registry response to an unauthenticated request. If the registry were disabled, Forgejo would serve 404 (the endpoint would not be registered). The 401 with a well-formed errors object means the registry is routing correctly and simply requires authentication — the default and correct behavior.

Equivalent probe against /v2/_catalog returns the same 401 shape.

API-layer probe GET /api/v1/packages/tking also returns 401 (token is required), consistent with packages being enabled but requiring auth.
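The reasoning above can be written as a small decision function: 404 means the endpoint is not routed, and 401 with an OCI errors body means the registry is live and gated. This is a sketch that assumes the caller has already captured the status code and body, for example from the curl probe. classify_v2_probe is a hypothetical helper, and its labels simply restate this report's interpretation.

```python
import json


def classify_v2_probe(status: int, body: str) -> str:
    """Map an unauthenticated GET /v2/ result to a registry state."""
    if status == 404:
        return "not routed: packages/registry disabled"
    if status == 200:
        return "live: anonymous access allowed"
    if status == 401:
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            payload = None
        # A well-formed OCI error body is {"errors": [{"code": ..., "message": ...}]}.
        errors = payload.get("errors", []) if isinstance(payload, dict) else []
        if any(isinstance(e, dict) and e.get("code") == "UNAUTHORIZED" for e in errors):
            return "live: authentication required"
        return "401 without an OCI error body: inspect what answered"
    return f"unexpected status {status}"


observed = '{"errors":[{"code":"UNAUTHORIZED","message":""}]}'
print(classify_v2_probe(401, observed))  # live: authentication required
```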

Storage backend

No overridden [packages.storage] in app.ini, which means packages use the default local filesystem storage under the Forgejo data volume. The path is most likely /data/gitea/packages/, given that app.ini itself lives under /data/gitea/conf/; the exact directory was not listed during this audit. This lives on forgejo-data (the 20 Gi Longhorn PVC), the same volume as git repositories, LFS objects, and Forgejo's own state.

3. How Forgejo is managed

Forgejo is managed by Flux. A Kustomization flux-system/forgejo reconciles the manifests from:

  • Source: GitRepository/flux-system/guildhouse-deploy
  • URL: https://github.com/gh-tking/guildhouse-deploy-talos-mirror
  • Branch: main
  • Path: ./k8s/forgejo
  • Current revision: main@169e077f
  • Interval: 1 minute

Kustomization inventory (what Flux claims to own in this path):

_forgejo__Namespace
forgejo_forgejo-config__ConfigMap        ← this is where app.ini lives
forgejo_runner-config__ConfigMap
forgejo_forgejo-secrets__Secret
forgejo_forgejo-http__Service
forgejo_forgejo-postgres__Service
forgejo_forgejo_apps_Deployment
forgejo_forgejo-postgres_apps_Deployment
forgejo_forgejo-runner_apps_Deployment
forgejo_forgejo-data__PersistentVolumeClaim
forgejo_forgejo-db__PersistentVolumeClaim

Status: Ready: False / Healthy: False because of a health-check timeout on forgejo-runner. The runner is a separate, scaled-to-zero Deployment, not part of core Forgejo. The core Forgejo Deployment is Ready, the registry is live, and the Kustomization IS reconciling successfully against new commits; only the health condition is stuck on the runner.

Consequence: if we ever needed to change Forgejo's app.ini (we don't, for registry work), the mechanism is to edit k8s/forgejo/forgejo-config.yaml in the gh-tking/guildhouse-deploy-talos-mirror GitHub repo, push to main, and wait for Flux to reconcile (1-minute interval). This path is functional today despite the runner health warning.

4. The cluster-infra Flux error

kubectl describe kustomization cluster-infra -n flux-system:

  • Suspend: true (explicitly suspended by an operator earlier)
  • Source: guildhouse-deploy GitRepository, path ./talos/manifests/cluster-infra
  • Error message:
failed to decode Kubernetes YAML from /tmp/kustomization-.../talos/manifests/cluster-infra/10-cilium-values.yaml: missing Resource metadata <nil>

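Diagnosis: 10-cilium-values.yaml is a Helm values file being handed to kustomize-controller as if it were a raw Kubernetes manifest. The file has no kind or metadata; it is a values document meant for helm install --values, not a standalone Kubernetes resource. Kustomize rejects it because every file that ends up in a Kustomization's resource list must be Resource-shaped. A file ends up there either by being listed under resources: in a kustomization.yaml, or by being picked up by the kustomization.yaml that Flux generates when the path has none. In the latter case, Flux includes every manifest it finds under the path.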

Fix severity: trivial. One of:

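  • Move 10-cilium-values.yaml out of the Kustomization path, or into a values/ subdirectory that an explicit kustomization.yaml does not reference (Flux's auto-generated kustomization.yaml also scans subdirectories)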
  • Rename the file so it doesn't get picked up (e.g., 10-cilium-values.yaml.hold)
  • Add a kustomization.yaml with explicit resources: that excludes it
  • Replace the file with a proper HelmRelease CR that references the values externally

Each of these is a single-file edit in the source repo; Flux reconciles it on the next push once the Kustomization is resumed.

Time estimate: ~30-60 minutes, including the commit, push, reconcile, and verify cycle. The main complication is that cluster-infra has Suspend: true. Whoever suspended it did so deliberately, likely because the error was cascading to blocked downstream Kustomizations. Un-suspending should wait until the underlying YAML is fixed; otherwise the same error reappears.
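The third option (an explicit resources: list) is easy to generate rather than maintain by hand. This sketch assumes Helm values files follow a *-values.yaml naming convention. explicit_kustomization is a hypothetical helper, and the file names other than 10-cilium-values.yaml are made up. This report does not record whether cluster-infra already has a kustomization.yaml.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def explicit_kustomization(files: list[str]) -> str:
    """Render a kustomization.yaml that lists manifests and skips Helm values files."""
    resources = []
    for name in sorted(files):
        path = PurePosixPath(name)
        # Only YAML manifests are resources; never list the kustomization itself.
        if path.suffix not in (".yaml", ".yml") or path.name == "kustomization.yaml":
            continue
        if path.stem.endswith("-values"):  # assumed naming convention for Helm values
            continue
        resources.append(name)
    lines = [
        "apiVersion: kustomize.config.k8s.io/v1beta1",
        "kind: Kustomization",
        "resources:",
        *(f"  - {name}" for name in resources),
    ]
    return "\n".join(lines) + "\n"


# Only 10-cilium-values.yaml comes from the error above; the other names are hypothetical.
print(explicit_kustomization(["00-namespace.yaml", "10-cilium-values.yaml", "20-example.yaml"]))
```

Under this approach, a values file can stay in the directory, because only listed resources are loaded.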

Crucially: this error does NOT block Forgejo registry work or Guildhall deployment. The two Kustomizations are independent. Guildhall deployment can proceed entirely outside the Flux chain (direct kubectl apply or a new Guildhall-specific Kustomization once registry+deploy are working). The cluster-infra/spire/quartermaster/bascule chain is substrate-foundation work that's explicitly follow-up.

5. Cluster image pull pattern

No existing pattern for private-registry pulls. The entire cluster currently pulls only from public registries:

  • quay.io/keycloak/keycloak:26.0
  • codeberg.org/forgejo/forgejo:9
  • postgres:16 (Docker Hub)
  • quay.io/cilium/cilium:v1.16.5 and quay.io/cilium/cilium-envoy
  • Longhorn and Flux images (all public)

Specifically:

$ kubectl get secrets -A --field-selector type=kubernetes.io/dockerconfigjson
No resources found

Zero dockerconfigjson secrets cluster-wide. Zero imagePullSecrets referenced on any Deployment.
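The check for zero imagePullSecrets on any Deployment can be repeated against the output of kubectl get deployments -A -o json. This is a minimal sketch. pull_secret_report and registry_host are hypothetical helpers. registry_host applies Docker's rule: a first path component that contains a dot or colon, or equals localhost, is a registry host; anything else resolves to Docker Hub. Only Deployments are covered, not StatefulSets or DaemonSets.

```python
from __future__ import annotations


def registry_host(image: str) -> str:
    """Return the registry host of an image reference, using Docker's rules."""
    first, sep, _ = image.partition("/")
    # The first path component is a host only if it looks like one.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def pull_secret_report(deployments: dict) -> tuple[int, set[str]]:
    """Count Deployments using imagePullSecrets; collect registries they pull from.

    `deployments` is the parsed JSON of `kubectl get deployments -A -o json`.
    """
    with_secrets = 0
    registries: set[str] = set()
    for item in deployments.get("items", []):
        pod_spec = item["spec"]["template"]["spec"]
        if pod_spec.get("imagePullSecrets"):
            with_secrets += 1
        for container in pod_spec.get("containers", []) + pod_spec.get("initContainers", []):
            registries.add(registry_host(container["image"]))
    return with_secrets, registries


def _deployment(*images: str) -> dict:
    """Minimal Deployment-shaped dict for the example (hypothetical data)."""
    return {"spec": {"template": {"spec": {"containers": [{"image": i} for i in images]}}}}


sample = {"items": [
    _deployment("codeberg.org/forgejo/forgejo:9"),
    _deployment("postgres:16"),
    _deployment("quay.io/keycloak/keycloak:26.0"),
]}
count, hosts = pull_secret_report(sample)
print(count, sorted(hosts))  # 0 ['codeberg.org', 'docker.io', 'quay.io']
```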

Guildhall will be the first workload pulling from a private Forgejo registry. It introduces the pattern, which then becomes the template for subsequent workloads. Two options:

  1. Make the tking/guildhall Forgejo package public. Forgejo packages can be scoped public or private; a public container package allows anonymous pulls and no pull secret is needed. This matches the rest of the cluster's zero-pull-secret state. Appropriate if there's nothing sensitive in the image itself.
  2. Keep the package private and add a dockerconfigjson Secret. Standard pattern: kubectl create secret docker-registry guildhall-registry -n guildhall --docker-server=git.guildhouse.dev --docker-username=<user> --docker-password=<token>, then reference it in the Deployment via imagePullSecrets: [{name: guildhall-registry}].

Option 1 is simplest for v0.1. Option 2 is better hygiene long-term.
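For option 2, kubectl create secret docker-registry builds the Secret directly. The sketch below only shows what that Secret contains, which helps if the Secret is later managed declaratively instead of created imperatively. dockerconfigjson and pull_secret are hypothetical helpers, and <token> is a placeholder for the PAT.

```python
import base64
import json


def dockerconfigjson(server: str, username: str, token: str) -> str:
    """Build the .dockerconfigjson document for one registry credential."""
    # "auth" is base64("user:password"), the same value docker login stores.
    auth = base64.b64encode(f"{username}:{token}".encode()).decode()
    return json.dumps(
        {"auths": {server: {"username": username, "password": token, "auth": auth}}}
    )


def pull_secret(name: str, namespace: str, config_json: str) -> dict:
    """Wrap the document in a kubernetes.io/dockerconfigjson Secret manifest."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/dockerconfigjson",
        "metadata": {"name": name, "namespace": namespace},
        # Secret .data values must be base64-encoded.
        "data": {".dockerconfigjson": base64.b64encode(config_json.encode()).decode()},
    }


secret = pull_secret(
    "guildhall-registry",
    "guildhall",
    dockerconfigjson("git.guildhouse.dev", "tking", "<token>"),  # placeholder PAT
)
print(json.dumps(secret, indent=2))
```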

6. Storage headroom on Forgejo's volume

kubectl exec -n forgejo deployment/forgejo -- df -h (inside the forgejo container):

/dev/longhorn/pvc-683ec33a-...    19.5G    137.2M    19.4G   1%   /data

Headroom is ample. 19.4 GB free on a 20 GB PVC. Current Forgejo usage after 10 days is 137 MB (git repos + LFS + internal state).

A Guildhall container image — Elixir release on debian-slim, typically 100-300 MB compressed per tag, with OCI layer deduplication across tags — would add maybe 1-3 GB of package storage over dozens of iterations. No pressure on the volume for the foreseeable future.
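The 1-3 GB estimate follows from a simple model. Layers shared by every tag are stored once, and each push adds only its unique layers. The numbers below (150 MB shared, 40 MB unique per tag, about 19,400 MB free) are illustrative assumptions chosen to fall inside the 100-300 MB per-tag range above. They are not measurements of a Guildhall image.

```python
def projected_usage_mb(shared_mb: int, unique_per_tag_mb: int, tags: int) -> int:
    """Registry storage after `tags` pushes when base layers are deduplicated.

    Layers shared by every tag (OS, runtime) are stored once; each tag adds only
    the layers unique to it.
    """
    return shared_mb + unique_per_tag_mb * tags


def tags_that_fit(free_mb: int, shared_mb: int, unique_per_tag_mb: int) -> int:
    """How many tags fit in the remaining free space under the same model."""
    return (free_mb - shared_mb) // unique_per_tag_mb


# Illustrative assumptions: 150 MB shared, 40 MB unique per tag, ~19,400 MB free.
print(projected_usage_mb(150, 40, 50))  # 2150
print(tags_that_fit(19_400, 150, 40))   # 481
```

At 50 tags that comes to about 2.1 GB, inside the 1-3 GB range. Under the same assumptions, the free space would hold several hundred tags.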

No resize required. If long-term registry growth becomes an issue (multiple applications all pushing many tags, or large binary releases), Longhorn supports online expansion of the PVC — but that's a much-later concern.


Synthesis

Is the registry already enabled?

Yes. The /v2/ and /v2/_catalog endpoints return proper OCI Distribution API responses (401 unauthenticated with well-formed errors objects). Forgejo 9.x enables packages by default; no [packages] config section is needed, and none is present. The registry is live and waiting for an authenticated client.

What enablement work is required?

None at the Forgejo-config layer. The only work is client-side:

  1. Create a Forgejo Personal Access Token (scope: write:package) via the Forgejo UI at https://git.guildhouse.dev/-/user/settings/applications
  2. Docker login from the build machine: docker login git.guildhouse.dev -u tking --password-stdin (supply the PAT on stdin rather than passing -p <PAT>, which leaves it in shell history)
  3. Build + push the Guildhall image: docker build -t git.guildhouse.dev/tking/guildhall:v0.1.0 . && docker push …
  4. Set package visibility in Forgejo — public (anon-pull, no imagePullSecret needed) or private (create a dockerconfigjson Secret in the guildhall namespace, reference in Deployment)
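Steps 3 and 4 both depend on how the image reference maps onto Forgejo. The first path segment after the host is the Forgejo owner, and the rest is the package name. This is a simplified sketch with hypothetical helpers. It assumes an explicit registry host and ignores @sha256 digests. It uses the package page URL pattern this report uses for verification.

```python
from __future__ import annotations


def split_image_ref(ref: str) -> tuple[str, str, str]:
    """Split 'host/owner/name:tag' into (registry, repository, tag).

    Simplified: assumes an explicit registry host and ignores @sha256 digests.
    """
    registry, _, rest = ref.partition("/")
    repository, sep, tag = rest.rpartition(":")
    if not sep:  # no tag given; Docker defaults to "latest"
        repository, tag = rest, "latest"
    return registry, repository, tag


def forgejo_package_page(ref: str) -> str:
    """Forgejo UI page for a container package: the owner is the first path part."""
    registry, repository, _ = split_image_ref(ref)
    owner, _, name = repository.partition("/")
    return f"https://{registry}/{owner}/-/packages/container/{name}"


ref = "git.guildhouse.dev/tking/guildhall:v0.1.0"
print(split_image_ref(ref))       # ('git.guildhouse.dev', 'tking/guildhall', 'v0.1.0')
print(forgejo_package_page(ref))  # https://git.guildhouse.dev/tking/-/packages/container/guildhall
```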

No Flux source edits. No Kustomization changes. No ConfigMap changes. No cluster-infra unblock required.

Is the cluster-infra Flux error a blocker?

No. The Forgejo registry operates entirely outside the cluster-infra / spire / quartermaster / bascule Flux chain. Forgejo is managed by its own independent Kustomization (flux-system/forgejo), which is successfully reconciling against source revisions even though its Ready condition is flagged False by the unrelated forgejo-runner health check.

The cluster-infra error is real and worth fixing separately (trivial single-file fix in the GitHub source repo) but it has zero coupling to registry enablement or Guildhall deployment. Treat as a cleanup backlog item, not a pre-req.

Estimated time to registry operational

| Step | Time |
|---|---|
| Create Forgejo PAT (Forgejo UI) | 2 min |
| docker login git.guildhouse.dev | <1 min |
| Dockerfile + mix release setup in Guildhall repo | 15-20 min (real work) |
| docker build (cold build for Elixir + OTP + mix deps + assets) | 5-10 min |
| docker push | 1-3 min (single tag, ~200 MB compressed) |
| Set package visibility (public or private + pull secret) | 2-5 min |
| Total to first successful image in the registry | ~30-45 min |

Most of the time is the Dockerfile + release-build setup, not the registry interaction itself.
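Summing the table's bounds is a quick sanity check on the total. The values are transcribed from the table, with "<1 min" read as 0-1 minutes (an assumption).

```python
from __future__ import annotations

# (low, high) minutes per step, transcribed from the table; "<1 min" read as (0, 1).
STEPS: dict[str, tuple[int, int]] = {
    "Create Forgejo PAT": (2, 2),
    "docker login": (0, 1),
    "Dockerfile + mix release setup": (15, 20),
    "docker build (cold)": (5, 10),
    "docker push": (1, 3),
    "Set package visibility": (2, 5),
}


def total_range(steps: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low and high ends of each step's estimate independently."""
    return sum(lo for lo, _ in steps.values()), sum(hi for _, hi in steps.values())


low, high = total_range(STEPS)
build_low = STEPS["Dockerfile + mix release setup"][0] + STEPS["docker build (cold)"][0]
build_high = STEPS["Dockerfile + mix release setup"][1] + STEPS["docker build (cold)"][1]
print(f"total {low}-{high} min; Dockerfile + build {build_low}-{build_high} min")
# total 25-41 min; Dockerfile + build 20-30 min
```

The raw sum is 25-41 minutes, so the ~30-45 min total carries a little slack. Dockerfile setup plus the cold build account for 20-30 minutes of it.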

Next: write the Guildhall Dockerfile, then build and push a first image. Sequencing:

  1. Author Dockerfile in ~/projects/substrate-project/guildhall/ — multi-stage (Elixir 1.17.3/OTP 27 builder → debian-slim runtime, mix release, non-root user, expose 4000, healthcheck endpoint)
  2. Author .dockerignore that excludes _build/, deps/, .git/, priv/static/ (if built separately) — matches Phoenix release conventions
  3. Create Forgejo PAT with write:package scope
  4. docker login git.guildhouse.dev from the desktop
  5. docker build -t git.guildhouse.dev/tking/guildhall:v0.1.0 .
  6. docker push git.guildhouse.dev/tking/guildhall:v0.1.0
  7. Verify via Forgejo UI at https://git.guildhouse.dev/tking/-/packages/container/guildhall and via curl to /v2/tking/guildhall/manifests/v0.1.0 (authenticated)
  8. Decide package visibility, and if private, create guildhall-registry Secret in the guildhall namespace (namespace doesn't exist yet — create at deploy time)
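For the authenticated manifest check in step 7, the client needs a bearer token first. A registry that answers /v2/ with 401 normally names a token endpoint in its WWW-Authenticate header, and the client calls that endpoint with user:PAT. Below is a stdlib sketch of that flow. It assumes Forgejo follows the Docker token-auth convention. The challenge string in the example is hypothetical, because this report captured the 401 body but not its headers. parse_bearer_challenge, token_request and manifest_request are illustrative names.

```python
from __future__ import annotations

import base64
import re
import urllib.parse
import urllib.request

# Manifest media types a client typically accepts when checking a tag.
MANIFEST_TYPES = ", ".join([
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
])


def parse_bearer_challenge(header: str) -> dict[str, str]:
    """Parse a WWW-Authenticate 'Bearer realm="...",service="...",scope="..."' value."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"expected a Bearer challenge, got {scheme!r}")
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


def token_request(challenge: dict[str, str], username: str, pat: str) -> urllib.request.Request:
    """Build the token-endpoint request, authenticating with user:PAT (HTTP Basic)."""
    query = urllib.parse.urlencode(
        {k: v for k, v in challenge.items() if k in ("service", "scope")}
    )
    basic = base64.b64encode(f"{username}:{pat}".encode()).decode()
    return urllib.request.Request(
        f"{challenge['realm']}?{query}", headers={"Authorization": f"Basic {basic}"}
    )


def manifest_request(registry: str, repository: str, tag: str, token: str) -> urllib.request.Request:
    """HEAD the manifest: 200 means the tag exists, 404 means it does not."""
    return urllib.request.Request(
        f"https://{registry}/v2/{repository}/manifests/{tag}",
        method="HEAD",
        headers={"Authorization": f"Bearer {token}", "Accept": MANIFEST_TYPES},
    )


# Hypothetical challenge; capture the real header with `curl -si https://git.guildhouse.dev/v2/`.
challenge = parse_bearer_challenge(
    'Bearer realm="https://git.guildhouse.dev/v2/token",service="container_registry",scope="*"'
)
req = token_request(challenge, "tking", "<PAT>")
print(req.full_url)
```

Sending req with urllib.request.urlopen should return JSON that carries the token (the convention uses a token field, sometimes access_token). Passing that token to manifest_request and getting a 200 back confirms the pushed tag exists.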

The Kubernetes-side deploy (Deployment + Service + Postgres + PVC + Secret) proceeds in parallel with or immediately after the image build, following the Keycloak pattern captured in the earlier DEPLOY-EXPLORATORY-2026-04-21.md.

No pre-work needed on Forgejo itself. The registry is ready.