guildhall/FORGEJO-REGISTRY-INVESTIGATION-2026-04-21.md
Tyler J King 115bd178a2 docs(deploy): capture exploratory reports for Talos + Forgejo registry
DEPLOY-EXPLORATORY documents the cluster state that shaped deployment
decisions (Keycloak as template, Hetzner LB + Cloudflare pattern, no
Postgres operator so sibling-Deployment pattern).

FORGEJO-REGISTRY-INVESTIGATION documents that the registry was already
operational in Forgejo 9.0.3 (packages enabled by default) and the
storage/credential path forward.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Tyler J King <tking@guildhouse.dev>
2026-04-22 09:01:20 -04:00

# Forgejo container registry — pre-enablement investigation
**Date:** 2026-04-21
**Scope:** Read-only audit of Forgejo's running state + registry configuration to determine what enablement work (if any) is needed before Guildhall's image push.
**Method:** `kubectl` + `curl` against `https://git.guildhouse.dev`. No mutations.
**Headline:** **The container registry is already enabled.** `/v2/` returns a standard OCI 401, storage headroom is ample (19.4 GB free on 20 GB PVC), and no Forgejo config change is required. Enablement work collapses to credential setup + `docker push`. Estimated time to operational registry for Guildhall: **~30-45 minutes**, most of it Dockerfile and release-build setup.
---
## 1. Forgejo deployment details
| | |
|---|---|
| Namespace | `forgejo` |
| Workload | `Deployment/forgejo` (1 replica, Running) |
| Image | `codeberg.org/forgejo/forgejo:9` |
| Running version | **9.0.3 (Gitea 1.22.0 base)** — confirmed via `GET /api/v1/version` |
| Scheduled node | `gsh-cp-01` (control-plane node, workloads permitted) |
| Companion | `Deployment/forgejo-postgres` (`postgres:16`, 1/1 Running) |
| Init container | `init-config` (renders `/data/gitea/conf/app.ini` from ConfigMap) |
| Runner | `Deployment/forgejo-runner` (0/1 — scaled to zero, source of the Flux health-check warning) |
**Volume mounts on the forgejo container:** one PVC, `data: /data` (the root Forgejo data path; Forgejo 9.x uses `/data` internally, not `/var/lib/gitea` as older Gitea installs did).
**PVCs in the namespace:**
| PVC | Size | StorageClass | Mount |
|---|---|---|---|
| `forgejo-data` | 20 Gi | longhorn | `/data` on forgejo |
| `forgejo-db` | 10 Gi | longhorn | Postgres data |
| `runner-cache` | 5 Gi | longhorn | forgejo-runner (scaled to zero) |
## 2. Forgejo version and config state
### Version
`GET https://git.guildhouse.dev/api/v1/version` returns `{"version":"9.0.3+gitea-1.22.0"}`
Forgejo 9.0.3 reports a Gitea 1.22.0 base. Container registry / OCI Distribution API support arrived with the package registry in Gitea 1.17, so Forgejo has carried it since the fork; this version fully supports the container package type.
### Configuration
`forgejo-config` ConfigMap contains the full `app.ini` (40 lines, managed by Flux at path `./k8s/forgejo` in the `guildhouse-deploy-talos-mirror` source repo). Notable sections:
- `[server]`: `DOMAIN=git.guildhouse.dev`, `ROOT_URL=https://git.guildhouse.dev/`, `HTTP_PORT=3000`, `SSH_PORT=22`, `SSH_LISTEN_PORT=2222`, `LFS_START_SERVER=true`
- `[service]`: `DISABLE_REGISTRATION=true` (invite-only signup)
- `[lfs]`: `STORAGE_TYPE=local`
- `[repository]`, `[actions]` — with an `ENABLED = true` that belongs to Actions, not Packages
- **No explicit `[packages]` section.** This is normal for Forgejo 9.x because packages (including container registry) are enabled by default without requiring config-level opt-in.
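The section-scoping point matters when reading `app.ini`: an `ENABLED` key only counts for the section it sits under. Below is a minimal sketch of that check, assuming a local copy of the rendered `app.ini`. The function name `packages_state` and its three output labels are hypothetical and not part of Forgejo.

```shell
# Report how the [packages] section of an app.ini treats the ENABLED key.
# Prints one of: default-enabled, explicitly-enabled, disabled.
packages_state() {
  awk '
    # Track the current INI section, e.g. "[actions]" or "[packages]".
    /^[ \t]*\[/ { section = $0; gsub(/[ \t]/, "", section); next }
    # Only keys that sit under [packages] are considered.
    section == "[packages]" {
      line = tolower($0)
      gsub(/[ \t]/, "", line)
      if (line == "enabled=false") found = "disabled"
      else if (line == "enabled=true") found = "explicitly-enabled"
    }
    END { print (found == "" ? "default-enabled" : found) }
  ' "$1"
}
```

Run against the ConfigMap content described above, this would be expected to print `default-enabled`, because the only `ENABLED = true` sits under `[actions]`.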
### Verification that container registry is live
The decisive probe is the OCI Distribution API endpoint root:
```
$ curl -sS -w '%{http_code}\n' https://git.guildhouse.dev/v2/
{"errors":[{"code":"UNAUTHORIZED","message":""}]}
401
```
This is **a standards-compliant OCI registry response** to an unauthenticated request. If the registry were disabled, Forgejo would serve 404 (the endpoint would not be registered). The 401 with a well-formed `errors` object means the registry is routing correctly and simply requires authentication — the default and correct behavior.
Equivalent probe against `/v2/_catalog` returns the same 401 shape.
API-layer probe `GET /api/v1/packages/tking` also returns 401 (`token is required`), consistent with packages being enabled but requiring auth.
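The reading of the status code can be written down as a small helper. This is a sketch only: the function names are hypothetical, and the 404 branch encodes the assumption stated above (a disabled registry does not register the route), which this audit did not observe directly.

```shell
# Map the HTTP status of GET /v2/ to an interpretation.
classify_v2_status() {
  case "$1" in
    200) echo "registry enabled, request authorized" ;;
    401) echo "registry enabled, authentication required" ;;
    404) echo "registry route not registered (packages likely disabled)" ;;
    *)   echo "inconclusive: HTTP $1" ;;
  esac
}

# Probe a host and classify the result. $1 = registry host name.
probe_registry() {
  code=$(curl -sS -o /dev/null -w '%{http_code}' "https://$1/v2/") || return 1
  classify_v2_status "$code"
}
```

Usage would be `probe_registry git.guildhouse.dev`, which for the state recorded here should report that authentication is required.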
### Storage backend
No overridden `[packages.storage]` in app.ini, which means packages use the default local filesystem path under the Forgejo data volume: `/data/gitea/packages/` (or similar Forgejo 9.x path). This lives on `forgejo-data` (the Longhorn 20 Gi PVC), same volume as git repositories, LFS objects, and Forgejo's own state.
## 3. How Forgejo is managed
Forgejo is managed by **Flux**. A `Kustomization` `flux-system/forgejo` reconciles the manifests from:
- **Source:** `GitRepository/flux-system/guildhouse-deploy`
- **URL:** `https://github.com/gh-tking/guildhouse-deploy-talos-mirror`
- **Branch:** `main`
- **Path:** `./k8s/forgejo`
- **Current revision:** `main@169e077f`
- **Interval:** 1 minute
**Kustomization inventory** (what Flux claims to own in this path):
```
_forgejo__Namespace
forgejo_forgejo-config__ConfigMap ← this is where app.ini lives
forgejo_runner-config__ConfigMap
forgejo_forgejo-secrets__Secret
forgejo_forgejo-http__Service
forgejo_forgejo-postgres__Service
forgejo_forgejo_apps_Deployment
forgejo_forgejo-postgres_apps_Deployment
forgejo_forgejo-runner_apps_Deployment
forgejo_forgejo-data__PersistentVolumeClaim
forgejo_forgejo-db__PersistentVolumeClaim
```
**Status:** `Ready: False` / `Healthy: False` because of a health-check timeout on `forgejo-runner`. That is a separate Deployment scaled to zero, not a problem with core Forgejo. The core Forgejo Deployment is Ready, the registry is live, and the Kustomization is still reconciling successfully against new commits; only the health condition is stuck on the runner.
**Consequence:** if we ever needed to change Forgejo's `app.ini` (we don't, for registry work), the mechanism is to edit `k8s/forgejo/forgejo-config.yaml` in the `gh-tking/guildhouse-deploy-talos-mirror` GitHub repo, push to `main`, and wait for Flux to reconcile (1-minute interval). This path is functional today despite the runner health warning.
## 4. The cluster-infra Flux error
`kubectl describe kustomization cluster-infra -n flux-system`:
- **Suspend: true** (explicitly suspended by an operator earlier)
- **Source:** `guildhouse-deploy` GitRepository, path `./talos/manifests/cluster-infra`
- **Error message:**
```
failed to decode Kubernetes YAML from /tmp/kustomization-.../talos/manifests/cluster-infra/
10-cilium-values.yaml: missing Resource metadata <nil>
```
**Diagnosis:** `10-cilium-values.yaml` is a Helm values file being handed to kustomize-controller as if it were a raw Kubernetes manifest. The file doesn't have a `kind` or `metadata` — it's a values document intended to be consumed by `helm install --values`, not a standalone Kubernetes resource. Kustomize chokes because every file in a Kustomization source path is expected to be Resource-shaped.
**Fix severity:** trivial. One of:
- Move `10-cilium-values.yaml` into a `values/` subdirectory that isn't referenced by `kustomization.yaml`
- Rename the file so it doesn't get picked up (e.g., `10-cilium-values.yaml.hold`)
- Add a `kustomization.yaml` with explicit `resources:` that excludes it
- Replace the file with a proper `HelmRelease` CR that references the values externally
Any of these is a single-file source edit; Flux reconciles on the next push.
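Before choosing a fix, it may be worth checking whether other files in the path have the same problem. The sketch below is a heuristic, not what kustomize-controller actually does: it lists YAML files with no top-level `kind:` line. The function name `find_non_resource_yaml` is hypothetical.

```shell
# List YAML files under a directory that contain no top-level "kind:" key.
# Heuristic only: a values file that happens to define a top-level "kind"
# would be missed, and multi-document files are treated as a whole.
find_non_resource_yaml() {
  find "$1" -type f \( -name '*.yaml' -o -name '*.yml' \) | sort |
  while IFS= read -r f; do
    if ! grep -q '^kind:' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

In a checkout of the source repo, `find_non_resource_yaml talos/manifests/cluster-infra` would be expected to flag `10-cilium-values.yaml`, along with any other values-style files in that path.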
**Time estimate:** ~30-60 minutes including the commit+push+reconcile+verify cycle. The main complication is that `cluster-infra` has `Suspend: true`: whoever suspended it did so deliberately (likely because the error was cascading to blocked downstream Kustomizations). Un-suspending should probably wait until the underlying YAML is fixed, otherwise the same error re-appears.
**Crucially: this error does NOT block Forgejo registry work or Guildhall deployment.** The two Kustomizations are independent. Guildhall deployment can proceed entirely outside the Flux chain (direct `kubectl apply` or a new Guildhall-specific Kustomization once registry+deploy are working). The cluster-infra/spire/quartermaster/bascule chain is substrate-foundation work that's explicitly follow-up.
## 5. Cluster image pull pattern
**No existing pattern for private-registry pulls.** The entire cluster currently pulls only from public registries:
- `quay.io/keycloak/keycloak:26.0`
- `codeberg.org/forgejo/forgejo:9`
- `postgres:16` (Docker Hub)
- `quay.io/cilium/cilium:v1.16.5` and `quay.io/cilium/cilium-envoy`
- Longhorn and Flux images (all public)
Specifically:
```
$ kubectl get secrets -A --field-selector type=kubernetes.io/dockerconfigjson
No resources found
```
Zero `dockerconfigjson` secrets cluster-wide. Zero `imagePullSecrets` referenced on any Deployment.
**Guildhall will be the first workload pulling from a private Forgejo registry.** It introduces the pattern, which then becomes the template for subsequent workloads. Two options:
1. **Make the `tking/guildhall` Forgejo package public.** Forgejo packages can be scoped public or private; a public container package allows anonymous pulls and no pull secret is needed. This matches the rest of the cluster's zero-pull-secret state. Appropriate if there's nothing sensitive in the image itself.
2. **Keep the package private and add a `dockerconfigjson` Secret.** Standard pattern: `kubectl create secret docker-registry guildhall-registry --docker-server=git.guildhouse.dev --docker-username=<user> --docker-password=<token>`, then reference in the Deployment's pod spec via `imagePullSecrets: [{name: guildhall-registry}]`.
Option 1 is simplest for v0.1. Option 2 is better hygiene long-term.
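For option 2, it can help to see what the Secret actually carries. The sketch below builds the `.dockerconfigjson` payload by hand, in the same shape that `kubectl create secret docker-registry` generates. The function names are hypothetical, and values are not JSON-escaped, so it assumes a username and token without quotes or backslashes.

```shell
# base64("user:token"), the value of the "auth" field.
registry_auth() {
  printf '%s' "$1:$2" | base64 | tr -d '\n'
}

# Emit the .dockerconfigjson document.
# $1 = registry host, $2 = username, $3 = token
dockerconfigjson() {
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
    "$1" "$2" "$3" "$(registry_auth "$2" "$3")"
}
```

For example, `dockerconfigjson git.guildhouse.dev <user> <token>` prints the document that would sit, base64-encoded, under the Secret's `.dockerconfigjson` key.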
## 6. Storage headroom on Forgejo's volume
`kubectl exec -n forgejo deployment/forgejo -- df -h` (inside the forgejo container):
```
/dev/longhorn/pvc-683ec33a-... 19.5G 137.2M 19.4G 1% /data
```
**Headroom is ample.** 19.4 GB free on a 20 GB PVC. Current Forgejo usage after 10 days is 137 MB (git repos + LFS + internal state).
A Guildhall container image — Elixir release on debian-slim, typically 100-300 MB compressed per tag, with OCI layer deduplication across tags — would add maybe 1-3 GB of package storage over dozens of iterations. No pressure on the volume for the foreseeable future.
**No resize required.** If long-term registry growth becomes an issue (multiple applications all pushing many tags, or large binary releases), Longhorn supports online expansion of the PVC — but that's a much-later concern.
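As a rough sanity check on the headroom claim, the worst case (no layer sharing between tags at all) can be computed directly. This is back-of-envelope arithmetic under stated assumptions: 19.4 GB free is approximated as 19400 MB, and 300 MB is the upper end of the per-tag size estimated above. The function name is hypothetical.

```shell
# Integer arithmetic: how many tags fit if every tag stores all its layers anew.
# $1 = free space in MB, $2 = size of one tag in MB
worst_case_tags() {
  echo $(( $1 / $2 ))
}

worst_case_tags 19400 300   # prints 64
```

Even this pessimistic figure leaves room for dozens of tags, and real usage should be lower because layers are deduplicated across tags.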
---
## Synthesis
### Is the registry already enabled?
**Yes.** The `/v2/` and `/v2/_catalog` endpoints return proper OCI Distribution API responses (401 unauthenticated with well-formed `errors` objects). Forgejo 9.x enables packages by default; no `[packages]` config section is needed, and none is present. The registry is live and waiting for an authenticated client.
### What enablement work is required?
**None at the Forgejo-config layer.** The only work is client-side:
1. **Create a Forgejo Personal Access Token** (scope: `write:package`) via the Forgejo UI at `https://git.guildhouse.dev/-/user/settings/applications`
2. **Docker login from the build machine:** `docker login git.guildhouse.dev -u tking -p <PAT>`
3. **Build + push** the Guildhall image: `docker build -t git.guildhouse.dev/tking/guildhall:v0.1.0 . && docker push …`
4. **Set package visibility** in Forgejo — public (anon-pull, no imagePullSecret needed) or private (create a `dockerconfigjson` Secret in the `guildhall` namespace, reference in Deployment)
No Flux source edits. No Kustomization changes. No ConfigMap changes. No `cluster-infra` unblock required.
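The client-side steps can be sketched as one small script. The function names and the `FORGEJO_USER` / `FORGEJO_PAT` environment variables are hypothetical. The sketch passes the token through `--password-stdin` rather than `-p`, so it does not land in shell history or the process list.

```shell
# Compose an image reference: registry/owner/name:tag
image_ref() {
  printf '%s/%s/%s:%s\n' "$1" "$2" "$3" "$4"
}

# Extract the registry host (everything before the first slash).
registry_host() {
  printf '%s\n' "${1%%/*}"
}

# Log in, build from the current directory, and push.
# $1 = full image reference; expects FORGEJO_USER and FORGEJO_PAT to be set.
push_image() {
  printf '%s' "$FORGEJO_PAT" |
    docker login "$(registry_host "$1")" -u "$FORGEJO_USER" --password-stdin &&
  docker build -t "$1" . &&
  docker push "$1"
}
```

From the Guildhall repo root this would be invoked as `push_image "$(image_ref git.guildhouse.dev tking guildhall v0.1.0)"`.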
### Is the `cluster-infra` Flux error a blocker?
**No.** The Forgejo registry operates entirely outside the cluster-infra / spire / quartermaster / bascule Flux chain. Forgejo is managed by its own independent Kustomization (`flux-system/forgejo`), which is successfully reconciling against source revisions even though its Ready condition is flagged False by the unrelated forgejo-runner health check.
The `cluster-infra` error is real and worth fixing separately (trivial single-file fix in the GitHub source repo) but it has zero coupling to registry enablement or Guildhall deployment. Treat as a cleanup backlog item, not a pre-req.
### Estimated time to registry operational
| Step | Time |
|---|---|
| Create Forgejo PAT (Forgejo UI) | 2 min |
| `docker login git.guildhouse.dev` | <1 min |
| Dockerfile + `mix release` setup in Guildhall repo | 15-20 min (real work) |
| `docker build` (cold build for Elixir + OTP + mix deps + assets) | 5-10 min |
| `docker push` | 1-3 min (single tag, ~200 MB compressed) |
| Set package visibility (public or private + pull secret) | 2-5 min |
| **Total to first successful image in the registry** | **~30-45 min** |
Most of the time is the Dockerfile + release-build setup, not the registry interaction itself.
### Recommended next step
**Build the Guildhall Dockerfile and push a first image.** Sequencing:
1. Author `Dockerfile` in `~/projects/substrate-project/guildhall/`: multi-stage (Elixir 1.17.3/OTP 27 builder stage, debian-slim runtime stage, `mix release`, non-root user, expose 4000, healthcheck endpoint)
2. Author `.dockerignore` that excludes `_build/`, `deps/`, `.git/`, `priv/static/` (if built separately), matching Phoenix release conventions
3. Create Forgejo PAT with `write:package` scope
4. `docker login git.guildhouse.dev` from the desktop
5. `docker build -t git.guildhouse.dev/tking/guildhall:v0.1.0 .`
6. `docker push git.guildhouse.dev/tking/guildhall:v0.1.0`
7. Verify via Forgejo UI at `https://git.guildhouse.dev/tking/-/packages/container/guildhall` and via `curl` to `/v2/tking/guildhall/manifests/v0.1.0` (authenticated)
8. Decide package visibility, and if private, create `guildhall-registry` Secret in the `guildhall` namespace (namespace doesn't exist yet; create at deploy time)
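The `curl` verification in step 7 could look like the sketch below. The function names and the `FORGEJO_USER` / `FORGEJO_PAT` variables are hypothetical. It also assumes the registry accepts HTTP Basic credentials directly on `/v2/` routes; if it insists on the bearer-token exchange instead, the request returns 401 and the token flow would be needed.

```shell
# Build the OCI Distribution manifest URL.
# $1 = registry host, $2 = owner, $3 = image name, $4 = tag
manifest_url() {
  printf 'https://%s/v2/%s/%s/manifests/%s\n' "$1" "$2" "$3" "$4"
}

# Print the HTTP status of an authenticated manifest request.
# 200 means the tag exists; 404 means it was not found.
check_manifest() {
  curl -sS -o /dev/null -w '%{http_code}\n' \
    -u "$FORGEJO_USER:$FORGEJO_PAT" \
    -H 'Accept: application/vnd.oci.image.manifest.v1+json' \
    -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
    "$(manifest_url "$1" "$2" "$3" "$4")"
}
```

After the push, `check_manifest git.guildhouse.dev tking guildhall v0.1.0` should print `200` if the tag landed.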
The Kubernetes-side deploy (Deployment + Service + Postgres + PVC + Secret) proceeds in parallel with or immediately after the image build, following the Keycloak pattern captured in the earlier `DEPLOY-EXPLORATORY-2026-04-21.md`.
No pre-work needed on Forgejo itself. The registry is ready.