# ADR-004 — Forgejo In-Cluster Actions Runner on railiance01
**Status:** Accepted
**Date:** 2026-07-03
**Deciders:** Bernd Worsch (operator), custodian agents
**Workplans:** `RAIL-HO-WP-0005-T02`, `CUST-WP-0054-T04`
---
## Context
Forgejo production runs on **railiance01 k3s** (`railiance-apps`, S5). An interim
**host runner** on coulombcore proved Actions scheduling (`coulomb/forgejo-actions-probe`)
but:
- coulombcore is a legacy machine slated for drain (CUST-WP-0054-T03).
- Host runners need Docker or Podman on the OS to run container jobs; neither is
installed on coulombcore, and neither is wanted there long term.
- Forgejo upstream recommends **not** co-locating runners on the same machine as the
forge instance. In-cluster **separate pods** do not give machine-level separation, but
they isolate the runner from the Forgejo pod while staying on the production fleet node.
- `RAIL-HO-WP-0005-T02` left the runner model undecided among host, in-cluster, and
ephemeral options.
Goal: a **coherent Kubernetes-from-the-start** CI substrate — Forgejo app, database,
ingress, and Actions runner all lifecycle-managed on railiance01.
## Decision
### Runner placement
Deploy **one long-lived Forgejo Actions runner Deployment** in the `forgejo` namespace
on railiance01:
| Component | Implementation |
| --- | --- |
| Runner | `data.forgejo.org/forgejo/runner:6.3.1` |
| Container runtime for jobs | `docker:dind` sidecar (privileged) |
| State | PVC `forgejo-runner-data` (`.runner`, `config.yaml`, action cache) |
| Registration scope | `coulomb` organization |
| Runner name | `railiance01-build-01` |
| Deploy surface | `railiance-apps/manifests/forgejo-runner.yaml` |
| Operator targets | `make forgejo-runner-deploy`, `forgejo-runner-status` |
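
The table above could translate into a manifest along these lines. This is a minimal sketch, not the committed `forgejo-runner.yaml`: the Deployment name, the `app` label, the `/data` mount path, the `Recreate` strategy, and the plain-TCP link to the DinD sidecar on port 2375 are all assumptions for illustration. Registration, runner labels, and volume permissions are left out.

```yaml
# Hypothetical sketch only; names and wiring marked "assumed" are not from the ADR.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: forgejo-runner                 # assumed name
  namespace: forgejo
spec:
  replicas: 1                          # one long-lived runner
  strategy:
    type: Recreate                     # assumed: avoid two pods sharing the PVC and .runner identity
  selector:
    matchLabels:
      app: forgejo-runner              # assumed label
  template:
    metadata:
      labels:
        app: forgejo-runner
    spec:
      automountServiceAccountToken: false   # assumed: keep cluster credentials out of the pod
      containers:
        - name: runner
          image: data.forgejo.org/forgejo/runner:6.3.1
          workingDir: /data            # assumed: .runner and config.yaml live on the PVC
          # Wait for the sidecar's Docker daemon, then start the runner daemon.
          command:
            - sh
            - -c
            - while ! nc -z localhost 2375; do sleep 2; done; exec forgejo-runner daemon --config /data/config.yaml
          env:
            - name: DOCKER_HOST
              value: tcp://localhost:2375   # assumed: unencrypted, pod-local only
          volumeMounts:
            - name: runner-data
              mountPath: /data
        - name: dind
          image: docker:dind
          securityContext:
            privileged: true           # accepted risk, see Security boundaries
          env:
            - name: DOCKER_TLS_CERTDIR
              value: ""                # assumed: disables TLS so the daemon listens on 2375
      volumes:
        - name: runner-data
          persistentVolumeClaim:
            claimName: forgejo-runner-data
```

Both containers share the pod's network namespace, which is why the runner can reach the sidecar on `localhost`. A TLS-secured socket with a shared certificate volume would be a stricter alternative.
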
### Label contract
The runner labels preserve compatibility with workflows migrated from Gitea and add semantic capability labels:
```text
self-hosted:host,linux:host,linux_amd64:host,container-build:host,registry-publish:host,railiance01:host,ubuntu-latest:docker://node:20-bookworm,docker:docker://node:20-bookworm
```
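
To illustrate how workflows would select these labels, here is a hypothetical workflow file (the path, job names, and steps are assumptions, not taken from any repository). A label ending in `:host` runs the job directly in the runner's own environment, while a `docker://` label runs it in the named container image.

```yaml
# Hypothetical example, e.g. .forgejo/workflows/label-probe.yaml
on: [push]
jobs:
  in-container:
    runs-on: ubuntu-latest       # mapped to docker://node:20-bookworm by the label contract
    steps:
      - run: node --version
  on-runner:
    runs-on: container-build     # ":host" label: runs directly in the runner environment
    steps:
      - run: uname -m
```
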
### Security boundaries
- Runner pod receives **no** cluster-admin kubeconfig and **no** OpenBao tokens by default.
- `registry-publish` jobs use **repo/org-scoped Forgejo secrets** only.
- DinD sidecar runs **privileged** — accepted for single-node railiance01 with
dedicated `forgejo` namespace; revisit when a third node or multi-tenant runners appear.
- Registration tokens live in Kubernetes Secret `forgejo-runner-registration` (SOPS
template committed; live value never in Git).
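
As a rough picture of the registration Secret, the plaintext shape before SOPS encryption might look like the sketch below. The key name `token` and the placeholder value are assumptions; the committed template may use a different key, and the live value stays out of Git.

```yaml
# Hypothetical plaintext shape; encrypt with SOPS before committing.
apiVersion: v1
kind: Secret
metadata:
  name: forgejo-runner-registration
  namespace: forgejo
type: Opaque
stringData:
  token: REPLACE_WITH_REGISTRATION_TOKEN   # assumed key name; placeholder value
```
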
### Retire interim host runner
Stop and disable `forgejo-runner.service` on coulombcore once the in-cluster runner is
healthy. Do not register new host runners without an explicit ADR amendment.
## Alternatives considered
| Option | Outcome |
| --- | --- |
| Host runner + Docker on coulombcore | Rejected — legacy host, contradicts drain plan |
| Host runner + Podman on haskelseed | Viable fallback; not chosen as primary |
| Kaniko/Buildah without DinD | Deferred — higher workflow churn during Gitea migration |
| Multiple ephemeral runner Jobs | Deferred — start with capacity=1 long-lived pod |
## Consequences
**Positive**
- Single-machine production loop: forge + runner on railiance01, workstation not required.
- Container image CI (`docker build` / `docker push`) works without OS-level Docker.
- Runner upgrades roll with Git-managed manifests and `kubectl`/Makefile.
**Negative / follow-on**
- Privileged DinD increases blast radius within the node — monitor and restrict namespace RBAC.
- The SOPS-encrypted registration secret still requires the operator's age key.
- `cluster-deploy` / `s5-release-check` labels remain **out of scope** until credential paths are reviewed.
## Ownership (OAS)
| Concern | Repo | Layer |
| --- | --- | --- |
| ADR + umbrella sequencing | `railiance-infra` | S1 |
| Runner manifests + Makefile | `railiance-apps` | S5 |
| Label contract + runner evidence docs | `railiance-forge` | S5 forge substrate |
| Reusable workflow templates | `railiance-enablement` | S4 |
## References
- `railiance-apps/docs/forgejo-on-railiance01.md`
- `railiance-forge/docs/forgejo-actions-runner-substrate.md`
- `the-custodian/docs/forgejo-production-decisions.md`
- [Forgejo runner installation](https://forgejo.org/docs/v11.0/admin/actions/runner-installation/)