ADR-004: Forgejo in-cluster Actions runner on railiance01
Decides long-lived runner Deployment with DinD sidecar; updates RAIL-HO-WP-0005 runner model decision.
parent b32c56db4f
commit 6b0ededee2
2 changed files with 124 additions and 9 deletions

docs/adr/ADR-004-forgejo-in-cluster-actions-runner.md (normal file, 104 lines)
@@ -0,0 +1,104 @@
# ADR-004: Forgejo In-Cluster Actions Runner on railiance01

**Status:** Accepted
**Date:** 2026-07-03
**Deciders:** Bernd Worsch (operator), custodian agents
**Workplans:** `RAIL-HO-WP-0005-T02`, `CUST-WP-0054-T04`

---

## Context

Forgejo production runs on **railiance01 k3s** (`railiance-apps`, S5). An interim
**host runner** on coulombcore proved Actions scheduling (`coulomb/forgejo-actions-probe`)
but:

- coulombcore is a legacy machine slated for drain (CUST-WP-0054-T03).
- Host runners require Docker or Podman on the OS: not installed, not desired on
coulombcore long term.
- Forgejo upstream recommends **not** co-locating runners on the same machine as the
forge instance; in-cluster **separate pods** satisfy isolation while staying on the
production fleet node.
- `RAIL-HO-WP-0005-T02` left the runner model undecided among host, in-cluster, and
ephemeral options.

Goal: a **coherent Kubernetes-from-the-start** CI substrate, with Forgejo app, database,
ingress, and Actions runner all lifecycle-managed on railiance01.

## Decision

### Runner placement

Deploy **one long-lived Forgejo Actions runner Deployment** in the `forgejo` namespace
on railiance01:

| Component | Implementation |
| --- | --- |
| Runner | `data.forgejo.org/forgejo/runner:6.3.1` |
| Container runtime for jobs | `docker:dind` sidecar (privileged) |
| State | PVC `forgejo-runner-data` (`.runner`, `config.yaml`, action cache) |
| Registration scope | `coulomb` organization |
| Runner name | `railiance01-build-01` |
| Deploy surface | `railiance-apps/manifests/forgejo-runner.yaml` |
| Operator targets | `make forgejo-runner-deploy`, `forgejo-runner-status` |

### Label contract

Preserve Gitea migration compatibility and semantic capability labels:

```text
self-hosted:host,linux:host,linux_amd64:host,container-build:host,registry-publish:host,railiance01:host,ubuntu-latest:docker://node:20-bookworm,docker:docker://node:20-bookworm
```
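
As a minimal sketch of how the label string above breaks down, the Python below splits each entry into a label name, a backend (`host` or `docker`), and an optional image, then checks whether a job's `runs-on` list is covered. The names `RunnerLabel`, `parse_labels`, and `can_run` are hypothetical helpers for illustration and are not part of Forgejo or its runner. The `name:backend[://image]` rule is inferred from the string itself, and the rule that every requested label must be present is an assumption about scheduling, not something this ADR states.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Label contract copied verbatim from the ADR.
LABEL_CONTRACT = (
    "self-hosted:host,linux:host,linux_amd64:host,container-build:host,"
    "registry-publish:host,railiance01:host,"
    "ubuntu-latest:docker://node:20-bookworm,docker:docker://node:20-bookworm"
)


@dataclass(frozen=True)
class RunnerLabel:
    """One parsed entry of the label contract (hypothetical helper type)."""

    name: str             # label a workflow names in `runs-on`
    backend: str          # "host" or "docker"
    image: Optional[str]  # container image for docker-backed labels, else None


def parse_labels(contract: str) -> Dict[str, RunnerLabel]:
    """Split a comma-separated label string into RunnerLabel entries by name."""
    labels: Dict[str, RunnerLabel] = {}
    for raw in contract.split(","):
        # The first ":" separates the label name from its backend spec.
        name, sep, spec = raw.strip().partition(":")
        if not name or not sep or not spec:
            raise ValueError(f"malformed label entry: {raw!r}")
        # "docker://image" carries an image; a bare "host" does not.
        backend, has_image, image = spec.partition("://")
        labels[name] = RunnerLabel(name, backend, image if has_image else None)
    return labels


def can_run(runs_on: Iterable[str], labels: Dict[str, RunnerLabel]) -> bool:
    """Assumed matching rule: every requested label must be offered."""
    return all(name in labels for name in runs_on)


if __name__ == "__main__":
    for label in parse_labels(LABEL_CONTRACT).values():
        print(label.name, label.backend, label.image or "-")
```

Under these assumptions, a job requesting `self-hosted` and `container-build` would be accepted, while one requesting `cluster-deploy` would not, since that label is not part of the contract.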

### Security boundaries

- Runner pod receives **no** cluster-admin kubeconfig and **no** OpenBao tokens by default.
- `registry-publish` jobs use **repo/org-scoped Forgejo secrets** only.
- DinD sidecar runs **privileged**, accepted for single-node railiance01 with
dedicated `forgejo` namespace; revisit when a third node or multi-tenant runners appear.
- Registration tokens live in Kubernetes Secret `forgejo-runner-registration` (SOPS
template committed; live value never in Git).
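
The boundaries above could be checked mechanically before a manifest is applied. The following is a minimal sketch in Python, assuming the runner pod spec has already been loaded into a plain dict. The sample `POD_SPEC`, the container names `runner` and `dind`, the `RUNNER_TOKEN` variable, and the helper `audit_pod_spec` are all hypothetical; only the field names (`securityContext.privileged`, `secretKeyRef`, `secretRef`, `volumes[].secret.secretName`) come from the Kubernetes Pod API.

```python
from typing import Any, Dict, List, Set

# Assumptions for illustration: only the DinD sidecar may be privileged and
# only the registration Secret named in the ADR may be referenced.
ALLOWED_PRIVILEGED = {"dind"}
ALLOWED_SECRETS = {"forgejo-runner-registration"}

# Hypothetical pod spec; the real manifest may differ.
POD_SPEC: Dict[str, Any] = {
    "containers": [
        {
            "name": "runner",
            "image": "data.forgejo.org/forgejo/runner:6.3.1",
            "env": [{
                "name": "RUNNER_TOKEN",  # hypothetical variable name
                "valueFrom": {"secretKeyRef": {
                    "name": "forgejo-runner-registration", "key": "token"}},
            }],
        },
        {
            "name": "dind",
            "image": "docker:dind",
            "securityContext": {"privileged": True},
        },
    ],
    "volumes": [{
        "name": "data",
        "persistentVolumeClaim": {"claimName": "forgejo-runner-data"},
    }],
}


def referenced_secrets(pod_spec: Dict[str, Any]) -> Set[str]:
    """Collect Secret names used via env, envFrom, or secret volumes."""
    names: Set[str] = set()
    for container in pod_spec.get("containers", []):
        for env in container.get("env", []):
            ref = env.get("valueFrom", {}).get("secretKeyRef")
            if ref:
                names.add(ref["name"])
        for source in container.get("envFrom", []):
            ref = source.get("secretRef")
            if ref:
                names.add(ref["name"])
    for volume in pod_spec.get("volumes", []):
        secret = volume.get("secret")
        if secret:
            names.add(secret["secretName"])
    return names


def audit_pod_spec(pod_spec: Dict[str, Any]) -> List[str]:
    """Return findings; an empty list means the assumed boundaries hold."""
    findings: List[str] = []
    for container in pod_spec.get("containers", []):
        privileged = container.get("securityContext", {}).get("privileged", False)
        if privileged and container["name"] not in ALLOWED_PRIVILEGED:
            findings.append(f"container {container['name']!r} is privileged")
    for name in sorted(referenced_secrets(pod_spec) - ALLOWED_SECRETS):
        findings.append(f"unexpected secret reference {name!r}")
    return findings


if __name__ == "__main__":
    print(audit_pod_spec(POD_SPEC) or "no findings")
```

A check like this only covers what the pod spec declares; it says nothing about namespace RBAC or what a privileged DinD container can reach at runtime.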

### Retire interim host runner

Stop and disable `forgejo-runner.service` on coulombcore after in-cluster runner is
healthy. Do not register new host runners without an explicit ADR amendment.

## Alternatives considered

| Option | Outcome |
| --- | --- |
| Host runner + Docker on coulombcore | Rejected: legacy host, contradicts drain plan |
| Host runner + Podman on haskelseed | Viable fallback; not chosen as primary |
| Kaniko/Buildah without DinD | Deferred: higher workflow churn during Gitea migration |
| Multiple ephemeral runner Jobs | Deferred: start with capacity=1 long-lived pod |

## Consequences

**Positive**

- Single-machine production loop: forge + runner on railiance01, workstation not required.
- Container image CI (`docker build` / `docker push`) works without OS-level Docker.
- Runner upgrades roll with Git-managed manifests and `kubectl`/Makefile.

**Negative / follow-on**

- Privileged DinD increases blast radius within the node; monitor and restrict namespace RBAC.
- SOPS-encrypted registration secret still requires operator age key.
- `cluster-deploy` / `s5-release-check` labels remain **out of scope** until credential paths reviewed.

## Ownership (OAS)

| Concern | Repo | Layer |
| --- | --- | --- |
| ADR + umbrella sequencing | `railiance-infra` | S1 |
| Runner manifests + Makefile | `railiance-apps` | S5 |
| Label contract + runner evidence docs | `railiance-forge` | S5 forge substrate |
| Reusable workflow templates | `railiance-enablement` | S4 |

## References

- `railiance-apps/docs/forgejo-on-railiance01.md`
- `railiance-forge/docs/forgejo-actions-runner-substrate.md`
- `the-custodian/docs/forgejo-production-decisions.md`
- [Forgejo runner installation](https://forgejo.org/docs/v11.0/admin/actions/runner-installation/)

@@ -46,14 +46,18 @@ change is made there.

 ## Key Decisions to Confirm

-1. Public/private hostname for Forgejo and whether Gitea remains reachable
-during the transition.
+1. ~~Public/private hostname for Forgejo~~ **DECIDED 2026-07-03:**
+`forgejo.coulomb.social` → railiance01 (`92.205.62.239`). DNS active;
+Traefik edge live; Forgejo workload not deployed yet (404). Gitea remains
+canonical until migration drills pass. Record:
+`the-custodian/docs/forgejo-production-decisions.md`.
 2. Mail delivery path for password reset and account recovery
 (SMTP relay, sender domain, SPF/DKIM/DMARC expectations).
 3. Package registry scope: container images only at first, or also generic,
 npm, PyPI, Go, Maven, and Helm packages.
-4. Actions runner model: in-cluster ephemeral runners, long-lived runner pod,
-or isolated host runner.
+4. ~~Actions runner model~~ **DECIDED 2026-07-03:** in-cluster long-lived runner
+Deployment with DinD sidecar on railiance01 (`ADR-004`). Interim coulombcore
+host runner retired after cutover.
 5. Backup destination and retention target for database, repositories,
 attachments, LFS, Actions artifacts/logs, and package data.
 6. Cutover mode: freeze-and-migrate all repos in one window, or staged
@@ -98,8 +102,7 @@ The probe is destroyed or explicitly archived after production Forgejo is live.

 ```
 operator / agents / developers
--> private HTTPS endpoint
--> railiance01 ingress
+-> https://forgejo.coulomb.social (railiance01 Traefik ingress)
 -> forgejo Service in forgejo namespace
 -> Forgejo Deployment/StatefulSet
 -> forgejo-db CloudNative PG Cluster in databases namespace
@@ -144,7 +147,7 @@ manual, unsupported, or explicitly out of scope.

 ```task
 id: RAIL-HO-WP-0005-T02
-status: todo
+status: progress
 priority: high
 needs_human: true
 state_hub_task_id: "f88115bf-4f99-49ef-a415-0b23750141b3"
@@ -152,10 +155,14 @@ state_hub_task_id: "f88115bf-4f99-49ef-a415-0b23750141b3"

 Decide the production choices listed in "Key Decisions to Confirm".

+**Partial (2026-07-03):** hostname and in-cluster runner model decided (`ADR-004`).
+Remaining: SMTP, package scope, backup, cutover mode. See
+`the-custodian/docs/forgejo-production-decisions.md`.
+
 Expected output:

 - A short decision record in this workplan or a dedicated ADR.
-- Hostname and exposure model.
+- Hostname and exposure model. ✓ hostname; exposure follows railiance01 Traefik
 - SMTP provider and sender identity.
 - Package registry scope.
 - Actions runner isolation model.
@@ -229,7 +236,7 @@ Forgejo app running.

 ```task
 id: RAIL-HO-WP-0005-T05
-status: todo
+status: progress
 priority: high
 state_hub_task_id: "11540ba4-d31c-4f64-836b-c6de69107aa4"
 ```
@@ -245,6 +252,10 @@ Minimum scope:
 - Health/status targets in the Makefile.
 - Migration-safe configuration for coexistence with Gitea during the cutover.

+**Partial (2026-07-03):** `railiance-apps` deploy live: HTTPS smoke pass, Actions
+enabled, `coulomb` org + probe workflow success. Remaining: SOPS secrets,
+SMTP, Docker on runner host for image builds, migration drills.
+
 **Done when:** Forgejo runs on railiance01 against production platform
 services and can serve login, git clone/push, package registry, and admin
 operations.