diff --git a/docs/forgejo-migration-inventory.md b/docs/forgejo-migration-inventory.md index dd047b4..860a127 100644 --- a/docs/forgejo-migration-inventory.md +++ b/docs/forgejo-migration-inventory.md @@ -204,3 +204,25 @@ lost or left with an untracked remote. This first pass satisfies the public and infrastructure metadata part of T01. T01 should remain open until the authenticated admin inventory and missing repo classification are complete. + +## Addendum (2026-07-04) — migration ladder and new repos + +`RAIL-HO-WP-0005` now uses a **staged per-repo ladder** instead of an isolated +probe namespace (T03 cancelled). Repos to add or re-classify on next inventory +refresh: + +| Repo | On Gitea (2026-06) | On Forgejo (2026-07-04) | Tier | Notes | +| --- | --- | --- | ---: | --- | +| `forgejo-actions-probe` | — | yes | 0 | Disposable runner/OCI probe | +| `glas-harness` | yes (not in table above) | yes (canonical) | 1 | Git+SSH+CI pilot; see `the-custodian/docs/forgejo-repo-migration-pilot-glas-harness.md` | + +**Tier definitions** (for per-repo `migration tier` column in a future refresh): + +| Tier | Criteria | Examples | +| ---: | --- | --- | +| 0 | Disposable integration probes | `forgejo-actions-probe` | +| 1 | Non-production; git+CI only | `glas-harness` | +| 2 | Non-production with container image + registry pull | TBD (`key-cape` candidate) | +| 3 | Production drain wave / sweep registration | `state-hub`, `issue-core`, … | + +Production repos stay on Gitea until tier 0–2 gates and T09 backup drill pass. 
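Reviewer sketch (not part of the patch): the tier gate described in the addendum can be read as a small predicate. This is a minimal illustration, assuming the rule "each tier must pass before the next" and the stated production condition (tiers 0 to 2 plus the T09 backup drill). The names `Repo` and `may_leave_gitea` are hypothetical and do not exist in any Railiance repo.

```python
from dataclasses import dataclass

# Tier numbers follow the "Tier definitions" table in the addendum.
# Everything else (class, fields, function) is a hypothetical illustration.
PRODUCTION_TIER = 3


@dataclass(frozen=True)
class Repo:
    name: str
    tier: int


def may_leave_gitea(repo: Repo, passed_tiers: set[int], t09_backup_drill_passed: bool) -> bool:
    """Return True if the ladder allows this repo to move to Forgejo."""
    if repo.tier < PRODUCTION_TIER:
        # Assumed rule: every lower tier must have passed before this one starts.
        return all(t in passed_tiers for t in range(repo.tier))
    # Production repos wait for the tier 0-2 gates and the T09 backup drill.
    return {0, 1, 2} <= passed_tiers and t09_backup_drill_passed


if __name__ == "__main__":
    print(may_leave_gitea(Repo("key-cape", 2), {0, 1}, False))
    print(may_leave_gitea(Repo("state-hub", 3), {0, 1, 2}, False))
```

With tiers 0 and 1 passed, a tier-2 candidate such as `key-cape` is allowed to proceed, while `state-hub` stays on Gitea until the backup drill flag is also set.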
diff --git a/workplans/RAIL-HO-WP-0005-forgejo-production-migration.md b/workplans/RAIL-HO-WP-0005-forgejo-production-migration.md index be28c70..a341b4d 100644 --- a/workplans/RAIL-HO-WP-0005-forgejo-production-migration.md +++ b/workplans/RAIL-HO-WP-0005-forgejo-production-migration.md @@ -8,7 +8,7 @@ status: active owner: railiance topic_slug: railiance created: "2026-05-03" -updated: "2026-06-04" +updated: "2026-07-04" state_hub_workstream_id: "84e17675-0d15-4268-a8bd-540124d37018" --- @@ -24,6 +24,13 @@ Forgejo will become the heart of Railiance infrastructure. The work must be fully automated, backup-backed, recovery-drilled, and suitable for long-lived operation on railiance01 before any production cutover happens. +**Sequencing update (2026-07-04):** Production Forgejo is live on railiance01 +with Gitea still canonical per the safety contract. Repo cutover proceeds +**staged per-repo** using a migration ladder (disposable probes → non-production +pilots → image-capable pilots → production repos). `state-hub` is last. See +`CUST-WP-0054-T04` and +`the-custodian/docs/forgejo-repo-migration-pilot-glas-harness.md`. + ## Placement in the Railiance Tooling Set This workplan lives in `railiance-infra` because it is the cross-layer @@ -48,7 +55,7 @@ change is made there. 1. ~~Public/private hostname for Forgejo~~ **DECIDED 2026-07-03:** `forgejo.coulomb.social` → railiance01 (`92.205.62.239`). DNS active; - Traefik edge live; Forgejo workload not deployed yet (404). Gitea remains + Traefik edge live; Forgejo workload deployed and serving HTTPS. Gitea remains canonical until migration drills pass. Record: `the-custodian/docs/forgejo-production-decisions.md`. 2. Mail delivery path for password reset and account recovery @@ -60,8 +67,9 @@ change is made there. host runner retired after cutover. 5. Backup destination and retention target for database, repositories, attachments, LFS, Actions artifacts/logs, and package data. -6. 
Cutover mode: freeze-and-migrate all repos in one window, or staged - project-by-project transition. +6. Cutover mode: ~~freeze-all vs staged~~ **LEANING staged per-repo (2026-07-04)** + based on `glas-harness` pilot; operator confirmation still needed. Freeze-all + remains fallback for final production wave if drift risk is unacceptable. ## Safety Contract @@ -80,23 +88,30 @@ change is made there. repo. No plaintext SMTP passwords, admin tokens, runner tokens, or registry credentials in Git. -## Probe Strategy +## Probe and pilot strategy (revised 2026-07-04) -A `forgejo-railiance-probe` is reasonable and should be treated as a disposable -S5/S4 integration probe, not as the production install. +Original T03 planned a **disposable isolated-namespace probe** before any +production install. That path was **superseded**: production Forgejo deployed on +railiance01 under the safety contract (Gitea remains canonical; no Gitea deletes). -The probe should prove: +Integration evidence now comes from **in-production probes and repo pilots**: -- Helm values and cnpg database wiring converge cleanly. -- Initial admin bootstrap is automated and repeatable. -- SMTP/password reset works end-to-end. -- Package registry endpoints work for the package types Railiance needs first. -- Forgejo Actions can run a minimal workflow and publish a test package. -- Backup and restore works in an isolated namespace. -- Migration from a sample Gitea repo preserves git history, issues, releases, - wiki, LFS or attachments where applicable. +| Tier | Repo | Purpose | Status | +| --- | --- | --- | --- | +| 0 | `coulomb/forgejo-actions-probe` | Runner scheduling, DinD, OCI image-build | **done** | +| 1 | `coulomb/glas-harness` | Non-production git+SSH+CI routing drill | **done** | +| 2 | TBD (small lib with image, e.g. 
`key-cape`) | Image-build workflow + registry pull on railiance01 | **next** | +| 3 | Production set (`state-hub`, `issue-core`, …) | Canonical remotes, sweep paths, deploy loops | **gated** | -The probe is destroyed or explicitly archived after production Forgejo is live. +Each tier must pass before the next. T03 (isolated probe namespace) is cancelled; +acceptance criteria below are tracked across T05, T07, T08, and T10 instead. + +Still to prove before T11: + +- SMTP/password reset end-to-end (T06). +- Backup and restore in isolated namespace (T09). +- Issues/releases/wiki/LFS per inventory classification (T10 matrix). +- Operator SSH identity on Forgejo beyond interim `forgejo_admin` keys (T02/T10). ## Target Architecture @@ -141,6 +156,10 @@ Minimum inventory: Forgejo before cutover and classifies each migration item as automatic, manual, unsupported, or explicitly out of scope. +**Gap (2026-07-04):** first-pass inventory predates repos created after +2026-06-04 (e.g. `glas-harness`, `forgejo-actions-probe`). Refresh org repo +list and add a **migration tier** column (0–3) per repo before T11. + --- ### T02 — Resolve Forgejo production design decisions @@ -155,8 +174,10 @@ state_hub_task_id: "f88115bf-4f99-49ef-a415-0b23750141b3" Decide the production choices listed in "Key Decisions to Confirm". -**Partial (2026-07-03):** hostname and in-cluster runner model decided (`ADR-004`). -Remaining: SMTP, package scope, backup, cutover mode. See +**Partial (2026-07-04):** hostname, exposure, deployment pattern, live deploy, +and in-cluster runner model decided (`ADR-004`). Cutover mode **leaning** staged +per-repo (glas-harness pilot). Remaining operator decisions: SMTP, package scope +beyond OCI, backup target, final cutover confirmation. See `the-custodian/docs/forgejo-production-decisions.md`. Expected output: @@ -174,36 +195,21 @@ choices. 
--- -### T03 — Build forgejo-railiance-probe +### T03 — Build forgejo-railiance-probe (isolated namespace) ```task id: RAIL-HO-WP-0005-T03 -status: todo +status: cancel priority: high state_hub_task_id: "b516018a-415e-4a58-8c62-07c14ece9353" ``` -Create a disposable probe environment for Forgejo before touching production. - -Expected repo ownership: - -- `railiance-platform`: probe cnpg database and storage dependencies. -- `railiance-apps`: probe Forgejo Helm values and namespace. -- `railiance-enablement`: probe Actions runner template and workflows. - -Probe acceptance: - -- `make forgejo-probe-deploy` or equivalent converges from a clean cluster - state. -- Admin bootstrap is automated. -- A test user can reset a password via email. -- A test repository can be created, cloned, pushed, and protected. -- A test package can be published and pulled. -- A test Forgejo Actions workflow runs successfully. -- A probe backup restores into an isolated namespace. - -**Done when:** the probe demonstrates the whole lifecycle without manual -cluster surgery. +**Cancelled 2026-07-04:** superseded by production Forgejo on railiance01 (T05) +plus in-production integration probes (`forgejo-actions-probe`, `glas-harness`). +Isolated-namespace probe added latency without reducing risk given the safety +contract (Gitea canonical, no deletes). Remaining T03 acceptance items map to: +T05 (deploy), T06 (mail), T07 (packages), T08 (Actions), T09 (backup restore), +T10 (repo migration drill). --- @@ -227,6 +233,11 @@ Minimum scope: packages, Actions artifacts, and logs. - Restore runbook for database and blob/package data. +**Partial (2026-07-04):** `forgejo-db` CNPG cluster healthy on railiance01 +(`make forgejo-db-status` → Cluster in healthy state). SOPS secret path and +network policies in `railiance-platform`. Remaining: backup/WAL archiving to +approved target, blob/package storage restore drill (feeds T09). 
+ **Done when:** platform dependencies can be deployed and restored without the Forgejo app running. @@ -252,9 +263,11 @@ Minimum scope: - Health/status targets in the Makefile. - Migration-safe configuration for coexistence with Gitea during the cutover. -**Partial (2026-07-03):** `railiance-apps` deploy live — HTTPS smoke pass, Actions -enabled, `coulomb` org + probe workflow success. Remaining: SOPS secrets, -SMTP, Docker on runner host for image builds, migration drills. +**Partial (2026-07-04):** `railiance-apps` deploy live — HTTPS smoke pass, +ingress + TLS, SSH NodePort `30022`, Actions enabled, `coulomb` org, +`railiance01-build-01` runner (ADR-004). Git push/pull via HTTPS and +`forgejo-remote` SSH proven. Remaining: SOPS hardening for all secrets, +SMTP (T06), operator user accounts beyond `forgejo_admin`. **Done when:** Forgejo runs on railiance01 against production platform services and can serve login, git clone/push, package registry, and admin @@ -312,8 +325,13 @@ Acceptance: - Retention and cleanup expectations are documented. - Package data is included in backup and restore drills. -**Done when:** `state-hub` or a probe image can be published to Forgejo and -pulled by railiance01. +**Partial (2026-07-04):** OCI registry live (`/v2/` auth challenge). Probe image +`forgejo.coulomb.social/coulomb/forgejo-actions-probe` built and pushed via +Actions. Remaining: publish and pull a **tier-2 pilot** app image (not yet +`state-hub`); document retention; include packages in backup drill (T09). + +**Done when:** a tier-2 pilot image (or `state-hub` after explicit approval) can +be published to Forgejo and pulled by railiance01 k3s. --- @@ -321,7 +339,7 @@ pulled by railiance01. ```task id: RAIL-HO-WP-0005-T08 -status: todo +status: progress priority: high state_hub_task_id: "f45f98c9-2f02-4224-bbfd-c2e1ec38581e" ``` @@ -337,8 +355,16 @@ Minimum scope: - Secret handling policy for Actions. 
- Resource limits to avoid repeating previous single-node overload patterns. -**Done when:** a representative repository can run Forgejo Actions and publish -a test artifact without privileged cluster-wide credentials. +**Partial (2026-07-04):** in-cluster runner live (`railiance-apps/manifests/ +forgejo-runner.yaml`, ADR-004). Proven workflows: `forgejo-actions-probe` +(image-build), `glas-harness` (host+container CI smoke). Org secrets +`REGISTRY_USER`/`REGISTRY_TOKEN` set. Documented constraints: host runner is +non-root (static docker-cli, no `apk add`); `actions/checkout@v4` fails — use +`git clone` in job. Remaining: reusable workflow templates in +`railiance-enablement` (S4); resource limits review; no cluster-admin on runner. + +**Done when:** tier-2 pilot repo runs Forgejo Actions end-to-end and publishes +a pullable image without privileged cluster-wide credentials. --- @@ -376,29 +402,38 @@ with repository, package, and user recovery checks passing. --- -### T10 — Drill Gitea to Forgejo migration +### T10 — Drill Gitea to Forgejo migration (staged ladder) ```task id: RAIL-HO-WP-0005-T10 -status: todo +status: progress priority: high state_hub_task_id: "6befde73-00bc-4643-be0b-a7ce7944e75f" ``` -Run a non-production migration drill from Gitea to Forgejo. +Run staged migration drills from Gitea to Forgejo before production repos move. -Minimum checks: +**Tier 1 complete (2026-07-04):** `glas-harness` — git history preserved, +`origin` on Forgejo, `gitea` legacy remote retained, SSH+HTTPS push, CI smoke +green. Result matrix: +`the-custodian/docs/forgejo-repo-migration-pilot-glas-harness.md`. + +Minimum checks (per tier): - Git history and default branches preserved. - Issues, labels, milestones, releases, wiki, and attachments handled per - inventory classification. -- SSH/HTTPS clone and push paths work. -- Existing local remotes can be transformed predictably. -- State Hub registered repo remotes can be updated safely. -- Rollback plan is rehearsed. 
+ inventory classification (N/A for tier-1 git-only repos). +- SSH/HTTPS clone and push paths work (`forgejo-remote` in `~/.ssh/config`). +- Existing local remotes can be transformed predictably (`origin`/`gitea` split). +- State Hub registered repo remotes can be updated safely (deferred for tier-1). +- Rollback plan is rehearsed (Gitea copy unchanged). -**Done when:** a sample migration has a written result matrix and no unknown -critical migration gaps remain. +**Next:** tier-2 repo with container image + `.gitea/workflows` port to +`.forgejo/workflows`. **Not ready:** `state-hub` until hub-core build context +template and sweep `remote_url` playbook exist. + +**Done when:** tiers 0–2 pass with written result matrices and no unknown +critical migration gaps remain for production repos. --- @@ -412,19 +447,21 @@ needs_human: true state_hub_task_id: "b1b66687-ca33-4971-b312-743c8e059c5e" ``` -Execute the production migration only after the probe, backup restore, package -registry, email recovery, and Actions gates pass. +Execute production migration only after T06, T07, T08, T09, and T10 tier 0–2 +gates pass. `state-hub` and other Wave-1 production repos require explicit +operator approval per `CUST-WP-0054` drain sequence. -Cutover sequence: +**Preferred cutover (staged per-repo):** -1. Announce freeze window. -2. Take final Gitea backup and verify it exists. -3. Freeze Gitea writes. -4. Migrate repositories and metadata to Forgejo. -5. Validate critical repositories and package pulls. -6. Update State Hub repo remotes and host paths as needed. -7. Update local and railiance01 remotes. -8. Keep Gitea read-only as rollback until the stabilization window passes. +1. Per repo: Gitea backup snapshot (or org-wide before each wave). +2. Mirror git to Forgejo; switch workstation `origin` to `forgejo-remote`. +3. Port/verify Actions workflows on Forgejo runner. +4. Update State Hub `remote_url` and railiance01 sweep checkouts when promoted. +5. 
Mark Gitea repo read-only (org policy); do not delete. +6. Repeat until production set complete. + +**Freeze-all fallback:** single window if staged drift is unacceptable — same +steps but all repos in one maintenance period. **Done when:** all Railiance/Custodian repos use Forgejo as primary, Gitea is read-only fallback, and rollback instructions are documented. @@ -458,19 +495,28 @@ legacy Gitea either archived or intentionally retained as documented fallback. ## Phasing and Dependencies ``` -T01 inventory ─┬─► T02 decisions ─┬─► T03 probe ─┬─► T04 platform - │ │ ├─► T05 app - │ │ ├─► T06 mail recovery - │ │ ├─► T07 packages - │ │ ├─► T08 actions - │ │ └─► T09 backups - └────────────────────────────────────► T10 migration drill +T01 inventory ──► T02 decisions ──┬──► T04 platform (forgejo-db ✓ partial) + ├──► T05 app (live ✓ partial) + ├──► T06 mail recovery + ├──► T07 packages (OCI probe ✓ partial) + ├──► T08 actions (runner ✓ partial) + └──► T09 backups -T03-T10 all pass ─► T11 production cutover ─► T12 legacy Gitea retirement +T05+T08 ──► T10 migration ladder ──► T11 production cutover ──► T12 Gitea retire + tier0 probe ✓ + tier1 glas-harness ✓ + tier2 image repo (next) + tier3 production (gated) + +T03 isolated probe: CANCELLED (superseded by T05 + in-production pilots) ``` -Recommended first slice: T01, T02, T03. Do not start T11 until T06, T07, T08, -T09, and T10 are complete. +**Current focus (2026-07-04):** T10 tier-2 image pilot; parallel T09 backup +drill and T02 open decisions (SMTP, backup target). Do not start T11 +`state-hub` until T09 complete and `CUST-WP-0054` Wave-1 gates satisfied. + +**Absorbed by `CUST-WP-0054-T04`:** forge + CI on railiance01; workstation +build retirement; staged repo promotion before State Hub primary move (T05). ## railiance-bootstrap Note @@ -490,7 +536,14 @@ purpose is identified. 
 - `RAIL-HO-WP-0004-production-readiness.md`
 - `RAIL-HO-WP-0003-5repo-stack-restructure.md`
+- `CUST-WP-0054-workstation-independence-and-fleet-realignment.md` (T04 forge+CI)
 - `CUST-WP-0014-repo-sync-automation.md`
 - `CUST-WP-0021-multi-host-repo-paths.md`
+- `docs/adr/ADR-004-forgejo-in-cluster-actions-runner.md`
+- `docs/forgejo-migration-inventory.md`
+- `the-custodian/docs/forgejo-production-decisions.md`
+- `the-custodian/docs/forgejo-repo-migration-pilot-glas-harness.md`
+- `railiance-apps/docs/forgejo-on-railiance01.md`
+- `railiance-forge/docs/forgejo-actions-runner-substrate.md`
 - `ops/incidents/2026-03-25-gitea-pgpool-crashloop.md`
 - `ops/incidents/2026-03-26-coulombcore-runaway-agent-overload.md`
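Reviewer sketch (not part of the patch): the `origin`/`gitea` remote split named in T10 and in step 2 of the preferred cutover could be planned as a dry-run command list before anything touches a checkout. This is a minimal illustration only. The function name is hypothetical, and the URL shape `forgejo-remote:coulomb/glas-harness.git` is an assumption built from the `forgejo-remote` SSH alias and the `coulomb` org mentioned in the workplan. The code only builds the commands; it does not run them.

```python
def remote_split_commands(forgejo_url: str) -> list[list[str]]:
    """Plan git commands that keep Gitea as `gitea` and make Forgejo `origin`.

    Hypothetical helper: returns the commands as argument lists so an
    operator can review them before running anything.
    """
    return [
        # Keep the existing Gitea remote under the legacy name.
        ["git", "remote", "rename", "origin", "gitea"],
        # Point `origin` at Forgejo (URL shape is an assumption).
        ["git", "remote", "add", "origin", forgejo_url],
        # Confirm the new remote is reachable before pushing.
        ["git", "fetch", "origin"],
    ]


if __name__ == "__main__":
    for cmd in remote_split_commands("forgejo-remote:coulomb/glas-harness.git"):
        print(" ".join(cmd))
```

Returning argument lists instead of executing keeps the sketch safe to run anywhere and leaves the Gitea copy untouched, which matches the rollback expectation in the safety contract.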