RAIL-HO-WP-0005-T09: Forgejo backup/restore drill assets and evidence

Add isolated-namespace restore drill (CNPG cluster, PVC, orchestration script)
and document successful 2026-07-04 run: production forgejo dump restored with
health 200 and pilot repos visible via API. Scheduled backups remain open.
tegwick 2026-07-04 11:26:50 +02:00
parent 2d62317ada
commit 092315895f
5 changed files with 257 additions and 5 deletions


@@ -0,0 +1,93 @@
# Forgejo Backup/Restore Drill Evidence
Date: 2026-07-04
Workplan: RAIL-HO-WP-0005
Task: RAIL-HO-WP-0005-T09
`no_secret_material_recorded: true`
## Purpose
Prove that a production `forgejo dump` can be restored into an isolated
namespace and serve repository metadata without touching production Forgejo or
Gitea.
## Backup source
| Field | Value |
| --- | --- |
| Method | `forgejo dump` from production pod |
| Production pod | `forgejo-gitea-64c5b57684-ph9vt` (namespace `forgejo`) |
| Archive path (workstation) | `/tmp/forgejo-drill/forgejo-drill-backup.zip` |
| Archive size | 12,284,847 bytes (~11.7 MiB) |
| Archive timestamp | 2026-07-04 11:20 +0200 |
| Archive contents (top-level) | `repos/`, `data/`, `forgejo-db.sql`, `app.ini` |

Repositories present in the dump (all under `repos/coulomb/`):
`forgejo-actions-probe`, `glas-harness`, `key-cape`.
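
Before restoring, it can help to confirm that an archive has the top-level layout listed in the table. The following is a minimal sketch, not part of the committed drill script: `check_dump_layout` is a hypothetical helper that reads member names on stdin and reports any expected entry that is missing.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read archive member names (one per line) on stdin and
# confirm that each expected top-level entry appears as a path prefix.
# Prints one "missing: <entry>" line per absent entry; returns 1 if any is absent.
check_dump_layout() {
  local listing entry missing=0
  listing="$(cat)"
  for entry in repos/ data/ forgejo-db.sql app.ini; do
    # index($0, e) == 1 is a literal prefix match (no regex surprises with ".").
    if ! printf '%s\n' "$listing" |
        awk -v e="$entry" 'index($0, e) == 1 { found = 1 } END { exit !found }'; then
      echo "missing: $entry"
      missing=1
    fi
  done
  return "$missing"
}
```

One way to feed it, assuming `unzip` is installed on the workstation, is `unzip -Z1 /tmp/forgejo-drill/forgejo-drill-backup.zip | check_dump_layout`, since `unzip -Z1` prints member names one per line.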
## Restore target
| Field | Value |
| --- | --- |
| Namespace | `forgejo-restore-drill` |
| Database | CNPG cluster `forgejo-db-restore` (isolated, 1 instance) |
| App data PVC | `forgejo-restore-data` (`local-path`, 10Gi) |
| Helm release | `forgejo-restore` (`gitea-charts/gitea` 12.5.0) |
| Orchestration | `tools/forgejo-restore-drill.sh` |

Restore path (Forgejo 11.0.3 has no `forgejo restore` CLI):

1. Unzip dump into import pod staging area.
2. Copy `repos/` → `/data/git/gitea-repositories/`.
3. Copy `data/` → `/data/` (packages, attachments, avatars).
4. Import `forgejo-db.sql` via `psql` into `forgejo-db-restore`.
5. Deploy isolated Helm release bound to restored PVC + restore DB host.
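
The five steps can be sketched as a sequence of `kubectl`, `psql`, and `helm` calls. This is an illustration under stated assumptions, not the contents of `tools/forgejo-restore-drill.sh`. The import pod name, staging directory, database name and user, and values file are hypothetical. The DB host assumes the `-rw` service that CNPG creates for a cluster. The pod is assumed to provide `tar`, `unzip`, and `psql`, with database credentials in its environment. By default the sketch only prints the commands it would run.

```bash
#!/usr/bin/env bash
set -euo pipefail

NS="forgejo-restore-drill"
ARCHIVE="${ARCHIVE:-/tmp/forgejo-drill/forgejo-drill-backup.zip}"
IMPORT_POD="${IMPORT_POD:-forgejo-restore-import}"  # hypothetical pod name
STAGING="${STAGING:-/staging}"                      # hypothetical staging dir
DB_HOST="${DB_HOST:-forgejo-db-restore-rw}"         # assumed CNPG read-write service
DB_NAME="${DB_NAME:-forgejo}"                       # assumed database name
DB_USER="${DB_USER:-forgejo}"                       # assumed database user
VALUES="${VALUES:-restore-values.yaml}"             # hypothetical Helm values file
DRY_RUN="${DRY_RUN:-1}"                             # 1 = print commands only

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Run a command inside the import pod.
in_pod() { run kubectl exec -n "$NS" "$IMPORT_POD" -- "$@"; }

restore_steps() {
  # 1. Stage and unzip the dump inside the import pod.
  run kubectl cp "$ARCHIVE" "$NS/$IMPORT_POD:$STAGING/backup.zip"
  in_pod unzip -q "$STAGING/backup.zip" -d "$STAGING"
  # 2. Repositories go under the Gitea-style repository root.
  in_pod cp -a "$STAGING/repos/." /data/git/gitea-repositories/
  # 3. Packages, attachments, and avatars go under /data/.
  in_pod cp -a "$STAGING/data/." /data/
  # 4. Import the SQL dump; stop at the first error instead of continuing.
  in_pod psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" \
    -v ON_ERROR_STOP=1 -f "$STAGING/forgejo-db.sql"
  # 5. Deploy the isolated release against the restored PVC and database.
  run helm upgrade --install forgejo-restore gitea-charts/gitea \
    --version 12.5.0 -n "$NS" -f "$VALUES"
}

restore_steps
```

Running it with `DRY_RUN=0` would execute the commands instead of printing them. Reviewing the dry-run output first is the safer order.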
## Post-restore checks (2026-07-04)
Port-forward: `svc/forgejo-restore-gitea-http` → `127.0.0.1:13000`

| Check | Result |
| --- | --- |
| `GET /` health | HTTP 200 |
| `GET /api/v1/repos/coulomb/glas-harness` | `full_name=coulomb/glas-harness`, `default_branch=main` |
| `GET /api/v1/repos/coulomb/key-cape` | `full_name=coulomb/key-cape`, `default_branch=main` |
| `GET /api/v1/orgs/coulomb/repos` | 3 repos: `forgejo-actions-probe`, `glas-harness`, `key-cape` |
Script exit marker: `restore-drill-complete`
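
The checks above could be repeated with a small `curl` loop. This sketch assumes the port-forward to `127.0.0.1:13000` is already running and that the endpoints are readable without authentication (private repositories would need an auth header). It verifies HTTP status codes only, not the `full_name` or `default_branch` fields recorded in the table. The function names and the `RUN_CHECKS` switch are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

BASE_URL="${BASE_URL:-http://127.0.0.1:13000}"

# Print the HTTP status code for a URL; curl prints 000 if the connection fails.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1" || true
}

# Compare an observed status with the expected one and report the result.
report() {
  local path="$1" expected="$2" actual="$3"
  if [ "$actual" = "$expected" ]; then
    echo "PASS $path ($actual)"
  else
    echo "FAIL $path (expected $expected, got $actual)"
    return 1
  fi
}

# Probe the health page and the three API endpoints from the evidence table.
run_checks() {
  local path failed=0
  for path in / \
      /api/v1/repos/coulomb/glas-harness \
      /api/v1/repos/coulomb/key-cape \
      /api/v1/orgs/coulomb/repos; do
    report "$path" 200 "$(http_status "$BASE_URL$path")" || failed=1
  done
  return "$failed"
}

# Only touch the network when explicitly requested.
if [ "${RUN_CHECKS:-0}" = "1" ]; then
  run_checks
fi
```

Invoking the script with `RUN_CHECKS=1` runs all four probes and exits non-zero if any of them returns something other than 200.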
## RPO / RTO (drill scope)
| Metric | Observed / assumed |
| --- | --- |
| RPO (manual dump) | Point-in-time of `forgejo dump` execution; no scheduled backup yet |
| RTO (isolated restore) | ~35 minutes for CNPG ready + import + Helm deploy on railiance01 |
| Production impact | None — read-only dump from running pod; separate namespace |
## Gaps (not closed by this drill)
- **Scheduled backups:** CNPG `Backup` CRs and off-cluster target not configured
(`kubectl cnpg` plugin absent on workstation).
- **Encryption at rest:** dump stored locally on workstation for drill only; no
approved backup target wired.
- **Automation:** `forgejo dump` is manual; T04/T09 still need cron/operator
schedule and retention policy (T02 decision).
- **Re-run hygiene:** concurrent or repeat runs require `DRILL_CLEAN=1` to wipe
`forgejo-restore-drill` before import (SQL import is not idempotent).
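
The re-run rule can be expressed as a small preflight guard. This is a sketch of one possible behavior, not a description of how `tools/forgejo-restore-drill.sh` implements `DRILL_CLEAN`: `plan_rerun` and `preflight` are hypothetical names, and aborting when the namespace exists without `DRILL_CLEAN=1` is an assumption.

```bash
#!/usr/bin/env bash
set -euo pipefail

NS="forgejo-restore-drill"

# Decide what to do before import.
#   $1: "yes" if the drill namespace already exists, otherwise "no"
#   $2: value of DRILL_CLEAN
# Prints one of: proceed, wipe, abort.
plan_rerun() {
  local exists="$1" clean="$2"
  if [ "$exists" = "no" ]; then
    echo "proceed"
  elif [ "$clean" = "1" ]; then
    echo "wipe"
  else
    echo "abort"
  fi
}

# Query the cluster and act on the plan. Not called automatically here.
preflight() {
  local exists="no"
  if kubectl get namespace "$NS" >/dev/null 2>&1; then
    exists="yes"
  fi
  case "$(plan_rerun "$exists" "${DRILL_CLEAN:-0}")" in
    wipe)
      # The SQL import is not idempotent, so start from an empty namespace.
      kubectl delete namespace "$NS" --wait=true
      ;;
    abort)
      echo "namespace $NS exists; set DRILL_CLEAN=1 to wipe it first" >&2
      return 1
      ;;
    proceed)
      : # nothing to clean up
      ;;
  esac
}
```

Keeping the decision in `plan_rerun` separate from the `kubectl` calls makes the rule easy to check without a cluster.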
## Cleanup
After evidence capture, delete the drill namespace:
```bash
kubectl delete namespace forgejo-restore-drill --wait=true
```
Production Forgejo (`forgejo` namespace) and Gitea remain unchanged.
## References
- `infra/forgejo-restore-drill/forgejo-db-restore-cluster.yaml`
- `infra/forgejo-restore-drill/restore-job.yaml`
- `tools/forgejo-restore-drill.sh`
- `workplans/RAIL-HO-WP-0005-forgejo-production-migration.md` (T09)