RAIL-HO-WP-0005-T09: Forgejo backup/restore drill assets and evidence
Add isolated-namespace restore drill (CNPG cluster, PVC, orchestration script) and document successful 2026-07-04 run: production forgejo dump restored with health 200 and pilot repos visible via API. Scheduled backups remain open.
parent 2d62317ada
commit 092315895f
5 changed files with 257 additions and 5 deletions
93
docs/forgejo-restore-drill-evidence.md
Normal file
@@ -0,0 +1,93 @@
# Forgejo Backup/Restore Drill Evidence

Date: 2026-07-04
Workplan: RAIL-HO-WP-0005
Task: RAIL-HO-WP-0005-T09
`no_secret_material_recorded: true`

## Purpose

Prove that a production `forgejo dump` can be restored into an isolated
namespace and serve repository metadata without touching production Forgejo or
Gitea.

## Backup source

| Field | Value |
| --- | --- |
| Method | `forgejo dump` from production pod |
| Production pod | `forgejo-gitea-64c5b57684-ph9vt` (namespace `forgejo`) |
| Archive path (workstation) | `/tmp/forgejo-drill/forgejo-drill-backup.zip` |
| Archive size | 12,284,847 bytes (~11.7 MiB) |
| Archive timestamp | 2026-07-04 11:20 +0200 |
| Archive contents (top-level) | `repos/`, `data/`, `forgejo-db.sql`, `app.ini` |

Repos present in dump: `forgejo-actions-probe`, `glas-harness`, `key-cape`
(all under `repos/coulomb/`).

## Restore target

| Field | Value |
| --- | --- |
| Namespace | `forgejo-restore-drill` |
| Database | CNPG cluster `forgejo-db-restore` (isolated, 1 instance) |
| App data PVC | `forgejo-restore-data` (`local-path`, 10Gi) |
| Helm release | `forgejo-restore` (`gitea-charts/gitea` 12.5.0) |
| Orchestration | `tools/forgejo-restore-drill.sh` |

Restore path (Forgejo 11.0.3 has no `forgejo restore` CLI):

1. Unzip dump into import pod staging area.
2. Copy `repos/` → `/data/git/gitea-repositories/`.
3. Copy `data/` → `/data/` (packages, attachments, avatars).
4. Import `forgejo-db.sql` via `psql` into `forgejo-db-restore`.
5. Deploy isolated Helm release bound to restored PVC + restore DB host.

## Post-restore checks (2026-07-04)

Port-forward: `svc/forgejo-restore-gitea-http` → `127.0.0.1:13000`

| Check | Result |
| --- | --- |
| `GET /` health | HTTP 200 |
| `GET /api/v1/repos/coulomb/glas-harness` | `full_name=coulomb/glas-harness`, `default_branch=main` |
| `GET /api/v1/repos/coulomb/key-cape` | `full_name=coulomb/key-cape`, `default_branch=main` |
| `GET /api/v1/orgs/coulomb/repos` | 3 repos: `forgejo-actions-probe`, `glas-harness`, `key-cape` |

Script exit marker: `restore-drill-complete`
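
The org-level check in the table (`GET /api/v1/orgs/coulomb/repos`) can be scripted next to the two per-repo checks. A minimal sketch, assuming repository names arrive one per line on stdin; `expect_repos` is a hypothetical helper and is not part of `tools/forgejo-restore-drill.sh`:

```bash
#!/usr/bin/env bash
# Hypothetical helper (assumption, not in the commit): compare the repo names
# read from stdin (one per line) with the expected names passed as arguments.
# Order does not matter because both sides are sorted before comparison.
expect_repos() {
  local expected actual
  expected="$(printf '%s\n' "$@" | sort)"
  actual="$(sort)"   # sorts the function's stdin
  if [[ "${actual}" == "${expected}" ]]; then
    echo "org-repos:ok"
  else
    echo "org-repos:mismatch" >&2
    return 1
  fi
}

# Example (illustrative, needs the drill's port-forward on 127.0.0.1:13000):
# curl -fsS http://127.0.0.1:13000/api/v1/orgs/coulomb/repos \
#   | python3 -c "import json,sys; print('\n'.join(r['name'] for r in json.load(sys.stdin)))" \
#   | expect_repos forgejo-actions-probe glas-harness key-cape
```

A set comparison fails on both missing and unexpected repositories, which a bare count of 3 would not catch.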

## RPO / RTO (drill scope)

| Metric | Observed / assumed |
| --- | --- |
| RPO (manual dump) | Point-in-time of `forgejo dump` execution; no scheduled backup yet |
| RTO (isolated restore) | ~3–5 minutes for CNPG ready + import + Helm deploy on railiance01 |
| Production impact | None — read-only dump from running pod; separate namespace |
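
The RTO figure is an estimate. To record it per run, each phase of the drill could be wrapped in a timer. A minimal sketch using bash's built-in `SECONDS` counter; `timed` is a hypothetical helper, not something the committed script defines:

```bash
#!/usr/bin/env bash
# Hypothetical helper (assumption, not in the commit): run a command, print
# how many whole seconds it took, and preserve the command's exit status.
timed() {
  local label="$1"; shift
  local start=${SECONDS} rc=0
  "$@" || rc=$?   # capture failure instead of letting `set -e` abort here
  echo "elapsed:${label}:$(( SECONDS - start ))s"
  return "${rc}"
}

# Example (illustrative): time the CNPG readiness wait from the drill.
# timed cnpg-ready kubectl wait --for=condition=Ready \
#   cluster/forgejo-db-restore -n forgejo-restore-drill --timeout=10m
```

Summing the per-phase values (CNPG ready, import, Helm deploy) would give an observed RTO for the evidence table rather than an assumed one.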

## Gaps (not closed by this drill)

- **Scheduled backups:** CNPG `Backup` CRs and off-cluster target not configured
  (`kubectl cnpg` plugin absent on workstation).
- **Encryption at rest:** dump stored locally on workstation for drill only; no
  approved backup target wired.
- **Automation:** `forgejo dump` is manual; T04/T09 still need cron/operator
  schedule and retention policy (T02 decision).
- **Re-run hygiene:** concurrent or repeat runs require `DRILL_CLEAN=1` to wipe
  `forgejo-restore-drill` before import (SQL import is not idempotent).

## Cleanup

After evidence capture, delete the drill namespace:

```bash
kubectl delete namespace forgejo-restore-drill --wait=true
```

Production Forgejo (`forgejo` namespace) and Gitea remain unchanged.

## References

- `infra/forgejo-restore-drill/forgejo-db-restore-cluster.yaml`
- `infra/forgejo-restore-drill/restore-job.yaml`
- `tools/forgejo-restore-drill.sh`
- `workplans/RAIL-HO-WP-0005-forgejo-production-migration.md` (T09)
21
infra/forgejo-restore-drill/forgejo-db-restore-cluster.yaml
Normal file
@@ -0,0 +1,21 @@
---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: forgejo-db-restore
  namespace: forgejo-restore-drill
  labels:
    app.kubernetes.io/name: forgejo-db-restore
    railiance.io/layer: s3-platform
    railiance.io/consumer: forgejo-restore-drill
spec:
  instances: 1
  imageName: ghcr.io/cloudnative-pg/postgresql:16
  storage:
    size: 10Gi
  bootstrap:
    initdb:
      database: forgejo
      owner: forgejo
      secret:
        name: forgejo-db-credentials
12
infra/forgejo-restore-drill/restore-job.yaml
Normal file
@@ -0,0 +1,12 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: forgejo-restore-data
  namespace: forgejo-restore-drill
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: local-path
115
tools/forgejo-restore-drill.sh
Executable file
@@ -0,0 +1,115 @@
#!/usr/bin/env bash
# Non-production Forgejo backup/restore drill (RAIL-HO-WP-0005-T09).
# Re-run: DRILL_CLEAN=1 ./tools/forgejo-restore-drill.sh (wipes namespace first)
set -euo pipefail

KUBECONFIG="${KUBECONFIG:-$HOME/.kube/config-hosteurope}"
export KUBECONFIG
NS=forgejo-restore-drill
DRILL_CLEAN="${DRILL_CLEAN:-0}"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ROOT_DIR="$(cd "${SCRIPT_DIR}/.." && pwd)"
BACKUP_LOCAL="${BACKUP_LOCAL:-/tmp/forgejo-drill/forgejo-drill-backup.zip}"
PROD_POD="${PROD_POD:-$(kubectl get pods -n forgejo -l app.kubernetes.io/instance=forgejo -o jsonpath='{.items[0].metadata.name}')}"

step() { echo "==> $*"; }

if [[ "${DRILL_CLEAN}" == "1" ]]; then
  step "Clean prior drill namespace ${NS}"
  kubectl delete namespace "${NS}" --wait=true --timeout=5m || true
fi

step "Create namespace ${NS}"
kubectl create namespace "${NS}" --dry-run=client -o yaml | kubectl apply -f -

step "Copy forgejo-db-credentials into ${NS}"
kubectl get secret forgejo-db-credentials -n databases -o json \
  | python3 -c "import json,sys; s=json.load(sys.stdin); s['metadata']={k:v for k,v in s['metadata'].items() if k in ('name','labels','annotations')}; s['metadata']['namespace']='${NS}'; print(json.dumps(s))" \
  | kubectl apply -f -

step "Deploy restore CNPG cluster"
kubectl apply -f "${ROOT_DIR}/infra/forgejo-restore-drill/forgejo-db-restore-cluster.yaml"
kubectl wait --for=condition=Ready cluster/forgejo-db-restore -n "${NS}" --timeout=10m

step "Ensure local backup exists"
if [[ ! -f "${BACKUP_LOCAL}" ]]; then
  kubectl exec -n forgejo "${PROD_POD}" -c gitea -- forgejo dump -f /tmp/forgejo-drill-backup.zip
  mkdir -p "$(dirname "${BACKUP_LOCAL}")"
  kubectl cp "forgejo/${PROD_POD}:/tmp/forgejo-drill-backup.zip" "${BACKUP_LOCAL}" -c gitea
fi
ls -lh "${BACKUP_LOCAL}"

step "Apply restore PVC"
kubectl apply -f "${ROOT_DIR}/infra/forgejo-restore-drill/restore-job.yaml"

step "Run restore pod (stage backup, import files + SQL)"
kubectl delete pod forgejo-restore-import -n "${NS}" --ignore-not-found --wait=true
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: forgejo-restore-import
  namespace: ${NS}
spec:
  restartPolicy: Never
  containers:
    - name: restore
      image: code.forgejo.org/forgejo/forgejo:11.0.3
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
        - name: backup
          mountPath: /backup
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: forgejo-restore-data
    - name: backup
      emptyDir: {}
EOF
kubectl wait --for=condition=Ready pod/forgejo-restore-import -n "${NS}" --timeout=3m
kubectl cp "${BACKUP_LOCAL}" "${NS}/forgejo-restore-import:/backup/forgejo-drill-backup.zip" -c restore
DB_PASS="$(kubectl get secret forgejo-db-credentials -n "${NS}" -o jsonpath='{.data.password}' | base64 -d)"
kubectl exec -n "${NS}" forgejo-restore-import -c restore -- env POSTGRES_PASSWORD="${DB_PASS}" sh -c '
  set -eu
  apk add --no-cache unzip postgresql-client >/dev/null
  rm -rf /data/*
  mkdir -p /data/git/gitea-repositories
  unzip -q /backup/forgejo-drill-backup.zip -d /tmp/dump
  cp -a /tmp/dump/repos/. /data/git/gitea-repositories/
  cp -a /tmp/dump/data/. /data/
  chown -R git:git /data
  PGPASSWORD="${POSTGRES_PASSWORD}" psql -h forgejo-db-restore-rw.forgejo-restore-drill.svc.cluster.local -U forgejo -d forgejo -v ON_ERROR_STOP=1 -f /tmp/dump/forgejo-db.sql
  echo restore-import-ok
'
unset DB_PASS
kubectl delete pod forgejo-restore-import -n "${NS}" --wait=true

step "Deploy isolated Forgejo release"
cd "${HOME}/railiance-apps"
DB_PASS="$(kubectl get secret forgejo-db-credentials -n "${NS}" -o jsonpath='{.data.password}' | base64 -d)"
helm upgrade --install forgejo-restore gitea-charts/gitea --version 12.5.0 \
  --namespace "${NS}" --create-namespace \
  -f helm/forgejo-values.yaml \
  -f helm/forgejo-registry-values.yaml \
  --set strategy.type=Recreate \
  --set persistence.existingClaim=forgejo-restore-data \
  --set gitea.config.database.HOST=forgejo-db-restore-rw.${NS}.svc.cluster.local:5432 \
  --set gitea.config.database.PASSWD="${DB_PASS}" \
  --set gitea.config.server.DOMAIN=forgejo-restore.local \
  --set gitea.config.server.ROOT_URL=http://forgejo-restore.local:3000/ \
  --set gitea.admin.password=restore-drill-local-only \
  --set ingress.enabled=false \
  --wait --timeout=10m
unset DB_PASS

step "Post-restore checks via port-forward"
kubectl port-forward -n "${NS}" svc/forgejo-restore-gitea-http 13000:3000 >/tmp/forgejo-restore-pf.log 2>&1 &
PF_PID=$!
sleep 5
curl -fsS -o /dev/null -w 'health:%{http_code}\n' http://127.0.0.1:13000/
curl -fsS http://127.0.0.1:13000/api/v1/repos/coulomb/glas-harness | python3 -c "import json,sys; d=json.load(sys.stdin); print('repo', d.get('full_name'), d.get('default_branch'))"
curl -fsS http://127.0.0.1:13000/api/v1/repos/coulomb/key-cape | python3 -c "import json,sys; d=json.load(sys.stdin); print('repo', d.get('full_name'), d.get('default_branch'))"
kill "${PF_PID}" 2>/dev/null || true
echo "restore-drill-complete"
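
The fixed `sleep 5` after starting the port-forward can race on a slow node: with `set -e` and `curl -f`, a forward that is not yet listening aborts the drill before any check runs. One option is to poll instead. A minimal sketch; `wait_until` is a hypothetical helper and does not exist in the committed script:

```bash
#!/usr/bin/env bash
# Hypothetical helper (assumption, not in the commit): retry a command about
# once per second until it succeeds or the timeout (whole seconds) elapses.
wait_until() {
  local timeout="$1"; shift
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1   # timed out without a successful attempt
    fi
    sleep 1
  done
}

# Example (illustrative): replace `sleep 5` with a readiness poll.
# wait_until 30 curl -fsS -o /dev/null http://127.0.0.1:13000/
```

Because the probe command is passed as arguments, the same helper could gate other steps of the drill without changes.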
workplans/RAIL-HO-WP-0005-forgejo-production-migration.md
@@ -109,7 +109,8 @@ acceptance criteria below are tracked across T05, T07, T08, and T10 instead.
 Still to prove before T11:
 
 - SMTP/password reset end-to-end (T06).
-- Backup and restore in isolated namespace (T09).
+- Backup and restore in isolated namespace (T09) — **drill passed 2026-07-04**;
+  scheduled automation pending.
 - Issues/releases/wiki/LFS per inventory classification (T10 matrix).
 - Operator SSH identity on Forgejo beyond interim `forgejo_admin` keys (T02/T10).
 
@@ -377,7 +378,7 @@ a pullable image without privileged cluster-wide credentials. **Tier 2: done.**
 
 ```task
 id: RAIL-HO-WP-0005-T09
-status: todo
+status: progress
 priority: high
 state_hub_task_id: "25892007-36ca-4bd9-8adf-84d505465d7d"
 ```
@@ -402,8 +403,17 @@ Acceptance:
 - Restore into an isolated namespace is drilled and documented.
 - RPO/RTO expectations are recorded.
 
+**Partial (2026-07-04):** isolated restore drill **passed**. Production
+`forgejo dump` (~11.7 MiB) restored into `forgejo-restore-drill` namespace;
+post-restore API checks: health 200, `coulomb/glas-harness` and
+`coulomb/key-cape` on `main`, 3 org repos visible. Evidence:
+`docs/forgejo-restore-drill-evidence.md`. Assets: `infra/forgejo-restore-drill/`,
+`tools/forgejo-restore-drill.sh`. Remaining: scheduled CNPG/off-cluster backups,
+encryption/approved target (T02/T04), automated dump schedule.
+
 **Done when:** a fresh backup restores to a working isolated Forgejo instance
-with repository, package, and user recovery checks passing.
+with repository, package, and user recovery checks passing **and** scheduled
+backups run without manual intervention.
 
 ---
 
@@ -520,8 +530,9 @@ T05+T08 ──► T10 migration ladder ──► T11 production cutover ──
 T03 isolated probe: CANCELLED (superseded by T05 + in-production pilots)
 ```
 
-**Current focus (2026-07-04):** T10 tiers 0–2 **complete**; T09 backup drill
-and T02 open decisions (SMTP, backup target) before tier-3 production repos.
+**Current focus (2026-07-04):** T10 tiers 0–2 **complete**; T09 restore drill
+**passed** (scheduled backups + backup target still open); T02 decisions (SMTP,
+backup target) before tier-3 production repos.
 Do not start T11 `state-hub` until T09 complete and `CUST-WP-0054` Wave-1
 gates satisfied.
 