railiance-cluster/workplans/RAIL-BS-WP-0004-safety-net.md
tegwick a15ceee92b chore(workplan): add state_hub_task_ids to WP-0004
Written by fix-consistency: T01-T06 registered in state hub.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 15:24:28 +01:00


---
id: RAIL-BS-WP-0004
type: workplan
title: "Current-Environment Safety Net"
domain: railiance
repo: railiance-cluster
status: active
owner: tegwick
topic_slug: railiance
state_hub_workstream_id: "7e8b0c20-51eb-40c9-9e3b-85dd380d7625"
created: "2026-02-25"
updated: "2026-03-10"
---
# Current-Environment Safety Net
## Goal
Ensure backup and disaster recovery for the current single-server environment
is operational and tested before any ThreePhoenix infrastructure migration
work begins. Aligned to OAS Stack S2 (railiance-cluster owns backup tooling).
## Context
The backup toolchain lives in `tools/cmd/railiance-backup` and
`tools/cmd/railiance-preflight`, dispatched via `bin/railiance`. It protects:
| Asset | Method | Risk without backup |
|---|---|---|
| Custodian State Hub DB | pg_dump → age → Nextcloud | Total loss of workstreams, decisions, history |
| Claude config + memory | tar → age → Nextcloud | Loss of MCP registration, project memory |
| Git repos | Gitea remotes | SPOF: Gitea runs on the same server being migrated |
Decision D2: Nextcloud upload-only file drop as backup destination.
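The `pg_dump → age → Nextcloud` method in the table can be sketched roughly as below. This is an illustration, not the actual `railiance-backup` code: the function names, the database name passed in, the age recipient, and the assumption that the file drop accepts an HTTP PUT at a URL ending in `/` are all hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a "dump → age → upload" flow; not the real railiance-backup code.
set -euo pipefail

# Build a timestamped artifact name, e.g. statehub-20260310T020000Z.dump.age
artifact_name() {
  local prefix="$1" suffix="$2"
  printf '%s-%s.%s.age\n' "$prefix" "$(date -u +%Y%m%dT%H%M%SZ)" "$suffix"
}

# Dump a database, encrypt it to an age recipient, and upload the result.
# Assumed inputs: $1 = database name, $2 = age public key,
#                 $3 = upload URL ending in "/" (hypothetical PUT endpoint).
backup_db() {
  local db="$1" recipient="$2" upload_url="$3"
  local tmpdir out
  tmpdir="$(mktemp -d)"
  out="$tmpdir/$(artifact_name "$db" dump)"
  # Stream the dump straight into age so no plaintext dump is written to disk.
  pg_dump --format=custom "$db" | age -r "$recipient" -o "$out"
  # curl appends the local file name when the target URL ends with a slash.
  curl --fail --silent --show-error -T "$out" "$upload_url"
  rm -rf "$tmpdir"
}
```

The Claude config row would follow the same shape with `tar -c` in place of `pg_dump`.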
## OAS Alignment
Per ADR-003, backup tooling lives in **S2 (railiance-cluster)**. The preflight
check covers all five OAS stack repos:
| Repo | OAS Layer |
|---|---|
| railiance-infra | S1 — OS & Provisioning |
| railiance-cluster | S2 — Kubernetes Runtime |
| railiance-platform | S3 — Platform Services |
| railiance-enablement | S4 — Developer Tooling |
| railiance-apps | S5 — Workloads & Endpoints |
Plus cross-domain repos: the-custodian, markitect_project, activity-core,
net-kingdom, issue-facade, binect-js, kaizen-agentic.
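As an illustration of what a per-repo preflight check might look like, the sketch below reports whether each repository exists and has a clean working tree. What `railiance-preflight` actually verifies is not stated here, so the "exists and clean" criterion, the function names, and the base directory are assumptions.

```bash
# Sketch of a per-repo preflight check; the real railiance-preflight may differ.

# Print "missing", "dirty", or "ok" for one repository directory.
repo_state() {
  local dir="$1"
  if ! git -C "$dir" rev-parse --git-dir >/dev/null 2>&1; then
    echo "missing"
  elif [ -n "$(git -C "$dir" status --porcelain)" ]; then
    echo "dirty"
  else
    echo "ok"
  fi
}

# Check every named repo under a base directory.
# Returns non-zero if any repo is not "ok".
preflight() {
  local base="$1"; shift
  local rc=0 repo state
  for repo in "$@"; do
    state="$(repo_state "$base/$repo")"
    printf '%-22s %s\n' "$repo" "$state"
    [ "$state" = "ok" ] || rc=1
  done
  return "$rc"
}

# The five OAS stack repos from the table above.
REPOS=(railiance-infra railiance-cluster railiance-platform
       railiance-enablement railiance-apps)

# Usage (the base directory is an assumption):
#   preflight "$HOME" "${REPOS[@]}"
```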
## Boundary
Backup execution: this repo (`bin/railiance backup`).
Backup destination: Nextcloud file drop (URL read from `~/.config/railiance/nc-upload-url`, otherwise the hardcoded default).
Restore procedure: `docs/backup-restore.md`.
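The destination lookup (config file first, hardcoded value as fallback) could be expressed along these lines. The function name and the placeholder default URL are hypothetical; only the config file path comes from this workplan.

```bash
# Sketch: resolve the upload URL from the config file, else use a default.
# DEFAULT_NC_UPLOAD_URL is a placeholder, not the real file-drop address.
DEFAULT_NC_UPLOAD_URL="https://nextcloud.example.invalid/s/PLACEHOLDER"

resolve_upload_url() {
  local cfg="${1:-$HOME/.config/railiance/nc-upload-url}"
  if [ -s "$cfg" ]; then
    # Use the first line only and strip a stray carriage return.
    head -n 1 "$cfg" | tr -d '\r'
  else
    echo "$DEFAULT_NC_UPLOAD_URL"
  fi
}
```

Keeping the URL in a file outside the repo avoids committing the share link, which acts as a write credential for the file drop.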
---
## Tasks
### T01 — Update preflight repo list to OAS 5-repo layout
```task
id: T01
status: done
priority: high
state_hub_task_id: "4526a842-ea31-4874-9231-92ab556cfe7b"
```
Update `tools/cmd/railiance-preflight` REPOS array: remove `railiance-bootstrap`,
add `railiance-infra`, `railiance-cluster`, `railiance-platform`,
`railiance-enablement`, `railiance-apps`. Add all active project repos.
**Done when:** `bin/railiance preflight` checks all current repos.
---
### T02 — Fix stale repo references in backup-restore.md
```task
id: T02
status: done
priority: medium
state_hub_task_id: "a6313e06-1976-46a7-8e31-df4eb2eca880"
```
Update restore procedure: `railiance-bootstrap` → `railiance-cluster`,
`railiance-hosts` → `railiance-infra`, add the three new OAS repos.
**Done when:** doc accurately reflects the current 5-repo OAS stack.
---
### T03 — Add make backup and make preflight targets
```task
id: T03
status: done
priority: medium
state_hub_task_id: "05d42a55-921f-4aa7-bb76-e8af9c7e0ac3"
```
Add both targets to the root Makefile so the safety net is discoverable from `make help`.
**Done when:** `make backup` and `make preflight` both work.
---
### T04 — Run current backup and verify upload
```task
id: T04
status: done
priority: high
state_hub_task_id: "08233868-d522-4117-bc4e-6c0f52545665"
```
Run `bin/railiance backup` and confirm both DB and config files appear
in the Nextcloud file drop.
**Done when:** backup completes without error and `.last-backup` stamp is fresh.
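One way to make "stamp is fresh" checkable is to compare the stamp file's modification time against a maximum age. The sketch below assumes the stamp is an ordinary file whose mtime is updated by each successful backup; the function name and the example path are hypothetical.

```bash
# Sketch: succeed if the stamp file exists and was modified
# within the last N minutes (default 1440, i.e. 24 hours).
stamp_is_fresh() {
  local stamp="$1" max_age_min="${2:-1440}"
  [ -f "$stamp" ] && [ -n "$(find "$stamp" -mmin "-$max_age_min" -print)" ]
}

# Usage (the stamp location is an assumption):
#   stamp_is_fresh "$HOME/.cache/railiance/.last-backup" && echo "backup is recent"
```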
---
### T05 — Verify or install cron job
```task
id: T05
status: todo
priority: medium
state_hub_task_id: "2d5acff7-4a4e-4ddd-ad06-08237ad3dac8"
```
Confirm that the daily 02:00 cron job is installed and has run at least once:
```bash
crontab -l | grep railiance
tail -n 20 ~/.cache/railiance/backup.log
```
If missing, install:
```bash
(crontab -l 2>/dev/null; echo "0 2 * * * /home/worsch/railiance-cluster/bin/railiance backup >> ~/.cache/railiance/backup.log 2>&1") | crontab -
```
**Done when:** cron is listed and log shows a successful run.
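The install command above appends the entry unconditionally, so running it twice would schedule the backup twice. A small filter like the following sketch keeps the install idempotent; the function name `ensure_cron_entry` is hypothetical.

```bash
# Sketch: read a crontab on stdin and print it with the given entry
# appended only if an identical line is not already present.
ensure_cron_entry() {
  local entry="$1" current
  current="$(cat)"
  if printf '%s\n' "$current" | grep -Fxq -- "$entry"; then
    printf '%s\n' "$current"
  elif [ -n "$current" ]; then
    printf '%s\n%s\n' "$current" "$entry"
  else
    printf '%s\n' "$entry"
  fi
}

# Usage:
#   crontab -l 2>/dev/null | ensure_cron_entry "$ENTRY" | crontab -
```

Here `$ENTRY` would hold the full `0 2 * * * ...` line shown in the install command.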
---
### T06 — Run restore drill
```task
id: T06
status: todo
priority: medium
state_hub_task_id: "f8e4a094-c367-40eb-b895-da17bc144b07"
```
Run the minimal restore drill from `docs/backup-restore.md` against the
current backup. Record completion in `~/.cache/railiance/restore-drill.log`.
**Done when:** drill exits 0 and log entry is written.
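To tie the two "done when" conditions together, a thin wrapper can run the drill and append its result to the log in one step. The sketch below does not reproduce the drill itself, which is defined in `docs/backup-restore.md`; the wrapper name, the log line format, and the `./restore-drill.sh` command in the usage note are hypothetical.

```bash
# Sketch: run a drill command, append a timestamped result line to the log,
# and pass the drill's exit status through.
record_drill() {
  local log="$1"; shift
  local rc=0
  "$@" || rc=$?
  mkdir -p "$(dirname "$log")"
  printf '%s restore-drill exit=%d\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$rc" >> "$log"
  return "$rc"
}

# Usage (the drill command is a placeholder):
#   record_drill "$HOME/.cache/railiance/restore-drill.log" ./restore-drill.sh
```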
---
## References
- Decision D2: Nextcloud as backup destination (`DECISIONS.md`)
- Backup tooling: `tools/cmd/railiance-backup`, `tools/cmd/railiance-preflight`
- Restore procedure: `docs/backup-restore.md`
- Extension points: EP-RAIL-003 (git bare mirrors), EP-RAIL-004 (secondary offsite copy)