From c3a95e93b48f3ac3e64e7b66c16586d10471e2c2 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Thu, 2 Jul 2026 00:09:48 +0200
Subject: [PATCH 01/10] chore(consistency): sync task status from DB [auto]

Updated by fix-consistency on 2026-07-02:
- update .custodian-brief.md for railiance-cluster
---
 .custodian-brief.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/.custodian-brief.md b/.custodian-brief.md
index 9639a9f..1de17f3 100644
--- a/.custodian-brief.md
+++ b/.custodian-brief.md
@@ -2,7 +2,7 @@
 # Custodian Brief — railiance-cluster
 
 **Domain:** financials
-**Last synced:** 2026-07-01 22:04 UTC
+**Last synced:** 2026-07-01 22:09 UTC
 **State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
 
 ## Current Goal
@@ -11,12 +11,6 @@ Install k3s and Kubernetes Baseline on the HostEurope Server
 
 ## Active Workstreams
 
-### activity-core no-restart admin-sync smoke (ACTIVITY-WP-0012-T05)
-Progress: 0/1 done | workstream_id: `2c9e8e96-ec6a-433c-9e6d-0efbcd18679e`
-
-**Open tasks:**
-- ! Run the no-restart admin-sync smoke `60f3387d`
-
 ### activity-core WP-0016 triage-output robustness deploy
 Progress: 0/4 done | workstream_id: `7cbbe0d6-fea9-41c6-840c-46d0d8e8edde`
 
@@ -44,6 +38,12 @@ Progress: 0/1 done | workstream_id: `366eec46-3139-4810-ace6-ea75750fe821`
 **Open tasks:**
 - · Run no-restart admin-sync smoke with Temporal schedule verification (RAIL-BS-WP-0009-T01) `ffe665ce`
 
+### activity-core no-restart admin-sync smoke (ACTIVITY-WP-0012-T05)
+Progress: 0/1 done | workstream_id: `2c9e8e96-ec6a-433c-9e6d-0efbcd18679e`
+
+**Open tasks:**
+- ! Run the no-restart admin-sync smoke `60f3387d`
+
 ### ThreePhoenix - HA Cluster Implementation
 Progress: 0/7 done | workstream_id: `9e208376-23f1-40c7-9813-fac1f7d6ad3b`
 
From adb758b6d63012a62f5d4fda2136393e67960f0d Mon Sep 17 00:00:00 2001
From: tegwick
Date: Thu, 2 Jul 2026 00:25:42 +0200
Subject: [PATCH 02/10] chore(consistency): commit workplan task-id writeback

Co-Authored-By: Claude Fable 5
---
 ...L-BS-WP-0008-activity-core-wp0016-triage-output-deploy.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/workplans/RAIL-BS-WP-0008-activity-core-wp0016-triage-output-deploy.md b/workplans/RAIL-BS-WP-0008-activity-core-wp0016-triage-output-deploy.md
index 150455b..299a58d 100644
--- a/workplans/RAIL-BS-WP-0008-activity-core-wp0016-triage-output-deploy.md
+++ b/workplans/RAIL-BS-WP-0008-activity-core-wp0016-triage-output-deploy.md
@@ -9,6 +9,7 @@ owner: railiance-cluster
 topic_slug: railiance
 created: "2026-07-01"
 updated: "2026-07-01"
+state_hub_workstream_id: "7cbbe0d6-fea9-41c6-840c-46d0d8e8edde"
 ---
 
 # activity-core WP-0016 triage-output robustness deploy
@@ -33,6 +34,7 @@ whole-doc validator. It MUST ship together with the new `executor.py`
 id: RAIL-BS-WP-0008-T01
 status: todo
 priority: high
+state_hub_task_id: "079e39a9-f938-4d03-a5bc-4d3d2f7b1d83"
 ```
 
 Rebuild/import the activity-core image from main (`bf877b7` or later) into
@@ -45,6 +47,7 @@ new executor and the strict per-item schema ship together.
 id: RAIL-BS-WP-0008-T02
 status: todo
 priority: high
+state_hub_task_id: "129fb472-41e8-4e5c-bcbb-0995a96e223b"
 ```
 
 In the runtime projection (not the activity-core repo), update the
@@ -64,6 +67,7 @@ In the runtime projection (not the activity-core repo), update the
 id: RAIL-BS-WP-0008-T03
 status: todo
 priority: medium
+state_hub_task_id: "59559f1d-821f-4660-8a7d-c623c6631864"
 ```
 
 From the llm-connect pod logs / response store on railiance01, capture the
@@ -78,6 +82,7 @@ secrets.
 id: RAIL-BS-WP-0008-T04
 status: todo
 priority: high
+state_hub_task_id: "8096621a-54ee-4be5-943e-5dc2da19ed28"
 ```
 
 Trigger one daily-triage run against the reconciled runtime and confirm it
From 5ac713641d4ff631ef0a9202f499bb230d66210f Mon Sep 17 00:00:00 2001
From: tegwick
Date: Thu, 2 Jul 2026 00:27:26 +0200
Subject: [PATCH 03/10] chore(consistency): sync task status from DB [auto]

Updated by fix-consistency on 2026-07-02:
- update .custodian-brief.md for railiance-cluster
---
 .custodian-brief.md | 20 +-------------------
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/.custodian-brief.md b/.custodian-brief.md
index 1de17f3..01e3722 100644
--- a/.custodian-brief.md
+++ b/.custodian-brief.md
@@ -2,7 +2,7 @@
 # Custodian Brief — railiance-cluster
 
 **Domain:** financials
-**Last synced:** 2026-07-01 22:09 UTC
+**Last synced:** 2026-07-01 22:27 UTC
 **State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
 
 ## Current Goal
@@ -20,24 +20,6 @@ Progress: 0/4 done | workstream_id: `7cbbe0d6-fea9-41c6-840c-46d0d8e8edde`
 - · Pull raw llm-connect response for the 2026-06-26 run `59559f1d`
 - · Acceptance smoke `8096621a`
 
-### activity-core WP-0016 triage-output robustness deploy
-Progress: no tasks done | workstream_id: `5032c55c-2ee2-4b7e-b1eb-157f0f8ac647`
-
-### activity-core WP-0016 triage-output robustness deploy
-Progress: 0/4 done | workstream_id: `f2ca1a5d-4dd6-42ea-8003-969c7265f891`
-
-**Open tasks:**
-- · Update daily-statehub-wsjf-triage runtime-bundle Instruction (RAIL-BS-WP-0008-T02) `2338d061`
-- · Deploy activity-core with coupled schema and executor (RAIL-BS-WP-0008-T01) `1ea0945a`
-- · Pull raw llm-connect response for 2026-06-26 run (RAIL-BS-WP-0008-T03) `b799917b`
-- · Acceptance smoke: daily-triage clean or graceful degrade (RAIL-BS-WP-0008-T04) `e267a366`
-
-### activity-core no-restart admin-sync smoke (ACTIVITY-WP-0012-T05)
-Progress: 0/1 done | workstream_id: `366eec46-3139-4810-ace6-ea75750fe821`
-
-**Open tasks:**
-- · Run no-restart admin-sync smoke with Temporal schedule verification (RAIL-BS-WP-0009-T01) `ffe665ce`
-
 ### activity-core no-restart admin-sync smoke (ACTIVITY-WP-0012-T05)
 Progress: 0/1 done | workstream_id: `2c9e8e96-ec6a-433c-9e6d-0efbcd18679e`
 
From 84c005254d499b9291c41cc06abaf7544cf3f2d1 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Thu, 2 Jul 2026 01:47:45 +0200
Subject: [PATCH 04/10] Regenerate agent instructions: workstream -> workplan terminology

Registration guidance now prescribes file-first + fix-consistency (C-06) instead of manual create_workplan/create_workstream calls; progress-event examples use workplan_id; legacy field names annotated.

Co-Authored-By: Claude Fable 5
---
 .claude/rules/credential-routing.md  |  2 +-
 .claude/rules/first-session.md       | 26 +++++++++++++----------
 .claude/rules/session-protocol.md    | 31 ++++++++++++++++++----------
 .claude/rules/workplan-convention.md | 13 ++++++++----
 AGENTS.md                            | 26 ++++++++++++++---------
 5 files changed, 61 insertions(+), 37 deletions(-)

diff --git a/.claude/rules/credential-routing.md b/.claude/rules/credential-routing.md
index e651c0d..64b8403 100644
--- a/.claude/rules/credential-routing.md
+++ b/.claude/rules/credential-routing.md
@@ -20,7 +20,7 @@ Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run wa
 | Agent runtime | How to orient |
 | --- | --- |
 | **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=railiance-cluster` is for coordination, not secret vending |
-| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
+| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workplans; **still** use `warden route` for credential ownership |
 | **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
 
 ### Quick routing table
diff --git
a/.claude/rules/first-session.md b/.claude/rules/first-session.md index d5322f8..a7ac6ec 100644 --- a/.claude/rules/first-session.md +++ b/.claude/rules/first-session.md @@ -1,6 +1,6 @@ ## First Session Protocol -Triggered when `get_domain_summary("financials")` shows **no workstreams**. +Triggered when `get_domain_summary("financials")` shows **no workplans**. The project is registered but work has not yet been structured. **Step 1 — Read, don't write** @@ -11,27 +11,31 @@ The project is registered but work has not yet been structured. **Step 2 — Survey in-progress work** Look for TODOs, open branches, half-finished files. Note done vs. started but incomplete. -**Step 3 — Propose workstreams to Bernd** -Propose 1–3 workstreams — each a coherent strand, weeks to months, anchored to a +**Step 3 — Propose workplans to Bernd** +Propose 1–3 workplans — each a coherent strand, weeks to months, anchored to a roadmap phase. **Wait for approval before creating.** -**Step 4 — Create workplan file first, then DB record (ADR-001)** +**Step 4 — Write the workplan file; fix-consistency registers it (ADR-001)** ``` -workplans/RAIL-BS-WP-NNNN-.md ← write this first +workplans/RAIL-BS-WP-NNNN-.md ← write this, commit it ``` -Then register in the hub: -``` -create_workstream(topic_id="ca369340-a64e-442e-98f1-a4fa7dc74a38", title="...", owner="...", description="...") -create_task(workstream_id="", title="...", priority="high|medium|low") +Then register by running the consistency check — do **not** call +`create_workplan`/`create_task` (or legacy `create_workstream`) yourself; +manual registration duplicates what C-06 creates from the file: +```bash +statehub fix-consistency --repo railiance-cluster ``` +C-06 creates the hub workplan + tasks and writes `state_hub_workstream_id` / +`state_hub_task_id` back into the file (legacy field names, kept for +compatibility — they hold workplan/task IDs). 
**Step 5 — Record the setup** ``` add_progress_event( - summary="First session: structured financials into N workstreams, M tasks", + summary="First session: structured financials into N workplans, M tasks", event_type="milestone", topic_id="ca369340-a64e-442e-98f1-a4fa7dc74a38", - detail={"workstreams": [...], "tasks_created": M} + detail={"workplans": [...], "tasks_created": M} ) ``` diff --git a/.claude/rules/session-protocol.md b/.claude/rules/session-protocol.md index a3ad8aa..729db36 100644 --- a/.claude/rules/session-protocol.md +++ b/.claude/rules/session-protocol.md @@ -44,7 +44,7 @@ For each file with `status: ready`, `active`, or `blocked`, note pending **Step 4 — Present brief** -1. **Active workstreams** for `financials` — title, task counts, blocking decisions +1. **Active workplans** for `financials` — title, task counts, blocking decisions 2. **Pending tasks** from `workplans/` + any `[repo:railiance-cluster]` hub tasks 3. **Goal guidance** — if `goal_guidance` in summary: - `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"* @@ -52,33 +52,42 @@ For each file with `status: ready`, `active`, or `blocked`, note pending 4. **Suggested next action** — highest-priority open item 5. **SBOM status** — flag if `last_sbom_at` is unset for this repo -If no workstreams: follow First Session Protocol (`first-session.md`). +If no workplans: follow First Session Protocol (`first-session.md`). **During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()` -> State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`) -> are First Session Protocol only. Work structure belongs in repo files (ADR-001). +> State Hub is a *read model*. 
**Never register workplans or tasks by hand** +> (`create_workplan`, `create_task`, or the legacy `create_workstream`) — write +> the workplan file in `workplans/` and run `fix-consistency`; its C-06 check +> registers the workplan and its tasks in the hub and writes the IDs back into +> the file. Manual registration creates duplicates the moment fix-consistency +> runs. Work structure belongs in repo files (ADR-001). +> +> Terminology: "workstream" is the legacy name for workplan. Some API/frontmatter +> field names keep it for compatibility (`state_hub_workstream_id`, +> `workstream_id` params) — treat them as workplan IDs. **Session close:** With MCP tools: ``` -add_progress_event(summary="...", topic_id="ca369340-a64e-442e-98f1-a4fa7dc74a38", workstream_id="") +add_progress_event(summary="...", topic_id="ca369340-a64e-442e-98f1-a4fa7dc74a38", workplan_id="") ``` Without MCP tools: ```bash curl -s -X POST http://127.0.0.1:8000/progress/ \ -H "Content-Type: application/json" \ - -d '{"topic_id":"ca369340-a64e-442e-98f1-a4fa7dc74a38","workstream_id":"","event_type":"note","summary":"what changed","author":"codex"}' + -d '{"topic_id":"ca369340-a64e-442e-98f1-a4fa7dc74a38","workplan_id":"","event_type":"note","summary":"what changed","author":"codex"}' ``` -If workplan files were modified, ensure the local copy is up to date first: +If workplan files were modified, ensure the local copy is up to date first, +then sync from the repo checkout: ```bash -git -C pull --ff-only -cd ~/state-hub && make fix-consistency REPO=railiance-cluster +git pull --ff-only +statehub fix-consistency ``` For repos where implementation runs on a remote machine (e.g. 
CoulombCore), -use the combined target which pulls before fixing: +use the pull-before-fix mode from any shell with the State Hub CLI: ```bash -cd ~/state-hub && make fix-consistency-remote REPO=railiance-cluster +statehub fix-consistency --repo railiance-cluster --remote ``` **C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback will sync the file to match DB. **C-16** (repo behind remote) blocks all writes diff --git a/.claude/rules/workplan-convention.md b/.claude/rules/workplan-convention.md index cdacafe..f01c779 100644 --- a/.claude/rules/workplan-convention.md +++ b/.claude/rules/workplan-convention.md @@ -5,7 +5,7 @@ ID prefix: `RAIL-BS-WP-` Work items originate as files in this repo **before** being registered in the hub. -Canonical workplan/workstream frontmatter statuses are: +Canonical workplan frontmatter statuses are: `proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`. Use `proposed` for a newly drafted plan, `ready` after review against current repo state, and `finished` when implementation is complete. `stalled` and @@ -16,14 +16,15 @@ prefix: `YYMMDD-RAIL-BS-WP-NNNN-.md`. The frontmatter id remains unchanged; the prefix is only for quick visual reference. Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**: -`workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids +`workplans/ADHOC-YYYY-MM-DD.md`, workplan slug `adhoc-YYYY-MM-DD`, and task ids `ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed directly. Promote anything requiring analysis, design, approval, dependencies, or multiple planned phases into a normal workplan. Ecosystem todos from other agents arrive as `[repo:railiance-cluster]` hub tasks — -visible at session start. Pick one up by creating the workplan file, then registering -the workstream. +visible at session start. 
Pick one up by creating the workplan file, committing, +and running `statehub fix-consistency` — C-06 registers the workplan in the hub. +Never register by hand with `create_workplan`/`create_workstream`. Task blocks use this shape: @@ -37,4 +38,8 @@ state_hub_task_id: "" # written by fix-consistency — do not edit Status progression is `todo` → `progress` → `done`; use `wait` for waiting or blocked work and `cancel` for stopped work. +Workplan frontmatter carries `state_hub_workstream_id` — a legacy field name +kept for compatibility ("workstream" is the old term for workplan); it holds +the hub workplan id and is written by fix-consistency. Do not edit or rename it. + diff --git a/AGENTS.md b/AGENTS.md index 3c579ac..07b9576 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -20,6 +20,12 @@ there is no MCP server for Codex agents. |---------|-----| | Local workstation | `http://127.0.0.1:8000` | | Remote via tunnel | `http://127.0.0.1:18000` | +| Optional local edge relay | http://127.0.0.1:18080 | + +When an operator has enabled the edge relay, set API_BASE to the relay URL. +Queueable writes return an explicit queued receipt if the central hub is +unreachable. Treat that as pending local evidence, then ask the operator to run +statehub outbox status/replay after connectivity returns. ### Orient at session start @@ -27,8 +33,8 @@ there is no MCP server for Codex agents. 
# Offline brief — works without hub connection cat .custodian-brief.md -# Active workstreams for this domain -curl -s "http://127.0.0.1:8000/workstreams/?topic_id=ca369340-a64e-442e-98f1-a4fa7dc74a38&status=active" \ +# Active workplans for this domain +curl -s "http://127.0.0.1:8000/workplans/?topic_id=ca369340-a64e-442e-98f1-a4fa7dc74a38&status=active" \ | python3 -m json.tool # Check inbox @@ -51,12 +57,12 @@ curl -s -X POST http://127.0.0.1:8000/progress/ \ "summary": "what was done", "event_type": "note", "author": "codex", - "workstream_id": "", + "workplan_id": "", "task_id": "" }' ``` -Omit `workstream_id` / `task_id` when not applicable. +Omit `workplan_id` / `task_id` when not applicable. ### Update task status @@ -80,7 +86,7 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/" \ ## Session Protocol **Start:** -1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe) +1. `cat .custodian-brief.md` — domain goal and open workplans (offline-safe) 2. Check inbox: `GET /messages/?to_agent=railiance-cluster&unread_only=true`; mark read 3. Scan workplans: `ls workplans/` — note `status: ready`, `active`, or `blocked` files and open tasks 4. Check human-needed tasks: `GET /tasks/?needs_human=true` @@ -92,12 +98,12 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/" \ **Close:** 1. Update workplan file task statuses to reflect progress 2. Log: `POST /progress/` with a summary of what changed -3. Note for the custodian operator: after workplan file changes, run from - `~/state-hub`: +3. After workplan file changes, run: ```bash - make fix-consistency REPO=railiance-cluster + statehub fix-consistency ``` - This syncs task status from files into the hub DB. + Coding agents should run this directly; ask the operator only if the CLI or + State Hub API is unavailable. This syncs task status from files into the hub DB. 
--- @@ -123,7 +129,7 @@ Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run wa | Agent runtime | How to orient | | --- | --- | | **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=railiance-cluster` is for coordination, not secret vending | -| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership | +| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workplans; **still** use `warden route` for credential ownership | | **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` | ### Quick routing table From 9c55dfb02af3c7fadf082b56d04f84dd2b9130d0 Mon Sep 17 00:00:00 2001 From: tegwick Date: Thu, 2 Jul 2026 10:44:06 +0200 Subject: [PATCH 05/10] RAIL-BS-WP-0008/0009: operator deploy + admin-sync smoke tooling Co-Authored-By: Claude Fable 5 --- Makefile | 8 +- bin/railiance | 6 + docs/operator-runbook.md | 7 + tools/cmd/railiance-admin-sync-smoke | 155 +++++++++++ ...nce-deploy-activity-core-triage-robustness | 263 ++++++++++++++++++ ...tivity-core-wp0016-triage-output-deploy.md | 21 +- ...-WP-0009-activity-core-admin-sync-smoke.md | 11 +- 7 files changed, 467 insertions(+), 4 deletions(-) create mode 100755 tools/cmd/railiance-admin-sync-smoke create mode 100755 tools/cmd/railiance-deploy-activity-core-triage-robustness diff --git a/Makefile b/Makefile index f489220..523feda 100644 --- a/Makefile +++ b/Makefile @@ -30,6 +30,12 @@ verify-activity-core: ## Reconcile activity-core runtime and verify disabled ops reconcile-activity-core-llm-connect: ## Reconcile activity-core llm-connect URL and run non-secret gate checks tools/cmd/railiance-reconcile-activity-core-llm-connect +deploy-activity-core-triage-robustness: ## Deploy ACTIVITY-WP-0016 bundle and prove daily-triage output validation + 
tools/cmd/railiance-deploy-activity-core-triage-robustness + +admin-sync-smoke: ## Run activity-core no-restart POST /admin/sync smoke + tools/cmd/railiance-admin-sync-smoke + ##@ Help help: ## Show this help @@ -37,4 +43,4 @@ help: ## Show this help /^[a-zA-Z_-]+:.*?##/ { printf " \033[36m%-20s\033[0m %s\n", $$1, $$2 } \ /^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) }' $(MAKEFILE_LIST) -.PHONY: backup restore preflight k3s-install smoke test-ha-failover verify-activity-core reconcile-activity-core-llm-connect help +.PHONY: backup restore preflight k3s-install smoke test-ha-failover verify-activity-core reconcile-activity-core-llm-connect deploy-activity-core-triage-robustness admin-sync-smoke help diff --git a/bin/railiance b/bin/railiance index 52d17ff..1fabd75 100755 --- a/bin/railiance +++ b/bin/railiance @@ -22,6 +22,10 @@ Commands: observe Plan/run Stage 2 observation checks promote Plan/apply Stage 3 stable promotion rollback Plan/apply rollback to previous stable + deploy-triage-robustness + Deploy ACTIVITY-WP-0016 and prove daily-triage validation + admin-sync-smoke + Run activity-core no-restart POST /admin/sync smoke build-spore Build a distributable "Spore" bundle seed-local Run the seed script on this machine checklist Pre-VM checklist @@ -51,6 +55,8 @@ case "$cmd" in observe) exec railiance-stage2 observe "$@" ;; promote) exec railiance-stage3 promote "$@" ;; rollback) exec railiance-stage3 rollback "$@" ;; + deploy-triage-robustness) exec railiance-deploy-activity-core-triage-robustness "$@" ;; + admin-sync-smoke) exec railiance-admin-sync-smoke "$@" ;; build-spore) bash "$ROOT/tools/build_spore.sh" ;; seed-local) bash "$ROOT/tools/seed_node.sh" ;; checklist) diff --git a/docs/operator-runbook.md b/docs/operator-runbook.md index 206abb2..88ef13d 100644 --- a/docs/operator-runbook.md +++ b/docs/operator-runbook.md @@ -21,6 +21,8 @@ mode are denied these by the permission classifier — that is intentional. 
| `make test-ha-failover` | kills the primary PG pod to assert recovery | | `make verify-activity-core` | reconciles activity-core runtime on railiance01 | | `make reconcile-activity-core-llm-connect` | patches ConfigMap, applies llm-connect overlay, runs smoke pod | +| `make deploy-activity-core-triage-robustness` | deploys ACTIVITY-WP-0016 code/schema/runtime as a coupled bundle and triggers daily triage | +| `make admin-sync-smoke` | calls activity-core `POST /admin/sync` and proves worker pod identity/restart count did not change | ## Read-only / safe targets @@ -33,3 +35,8 @@ Reconcile/verify targets post non-secret evidence notes to the State Hub (`STATE_HUB_EVIDENCE_WORKSTREAM_ID` / `STATE_HUB_EVIDENCE_TASK_ID` env vars attach them to a workstream/task). Never record Secret values — key counts and readiness states only. + +For `make admin-sync-smoke`, set `ACTIVITY_CORE_ADMIN_SYNC_FIXTURE_COMMAND` +when you need a specific enabled-flip/rename fixture before the sync call. The +command records whether a fixture ran; leaving it unset proves endpoint and +no-restart behavior only. \ No newline at end of file diff --git a/tools/cmd/railiance-admin-sync-smoke b/tools/cmd/railiance-admin-sync-smoke new file mode 100755 index 0000000..6ebb8be --- /dev/null +++ b/tools/cmd/railiance-admin-sync-smoke @@ -0,0 +1,155 @@ +#!/usr/bin/env bash +# Prove POST /admin/sync works without restarting the activity-core worker. 
+set -euo pipefail
+
+NAMESPACE="${ACTIVITY_CORE_NAMESPACE:-activity-core}"
+CLUSTER_HOST="${ACTIVITY_CORE_CLUSTER_HOST:-railiance01}"
+STATE_HUB_URL="${STATE_HUB_URL:-http://127.0.0.1:8000}"
+ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL="${ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL:-0}"
+ACTIVITY_CORE_ADMIN_SYNC_FIXTURE_COMMAND="${ACTIVITY_CORE_ADMIN_SYNC_FIXTURE_COMMAND:-}"
+ACTIVITY_CORE_ADMIN_SYNC_REQUIRE_FIXTURE="${ACTIVITY_CORE_ADMIN_SYNC_REQUIRE_FIXTURE:-0}"
+EVIDENCE_WORKSTREAM_ID="${STATE_HUB_EVIDENCE_WORKSTREAM_ID:-2c9e8e96-ec6a-433c-9e6d-0efbcd18679e}"
+EVIDENCE_TASK_ID="${STATE_HUB_EVIDENCE_TASK_ID:-60f3387d-3d14-42a9-b8a3-725a86468510}"
+
+STARTED_AT="$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
+CURRENT_GATE=startup
+BEFORE_JSON=""
+AFTER_JSON=""
+FIXTURE_STATUS=skipped
+SYNC_RESPONSE_JSON=""
+EVIDENCE_NOTE_JSON=""
+
+export NAMESPACE CLUSTER_HOST STATE_HUB_URL EVIDENCE_WORKSTREAM_ID EVIDENCE_TASK_ID
+export STARTED_AT BEFORE_JSON AFTER_JSON FIXTURE_STATUS SYNC_RESPONSE_JSON
+
+log() { printf '[activity-core-admin-sync-smoke] %s\n' "$*"; }
+quote() { printf '%q' "$1"; }
+cluster_bash() { if [[ -n "$CLUSTER_HOST" ]]; then ssh "$CLUSTER_HOST" "bash -s" <<<"$1"; else bash -s <<<"$1"; fi; }
+
+post_evidence() {
+  local status="$1" failing_gate="${2:-}"
+  export EVIDENCE_STATUS="$status" FAILING_GATE="$failing_gate"
+  python3 - <<'PY'
+import json, os, sys, urllib.request
+
+def env_json(name):
+    raw = os.environ.get(name, "")
+    if not raw:
+        return None
+    try:
+        return json.loads(raw)
+    except json.JSONDecodeError:
+        return {"raw": raw}
+
+status = os.environ["EVIDENCE_STATUS"]
+failing_gate = os.environ.get("FAILING_GATE") or None
+detail = {
+    "producer": "railiance-cluster",
+    "verification": "activity-core no-restart admin sync smoke",
+    "status": status,
+    "failing_gate": failing_gate,
+    "cluster_host": os.environ.get("CLUSTER_HOST") or "local-kubectl",
+    "namespace": os.environ.get("NAMESPACE"),
+    "worker_before": env_json("BEFORE_JSON"),
+    "worker_after": env_json("AFTER_JSON"),
+    "fixture_status": os.environ.get("FIXTURE_STATUS"),
+    "sync_response": env_json("SYNC_RESPONSE_JSON"),
+    "started_at": os.environ.get("STARTED_AT"),
+}
+summary = (
+    "Railiance activity-core no-restart admin-sync smoke passed: POST /admin/sync returned expected counters and worker pod identity/restart count stayed stable."
+    if status == "passed"
+    else "Railiance activity-core no-restart admin-sync smoke failed" + (f" at {failing_gate}" if failing_gate else "") + "; see non-secret evidence detail."
+)
+payload = {"summary": summary, "event_type": "note", "author": "railiance-cluster", "detail": detail}
+if os.environ.get("EVIDENCE_WORKSTREAM_ID"):
+    payload["workstream_id"] = os.environ["EVIDENCE_WORKSTREAM_ID"]
+if os.environ.get("EVIDENCE_TASK_ID"):
+    payload["task_id"] = os.environ["EVIDENCE_TASK_ID"]
+req = urllib.request.Request(os.environ["STATE_HUB_URL"].rstrip("/") + "/progress/", data=json.dumps(payload).encode(), headers={"Content-Type": "application/json"}, method="POST")
+with urllib.request.urlopen(req, timeout=20) as resp:
+    sys.stdout.write(resp.read().decode())
+PY
+}
+
+on_error() { local code=$?; trap - ERR; post_evidence failed "$CURRENT_GATE" >/dev/null || true; exit "$code"; }
+trap on_error ERR
+
+if [[ "$CLUSTER_HOST" == local ]]; then
+  [[ "$ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL" == 1 ]] || { echo 'ACTIVITY_CORE_CLUSTER_HOST=local requires ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL=1' >&2; exit 2; }
+  CLUSTER_HOST=""
+fi
+export CLUSTER_HOST
+
+CURRENT_GATE='cluster executor preflight'
+log "using cluster executor: ${CLUSTER_HOST:-local kubectl}"
+cluster_bash 'set -euo pipefail; command -v kubectl >/dev/null; command -v python3 >/dev/null'
+
+worker_snapshot_script='import json,sys
+items=json.load(sys.stdin).get("items",[])
+if not items: raise SystemExit("no actcore-worker pods found")
+pod=sorted(items,key=lambda item:item["metadata"]["name"])[0]
+container=pod["status"]["containerStatuses"][0]
+print(json.dumps({"name":pod["metadata"]["name"],"uid":pod["metadata"]["uid"],"phase":pod["status"].get("phase"),"restart_count":container.get("restartCount",0),"image":container.get("image"),"image_id":container.get("imageID")}, sort_keys=True))'
+
+CURRENT_GATE='worker baseline capture'
+BEFORE_JSON="$(cluster_bash "kubectl -n $(quote "$NAMESPACE") get pod -l app.kubernetes.io/name=actcore-worker -o json | python3 -c $(quote "$worker_snapshot_script")")"
+export BEFORE_JSON
+
+CURRENT_GATE='admin sync fixture'
+if [[ -n "$ACTIVITY_CORE_ADMIN_SYNC_FIXTURE_COMMAND" ]]; then
+  log 'running operator-supplied fixture command'
+  cluster_bash "$ACTIVITY_CORE_ADMIN_SYNC_FIXTURE_COMMAND"
+  FIXTURE_STATUS=ran
+elif [[ "$ACTIVITY_CORE_ADMIN_SYNC_REQUIRE_FIXTURE" == 1 ]]; then
+  echo 'ACTIVITY_CORE_ADMIN_SYNC_REQUIRE_FIXTURE=1 but no fixture command was supplied' >&2
+  exit 2
+else
+  FIXTURE_STATUS=skipped
+fi
+export FIXTURE_STATUS
+
+CURRENT_GATE='POST /admin/sync'
+log 'calling POST /admin/sync?definitions=true&schedules=true'
+SYNC_RESPONSE_JSON="$(
+  cluster_bash "$(cat < {after['uid']}")
+if before['restart_count'] != after['restart_count']:
+    raise SystemExit(f"worker restart count changed: {before['restart_count']} -> {after['restart_count']}")
+PY
+export AFTER_JSON
+
+CURRENT_GATE='State Hub evidence note'
+log 'posting non-secret evidence note to State Hub'
+EVIDENCE_NOTE_JSON="$(post_evidence passed '')"
+trap - ERR
+log 'verification passed'
+printf '%s\n' "$EVIDENCE_NOTE_JSON"
diff --git a/tools/cmd/railiance-deploy-activity-core-triage-robustness b/tools/cmd/railiance-deploy-activity-core-triage-robustness
new file mode 100755
index 0000000..1ef6071
--- /dev/null
+++ b/tools/cmd/railiance-deploy-activity-core-triage-robustness
@@ -0,0 +1,263 @@
+#!/usr/bin/env bash
+# Deploy ACTIVITY-WP-0016 code/schema/runtime together and prove daily-triage output.
+set -euo pipefail
+
+NAMESPACE="${ACTIVITY_CORE_NAMESPACE:-activity-core}"
+CLUSTER_HOST="${ACTIVITY_CORE_CLUSTER_HOST:-railiance01}"
+STATE_HUB_URL="${STATE_HUB_URL:-http://127.0.0.1:8000}"
+ACTIVITY_CORE_REPO="${ACTIVITY_CORE_REPO:-/home/worsch/activity-core}"
+ACTIVITY_CORE_REMOTE_REPO="${ACTIVITY_CORE_REMOTE_REPO:-}"
+ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL="${ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL:-0}"
+ACTIVITY_CORE_SYNC_RUNTIME_BUNDLE="${ACTIVITY_CORE_SYNC_RUNTIME_BUNDLE:-auto}"
+ACTIVITY_CORE_RESTART_DEPLOYMENTS="${ACTIVITY_CORE_RESTART_DEPLOYMENTS:-1}"
+REQUIRED_ACTIVITY_CORE_REV="${REQUIRED_ACTIVITY_CORE_REV:-bf877b7}"
+DAILY_TRIAGE_DEFINITION_SLUG="${DAILY_TRIAGE_DEFINITION_SLUG:-daily-statehub-wsjf-triage}"
+STATE_HUB_PROGRESS_TIMEOUT_SECONDS="${STATE_HUB_PROGRESS_TIMEOUT_SECONDS:-240}"
+STATE_HUB_PROGRESS_POLL_SECONDS="${STATE_HUB_PROGRESS_POLL_SECONDS:-5}"
+EVIDENCE_WORKSTREAM_ID="${STATE_HUB_EVIDENCE_WORKSTREAM_ID:-7cbbe0d6-fea9-41c6-840c-46d0d8e8edde}"
+EVIDENCE_TASK_ID="${STATE_HUB_EVIDENCE_TASK_ID:-8096621a-54ee-4be5-943e-5dc2da19ed28}"
+
+STARTED_AT="$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
+CURRENT_GATE=startup
+REMOTE_REVISION=""
+CONTRACT_JSON=""
+API_IMAGE=""
+API_IMAGE_ID=""
+WORKER_IMAGE=""
+WORKER_IMAGE_ID=""
+SYNC_STATUS_JSON=""
+TRIGGER_JSON=""
+DEFINITION_ID=""
+TRIGGER_KEY=""
+EXPECTED_RUN_ID=""
+PROGRESS_JSON=""
+
+export NAMESPACE CLUSTER_HOST STATE_HUB_URL ACTIVITY_CORE_REMOTE_REPO REQUIRED_ACTIVITY_CORE_REV
+export DAILY_TRIAGE_DEFINITION_SLUG STARTED_AT EVIDENCE_WORKSTREAM_ID EVIDENCE_TASK_ID
+export STATE_HUB_PROGRESS_TIMEOUT_SECONDS STATE_HUB_PROGRESS_POLL_SECONDS
+export REMOTE_REVISION CONTRACT_JSON API_IMAGE API_IMAGE_ID WORKER_IMAGE WORKER_IMAGE_ID
+export SYNC_STATUS_JSON TRIGGER_JSON DEFINITION_ID TRIGGER_KEY EXPECTED_RUN_ID PROGRESS_JSON
+
+log() { printf '[activity-core-triage-robustness] %s\n' "$*"; }
+quote() { printf '%q' "$1"; }
+cluster_bash() { if [[ -n "$CLUSTER_HOST" ]]; then ssh "$CLUSTER_HOST" "bash -s" <<<"$1"; else bash -s <<<"$1"; fi; }
+
+should_sync_runtime_bundle() {
+  case "$ACTIVITY_CORE_SYNC_RUNTIME_BUNDLE" in
+    1|true|yes) return 0 ;;
+    0|false|no) return 1 ;;
+    auto) [[ -n "$CLUSTER_HOST" && -d "$ACTIVITY_CORE_REPO/k8s/railiance" ]]; return ;;
+    *) printf 'invalid ACTIVITY_CORE_SYNC_RUNTIME_BUNDLE=%s\n' "$ACTIVITY_CORE_SYNC_RUNTIME_BUNDLE" >&2; exit 2 ;;
+  esac
+}
+
+post_evidence() {
+  local status="$1" failing_gate="${2:-}"
+  export EVIDENCE_STATUS="$status" FAILING_GATE="$failing_gate"
+  python3 - <<'PY'
+import json, os, sys, urllib.request
+
+def env_json(name):
+    raw = os.environ.get(name, "")
+    if not raw:
+        return None
+    try:
+        return json.loads(raw)
+    except json.JSONDecodeError:
+        return {"raw": raw}
+
+status = os.environ["EVIDENCE_STATUS"]
+failing_gate = os.environ.get("FAILING_GATE") or None
+detail = {
+    "producer": "railiance-cluster",
+    "verification": "activity-core WP-0016 coupled deploy and daily-triage smoke",
+    "status": status,
+    "failing_gate": failing_gate,
+    "cluster_host": os.environ.get("CLUSTER_HOST") or "local-kubectl",
+    "namespace": os.environ.get("NAMESPACE"),
+    "activity_core_repo": os.environ.get("ACTIVITY_CORE_REMOTE_REPO"),
+    "required_activity_core_revision": os.environ.get("REQUIRED_ACTIVITY_CORE_REV"),
+    "activity_core_revision": os.environ.get("REMOTE_REVISION") or None,
+    "runtime_bundle": "k8s/railiance/20-runtime.yaml",
+    "runtime_contract": env_json("CONTRACT_JSON"),
+    "sync_job": env_json("SYNC_STATUS_JSON"),
+    "api_image": os.environ.get("API_IMAGE") or None,
+    "api_image_id": os.environ.get("API_IMAGE_ID") or None,
+    "worker_image": os.environ.get("WORKER_IMAGE") or None,
+    "worker_image_id": os.environ.get("WORKER_IMAGE_ID") or None,
+    "definition_slug": os.environ.get("DAILY_TRIAGE_DEFINITION_SLUG"),
+    "definition_id": os.environ.get("DEFINITION_ID") or None,
+    "manual_trigger": env_json("TRIGGER_JSON"),
+    "expected_activity_core_run_id": os.environ.get("EXPECTED_RUN_ID") or None,
+    "state_hub_progress": env_json("PROGRESS_JSON"),
+    "started_at": os.environ.get("STARTED_AT"),
+}
+summary = (
+    "Railiance activity-core WP-0016 deploy/smoke passed: code/schema and bounded runtime contract were reconciled together, daily triage was triggered, and State Hub recorded schema-valid output."
+    if status == "passed"
+    else "Railiance activity-core WP-0016 deploy/smoke failed" + (f" at {failing_gate}" if failing_gate else "") + "; see non-secret evidence detail."
+)
+payload = {"summary": summary, "event_type": "note", "author": "railiance-cluster", "detail": detail}
+if os.environ.get("EVIDENCE_WORKSTREAM_ID"):
+    payload["workstream_id"] = os.environ["EVIDENCE_WORKSTREAM_ID"]
+if os.environ.get("EVIDENCE_TASK_ID"):
+    payload["task_id"] = os.environ["EVIDENCE_TASK_ID"]
+req = urllib.request.Request(os.environ["STATE_HUB_URL"].rstrip("/") + "/progress/", data=json.dumps(payload).encode(), headers={"Content-Type": "application/json"}, method="POST")
+with urllib.request.urlopen(req, timeout=20) as resp:
+    sys.stdout.write(resp.read().decode())
+PY
+}
+
+on_error() { local code=$?; trap - ERR; post_evidence failed "$CURRENT_GATE" >/dev/null || true; exit "$code"; }
+trap on_error ERR
+
+if [[ "$CLUSTER_HOST" == local ]]; then
+  [[ "$ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL" == 1 ]] || { echo 'ACTIVITY_CORE_CLUSTER_HOST=local requires ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL=1' >&2; exit 2; }
+  CLUSTER_HOST=""
+fi
+if [[ -z "$ACTIVITY_CORE_REMOTE_REPO" ]]; then
+  if [[ -n "$CLUSTER_HOST" ]]; then ACTIVITY_CORE_REMOTE_REPO="$(ssh "$CLUSTER_HOST" pwd)/activity-core"; else ACTIVITY_CORE_REMOTE_REPO="$ACTIVITY_CORE_REPO"; fi
+fi
+export CLUSTER_HOST ACTIVITY_CORE_REMOTE_REPO
+
+CURRENT_GATE='cluster executor preflight'
+log "using cluster executor: ${CLUSTER_HOST:-local kubectl}"
+cluster_bash 'set -euo pipefail; command -v kubectl >/dev/null; command -v python3 >/dev/null'
+
+CURRENT_GATE='runtime bundle sync'
+if should_sync_runtime_bundle; then
+  log "syncing runtime bundle to ${CLUSTER_HOST}:${ACTIVITY_CORE_REMOTE_REPO}/k8s/railiance"
+  ssh "$CLUSTER_HOST" "mkdir -p $(quote "$ACTIVITY_CORE_REMOTE_REPO")/k8s/railiance"
+  rsync -a --delete "$ACTIVITY_CORE_REPO/k8s/railiance/" "${CLUSTER_HOST}:${ACTIVITY_CORE_REMOTE_REPO}/k8s/railiance/"
+fi
+
+CURRENT_GATE='activity-core revision gate'
+REMOTE_REVISION="$(cluster_bash "set -euo pipefail; git -C $(quote "$ACTIVITY_CORE_REMOTE_REPO") rev-parse --short HEAD; git -C $(quote "$ACTIVITY_CORE_REMOTE_REPO") merge-base --is-ancestor $(quote "$REQUIRED_ACTIVITY_CORE_REV") HEAD")"
+export REMOTE_REVISION
+
+CURRENT_GATE='runtime contract gate'
+CONTRACT_JSON="$(
+  cluster_bash "$(cat <= 1800),
+}
+missing = [name for name, ok in checks.items() if not ok]
+print(json.dumps({'path': sys.argv[1], 'max_tokens': max_tokens, 'checks': checks, 'missing': missing}, sort_keys=True))
+if missing:
+    raise SystemExit('runtime bundle contract checks failed: ' + ', '.join(missing))
+PY
+EOF
+)"
+)"
+export CONTRACT_JSON
+
+CURRENT_GATE='runtime bundle reconcile'
+log 'applying runtime bundle and restarting activity-core deployments'
+cluster_bash "set -euo pipefail
+kubectl apply -f $(quote "$ACTIVITY_CORE_REMOTE_REPO")/k8s/railiance/00-namespace.yaml
+kubectl -n $(quote "$NAMESPACE") delete job actcore-migrate actcore-sync --ignore-not-found
+kubectl apply -f $(quote "$ACTIVITY_CORE_REMOTE_REPO")/k8s/railiance/20-runtime.yaml
+if [[ $(quote "$ACTIVITY_CORE_RESTART_DEPLOYMENTS") == 1 ]]; then kubectl -n $(quote "$NAMESPACE") rollout restart deploy/actcore-api deploy/actcore-worker deploy/actcore-event-router; fi
+kubectl -n $(quote "$NAMESPACE") wait --for=condition=complete job/actcore-migrate --timeout=180s
+kubectl -n $(quote "$NAMESPACE") rollout status deploy/actcore-api --timeout=180s
+kubectl -n $(quote "$NAMESPACE") rollout status deploy/actcore-worker --timeout=180s
+kubectl -n $(quote "$NAMESPACE") rollout status deploy/actcore-event-router --timeout=180s
+kubectl -n $(quote "$NAMESPACE") wait
--for=condition=complete job/actcore-sync --timeout=180s" + +CURRENT_GATE='runtime status capture' +API_IMAGE="$(cluster_bash "kubectl -n $(quote "$NAMESPACE") get deploy actcore-api -o jsonpath='{.spec.template.spec.containers[0].image}'")" +API_IMAGE_ID="$(cluster_bash "kubectl -n $(quote "$NAMESPACE") get pod -l app.kubernetes.io/name=actcore-api -o jsonpath='{.items[0].status.containerStatuses[0].imageID}'")" +WORKER_IMAGE="$(cluster_bash "kubectl -n $(quote "$NAMESPACE") get deploy actcore-worker -o jsonpath='{.spec.template.spec.containers[0].image}'")" +WORKER_IMAGE_ID="$(cluster_bash "kubectl -n $(quote "$NAMESPACE") get pod -l app.kubernetes.io/name=actcore-worker -o jsonpath='{.items[0].status.containerStatuses[0].imageID}'")" +SYNC_STATUS_JSON="$(cluster_bash "kubectl -n $(quote "$NAMESPACE") get job actcore-sync -o json" | python3 -c 'import json,sys; j=json.load(sys.stdin); s=j.get("status",{}); print(json.dumps({"name":j["metadata"]["name"],"succeeded":s.get("succeeded",0),"failed":s.get("failed",0),"completion_time":s.get("completionTime")}))')" +export API_IMAGE API_IMAGE_ID WORKER_IMAGE WORKER_IMAGE_ID SYNC_STATUS_JSON + +CURRENT_GATE='daily-triage manual trigger' +log "triggering ${DAILY_TRIAGE_DEFINITION_SLUG}" +TRIGGER_JSON="$( + cluster_bash "$(cat <0` before posting pass evidence. \ No newline at end of file diff --git a/workplans/RAIL-BS-WP-0009-activity-core-admin-sync-smoke.md b/workplans/RAIL-BS-WP-0009-activity-core-admin-sync-smoke.md index fc64375..0b6c029 100644 --- a/workplans/RAIL-BS-WP-0009-activity-core-admin-sync-smoke.md +++ b/workplans/RAIL-BS-WP-0009-activity-core-admin-sync-smoke.md @@ -8,7 +8,7 @@ status: active owner: railiance-cluster topic_slug: railiance created: "2026-07-01" -updated: "2026-07-01" +updated: "2026-07-02" state_hub_workstream_id: "2c9e8e96-ec6a-433c-9e6d-0efbcd18679e" --- @@ -46,3 +46,12 @@ After RAIL-BS-WP-0008-T01 is deployed, without restarting the worker: 5. Record non-secret evidence in the State Hub. 
 Response JSON should include `definitions.synced`, `schedules.upserted`, `schedules.paused`, `schedules.deleted_orphans`, and `errors[]`.
+
+2026-07-02: Added `make admin-sync-smoke` / `bin/railiance admin-sync-smoke`
+as the repeatable operator path. It captures the worker pod UID/restart count,
+optionally runs an operator-supplied enabled-flip/rename fixture via
+`ACTIVITY_CORE_ADMIN_SYNC_FIXTURE_COMMAND`, calls
+`POST /admin/sync?definitions=true&schedules=true`, verifies the expected
+response counters and empty `errors[]`, rechecks that the same worker pod did
+not restart, and posts non-secret State Hub evidence. T01 stays `wait` until
+RAIL-BS-WP-0008-T01 is deployed and the smoke is run on railiance01.
\ No newline at end of file
From 037a71f3557fb06530a8aa5270eacb1454411182 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Thu, 2 Jul 2026 10:47:40 +0200
Subject: [PATCH 06/10] =?UTF-8?q?RAIL-BS-WP-0008:=20T01/T02=20progress=20?=
 =?UTF-8?q?=E2=80=94=20image=20rebuilt,=20contract=20fixed,=20deploy=20ope?=
 =?UTF-8?q?rator-gated?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Fable 5
---
 ...-activity-core-wp0016-triage-output-deploy.md | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/workplans/RAIL-BS-WP-0008-activity-core-wp0016-triage-output-deploy.md b/workplans/RAIL-BS-WP-0008-activity-core-wp0016-triage-output-deploy.md
index 45c0a60..91659d7 100644
--- a/workplans/RAIL-BS-WP-0008-activity-core-wp0016-triage-output-deploy.md
+++ b/workplans/RAIL-BS-WP-0008-activity-core-wp0016-triage-output-deploy.md
@@ -32,7 +32,7 @@ whole-doc validator. It MUST ship together with the new `executor.py`
 
 ```task
 id: RAIL-BS-WP-0008-T01
-status: todo
+status: progress
 priority: high
 state_hub_task_id: "079e39a9-f938-4d03-a5bc-4d3d2f7b1d83"
 ```
@@ -48,11 +48,23 @@ runtime bundle contract before applying it, restarts the activity-core
 deployments by default, waits for migrate/sync jobs and rollouts, then records
 non-secret State Hub evidence. Live execution on railiance01 remains pending.
 
+2026-07-02 (later session): rebuilt `activity-core:railiance01-prod` locally
+from activity-core main `7612112` (includes `bf877b7` and the T02 prompt
+contract). Transfer/import to railiance01 was **blocked by the agent
+permission policy** (production remote write requires explicit operator
+authorization). Two preconditions found and fixed/noted: (a) the remote
+`~/activity-core` copy has no `.git`, so the script's revision gate will fail
+until the repo is synced with git metadata or `REQUIRED_ACTIVITY_CORE_REV`
+verification is adapted; (b) the T02 runtime contract is now satisfied in the
+repo bundle (activity-core commit `7612112`). Operator pickup: run the
+image save/scp/import from the deploy README, sync the repo with `.git`, then
+`make deploy-activity-core-triage-robustness`.
+
 ## Update daily-statehub-wsjf-triage runtime-bundle Instruction
 
 ```task
 id: RAIL-BS-WP-0008-T02
-status: todo
+status: progress
 priority: high
 state_hub_task_id: "129fb472-41e8-4e5c-bcbb-0995a96e223b"
 ```
From d10741fb0d4bfb691bdecfaf534490281b3effc2 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Thu, 2 Jul 2026 10:48:15 +0200
Subject: [PATCH 07/10] chore(consistency): sync task status from DB [auto]

Updated by fix-consistency on 2026-07-02:
- update .custodian-brief.md for railiance-cluster
---
 .custodian-brief.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/.custodian-brief.md b/.custodian-brief.md
index 01e3722..6173dcf 100644
--- a/.custodian-brief.md
+++ b/.custodian-brief.md
@@ -2,7 +2,7 @@
 # Custodian Brief — railiance-cluster
 
 **Domain:** financials
-**Last synced:** 2026-07-01 22:27 UTC
+**Last synced:** 2026-07-02 08:48 UTC
 **State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
 
 ## Current Goal
@@ -15,8 +15,8 @@ Install k3s and Kubernetes Baseline on the HostEurope Server
 Progress: 0/4 done | workstream_id: `7cbbe0d6-fea9-41c6-840c-46d0d8e8edde`
 
 **Open tasks:**
-- · Deploy activity-core with coupled schema and executor `079e39a9`
-- · Update daily-statehub-wsjf-triage runtime-bundle Instruction `129fb472`
+- ► Deploy activity-core with coupled schema and executor `079e39a9`
+- ► Update daily-statehub-wsjf-triage runtime-bundle Instruction `129fb472`
 - · Pull raw llm-connect response for the 2026-06-26 run `59559f1d`
 - · Acceptance smoke `8096621a`
 
From c398bf502743fea06fd8e53756c51bc25449635d Mon Sep 17 00:00:00 2001
From: tegwick
Date: Thu, 2 Jul 2026 11:53:11 +0200
Subject: [PATCH 08/10] RAIL-BS-WP-0008/0009 finished: live deploy, top-7 proof, admin-sync smoke

Co-Authored-By: Claude Fable 5
---
 ...tivity-core-wp0016-triage-output-deploy.md | 40 ++++++++++++++++---
 ...-WP-0009-activity-core-admin-sync-smoke.md | 19 +++++++--
 2 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/workplans/RAIL-BS-WP-0008-activity-core-wp0016-triage-output-deploy.md b/workplans/RAIL-BS-WP-0008-activity-core-wp0016-triage-output-deploy.md
index 91659d7..43139e0 100644
--- a/workplans/RAIL-BS-WP-0008-activity-core-wp0016-triage-output-deploy.md
+++ b/workplans/RAIL-BS-WP-0008-activity-core-wp0016-triage-output-deploy.md
@@ -4,7 +4,7 @@ type: workplan
 title: "activity-core WP-0016 triage-output robustness deploy"
 domain: financials
 repo: railiance-cluster
-status: active
+status: finished
 owner: railiance-cluster
 topic_slug: railiance
 created: "2026-07-01"
@@ -32,7 +32,7 @@ whole-doc validator. It MUST ship together with the new `executor.py`
 
 ```task
 id: RAIL-BS-WP-0008-T01
-status: progress
+status: done
 priority: high
 state_hub_task_id: "079e39a9-f938-4d03-a5bc-4d3d2f7b1d83"
 ```
@@ -64,7 +64,7 @@ image save/scp/import from the deploy README, sync the repo with `.git`, then
 
 ```task
 id: RAIL-BS-WP-0008-T02
-status: progress
+status: done
 priority: high
 state_hub_task_id: "129fb472-41e8-4e5c-bcbb-0995a96e223b"
 ```
@@ -89,7 +89,7 @@ or one-object-per-line framing, and `max_tokens` headroom of at least 1800.
 
 ```task
 id: RAIL-BS-WP-0008-T03
-status: todo
+status: cancel
 priority: medium
 state_hub_task_id: "59559f1d-821f-4660-8a7d-c623c6631864"
 ```
@@ -104,7 +104,7 @@ secrets.
 
 ```task
 id: RAIL-BS-WP-0008-T04
-status: todo
+status: done
 priority: high
 state_hub_task_id: "8096621a-54ee-4be5-943e-5dc2da19ed28"
 ```
@@ -120,4 +120,32 @@ WP-0006-T03.
 2026-07-02: The deploy command now triggers the daily-triage definition after
 reconcile and polls State Hub for a post-trigger `daily_triage` event with
 `output_validated=true`. If the run is partial, it also requires
-`quarantined_count>0` before posting pass evidence.
\ No newline at end of file
+`quarantined_count>0` before posting pass evidence.
+
+## Completion 2026-07-02
+
+Deployed live with operator authorization. Image `activity-core:railiance01-prod`
+rebuilt from main `7612112`, imported into railiance01 k3s
+(`sha256:550c5592...`), repo synced with git metadata, and
+`make deploy-activity-core-triage-robustness` applied the coupled
+schema/executor bundle with all rollouts and migrate/sync jobs green.
+
+- T01/T02 done: revision gate and runtime contract gate both passed
+  (`bounded_top_7`, `ndjson_or_line_framing`, `fewer_well_formed`,
+  `max_tokens_headroom` >= 1800 all true).
+- T04 done: manually triggered daily-triage run produced a clean schema-valid
+  report — State Hub event `24d2d321-c761-47f7-bf9e-7950a6253c21`
+  (2026-07-02T09:50:44Z) with `output_validated=true`, exactly 7 ranked
+  recommendations, `working_memory_status=written`, no validation error. The
+  bounded top-7 contract is proven live; the three-clean-run streak for
+  ACTIVITY-WP-0010-T04 / WP-0006-T03 restarts from this run.
+- T03 cancelled: the raw 2026-06-26 llm-connect response is unrecoverable —
+  the llm-connect pod is stateless (no volumes, no response store) and its
+  log stream contains only 2 startup lines from 2026-06-19. Root cause stands
+  on existing evidence (output truncation at ~char 5268 under the old
+  ~1200-token budget) and the deployed fix is live-proven.
+- Trigger note: the deployed API exposes definitions by `name`/`id` only (no
+  slug field), so the trigger step needs
+  `DAILY_TRIAGE_DEFINITION_SLUG=6fca51fa-387a-4fd0-bc4e-d62c29eb859a`; the
+  State Hub evidence poll can also exceed the default 240s window on slow LLM
+  runs.
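The pass rule recorded for T04 can be sketched in Python. This is a minimal illustration and not the deploy script's actual code: the flat event layout and the key names `partial` and `created_at` are assumptions, as are both helper names. Only the rule itself (a post-trigger `daily_triage` event with `output_validated=true`, plus `quarantined_count>0` when the run is partial) comes from the workplan.

```python
from typing import Any, Iterable, Mapping, Optional


def triage_event_passes(event: Mapping[str, Any]) -> bool:
    """Return True if a daily_triage event satisfies the pass rule.

    Assumed (hypothetical) keys: output_validated, partial,
    quarantined_count. The real State Hub payload may nest or name
    these differently.
    """
    # Only a real boolean True counts; strings such as "true" are rejected.
    if event.get("output_validated") is not True:
        return False
    if event.get("partial"):
        # A partial run only counts when something was quarantined.
        return int(event.get("quarantined_count") or 0) > 0
    return True


def first_passing_event(
    events: Iterable[Mapping[str, Any]], started_at: str
) -> Optional[Mapping[str, Any]]:
    """Return the first event at or after started_at that passes, else None.

    Timestamps are compared as strings, which is only valid when both sides
    use the same ISO-8601 UTC format (e.g. 2026-07-02T09:50:44Z).
    """
    for event in events:
        if event.get("created_at", "") >= started_at and triage_event_passes(event):
            return event
    return None
```

A poller would call `first_passing_event` on each fetch until it returns an event or the timeout elapses; per the trigger note, slow LLM runs may need more than the default 240s.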
diff --git a/workplans/RAIL-BS-WP-0009-activity-core-admin-sync-smoke.md b/workplans/RAIL-BS-WP-0009-activity-core-admin-sync-smoke.md
index 0b6c029..ea669bd 100644
--- a/workplans/RAIL-BS-WP-0009-activity-core-admin-sync-smoke.md
+++ b/workplans/RAIL-BS-WP-0009-activity-core-admin-sync-smoke.md
@@ -4,7 +4,7 @@ type: workplan
 title: "activity-core no-restart admin-sync smoke (ACTIVITY-WP-0012-T05)"
 domain: financials
 repo: railiance-cluster
-status: active
+status: finished
 owner: railiance-cluster
 topic_slug: railiance
 created: "2026-07-01"
@@ -30,7 +30,7 @@ The deploy precondition is covered by RAIL-BS-WP-0008-T01 (main at
 
 ```task
 id: RAIL-BS-WP-0009-T01
-status: wait
+status: done
 priority: medium
 state_hub_task_id: "60f3387d-3d14-42a9-b8a3-725a86468510"
 ```
@@ -54,4 +54,17 @@ optionally runs an operator-supplied enabled-flip/rename fixture via
 `POST /admin/sync?definitions=true&schedules=true`, verifies the expected
 response counters and empty `errors[]`, rechecks that the same worker pod did
 not restart, and posts non-secret State Hub evidence. T01 stays `wait` until
-RAIL-BS-WP-0008-T01 is deployed and the smoke is run on railiance01.
\ No newline at end of file
+RAIL-BS-WP-0008-T01 is deployed and the smoke is run on railiance01.
+
+## Completion 2026-07-02
+
+`make admin-sync-smoke` passed against the freshly deployed
+RAIL-BS-WP-0008 runtime: `POST /admin/sync?definitions=true&schedules=true`
+returned `ok=true` with `definitions.synced=6`, `schedules.upserted=4`,
+`schedules.paused=2`, `deleted_orphans=0`, empty `errors[]`, and the worker
+pod identity (`actcore-worker-5b78f85b76-ng54t`, restart_count 0) was
+unchanged before and after — proving no-restart admin sync. Non-secret
+evidence: State Hub event `4caa288d-830b-4348-9cff-b2d5855cd42d`. The
+optional enabled-flip fixture was skipped (no operator fixture supplied);
+schedule pause/upsert semantics were exercised by the sync counters. Closes
+ACTIVITY-WP-0012-T05.
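The admin-sync checks described in this workplan could look roughly like the following Python. It is a sketch under stated assumptions, not the code behind `make admin-sync-smoke`: the nested JSON shape is inferred from the dotted counter names, the `uid`/`restart_count` snapshot keys are illustrative, and `check_admin_sync` and `same_worker` are hypothetical helpers.

```python
from typing import Any, List, Mapping

# Counters the workplan says the response should include.
REQUIRED_COUNTERS = (
    ("definitions", "synced"),
    ("schedules", "upserted"),
    ("schedules", "paused"),
    ("schedules", "deleted_orphans"),
)


def check_admin_sync(resp: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the response looks good."""
    problems = []
    if resp.get("ok") is not True:
        problems.append("ok is not true")
    for section, counter in REQUIRED_COUNTERS:
        section_data = resp.get(section)
        value = section_data.get(counter) if isinstance(section_data, dict) else None
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value < 0:
            problems.append(f"{section}.{counter} missing or not a non-negative integer")
    errors = resp.get("errors")
    if errors is None:
        problems.append("errors[] missing")
    elif errors:
        problems.append(f"errors[] not empty: {errors!r}")
    return problems


def same_worker(before: Mapping[str, Any], after: Mapping[str, Any]) -> bool:
    """No-restart check: same pod UID and unchanged restart count."""
    return (before["uid"], before["restart_count"]) == (after["uid"], after["restart_count"])
```

Returning a list of problems rather than raising on the first one keeps the whole result available for the non-secret evidence record.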
From 98b6618dbc2a15605f00a7901bf3772e8a91ff51 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Thu, 2 Jul 2026 11:53:28 +0200
Subject: [PATCH 09/10] chore(consistency): sync task status from DB [auto]

Updated by fix-consistency on 2026-07-02:
- update .custodian-brief.md for railiance-cluster
---
 .custodian-brief.md | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/.custodian-brief.md b/.custodian-brief.md
index 6173dcf..eae8eec 100644
--- a/.custodian-brief.md
+++ b/.custodian-brief.md
@@ -2,7 +2,7 @@
 # Custodian Brief — railiance-cluster
 
 **Domain:** financials
-**Last synced:** 2026-07-02 08:48 UTC
+**Last synced:** 2026-07-02 09:53 UTC
 **State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
 
 ## Current Goal
@@ -11,21 +11,6 @@ Install k3s and Kubernetes Baseline on the HostEurope Server
 
 ## Active Workstreams
 
-### activity-core WP-0016 triage-output robustness deploy
-Progress: 0/4 done | workstream_id: `7cbbe0d6-fea9-41c6-840c-46d0d8e8edde`
-
-**Open tasks:**
-- ► Deploy activity-core with coupled schema and executor `079e39a9`
-- ► Update daily-statehub-wsjf-triage runtime-bundle Instruction `129fb472`
-- · Pull raw llm-connect response for the 2026-06-26 run `59559f1d`
-- · Acceptance smoke `8096621a`
-
-### activity-core no-restart admin-sync smoke (ACTIVITY-WP-0012-T05)
-Progress: 0/1 done | workstream_id: `2c9e8e96-ec6a-433c-9e6d-0efbcd18679e`
-
-**Open tasks:**
-- ! Run the no-restart admin-sync smoke `60f3387d`
-
 ### ThreePhoenix - HA Cluster Implementation
 Progress: 0/7 done | workstream_id: `9e208376-23f1-40c7-9813-fac1f7d6ad3b`
 
From c65e56acf105cc83c677e2f3e92347a6eb168a67 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Sat, 4 Jul 2026 12:50:02 +0200
Subject: [PATCH 10/10] Add Forgejo CI smoke workflow (enablement template)

---
 .forgejo/workflows/ci-smoke.yaml | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)
 create mode 100644 .forgejo/workflows/ci-smoke.yaml

diff --git a/.forgejo/workflows/ci-smoke.yaml b/.forgejo/workflows/ci-smoke.yaml
new file mode 100644
index 0000000..bd44c56
--- /dev/null
+++ b/.forgejo/workflows/ci-smoke.yaml
@@ -0,0 +1,29 @@
+# Canonical CI smoke template (tier 1 routing drill).
+# Copy to: .forgejo/workflows/ci-smoke.yaml in consumer repos.
+name: CI Smoke
+
+on:
+  push:
+    branches:
+      - main
+  workflow_dispatch:
+
+jobs:
+  host-smoke:
+    runs-on: self-hosted
+    steps:
+      - name: Routing probe (host runner)
+        run: |
+          set -eu
+          echo "repository=${GITHUB_REPOSITORY:-unknown}"
+          echo "sha=${GITHUB_SHA:-unknown}"
+          echo "runner=${RUNNER_NAME:-unknown}"
+          uname -a
+
+  container-smoke:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Routing probe (container label)
+        run: |
+          set -eu
+          echo "container-smoke ok for ${GITHUB_REPOSITORY:-unknown}"
\ No newline at end of file