Compare commits

10 commits

Author SHA1 Message Date
c65e56acf1 Add Forgejo CI smoke workflow (enablement template)
All checks were successful
CI Smoke / host-smoke (push) Successful in 0s
CI Smoke / container-smoke (push) Successful in 1s
2026-07-04 12:50:02 +02:00
98b6618dbc chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-07-02:
  - update .custodian-brief.md for railiance-cluster
2026-07-02 11:53:28 +02:00
c398bf5027 RAIL-BS-WP-0008/0009 finished: live deploy, top-7 proof, admin-sync smoke
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 11:53:11 +02:00
d10741fb0d chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-07-02:
  - update .custodian-brief.md for railiance-cluster
2026-07-02 10:48:15 +02:00
037a71f355 RAIL-BS-WP-0008: T01/T02 progress — image rebuilt, contract fixed, deploy operator-gated
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 10:47:40 +02:00
9c55dfb02a RAIL-BS-WP-0008/0009: operator deploy + admin-sync smoke tooling
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 10:44:06 +02:00
84c005254d Regenerate agent instructions: workstream -> workplan terminology
Registration guidance now prescribes file-first + fix-consistency (C-06)
instead of manual create_workplan/create_workstream calls; progress-event
examples use workplan_id; legacy field names annotated.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 01:47:45 +02:00
5ac713641d chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-07-02:
  - update .custodian-brief.md for railiance-cluster
2026-07-02 00:27:26 +02:00
adb758b6d6 chore(consistency): commit workplan task-id writeback
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 00:26:45 +02:00
c3a95e93b4 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-07-02:
  - update .custodian-brief.md for railiance-cluster
2026-07-02 00:09:48 +02:00
14 changed files with 622 additions and 81 deletions

@ -20,7 +20,7 @@ Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run wa
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=railiance-cluster` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workplans; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table

@ -1,6 +1,6 @@
## First Session Protocol
Triggered when `get_domain_summary("financials")` shows **no workstreams**.
Triggered when `get_domain_summary("financials")` shows **no workplans**.
The project is registered but work has not yet been structured.
**Step 1 — Read, don't write**
@ -11,27 +11,31 @@ The project is registered but work has not yet been structured.
**Step 2 — Survey in-progress work**
Look for TODOs, open branches, half-finished files. Note done vs. started but incomplete.
**Step 3 — Propose workstreams to Bernd**
Propose 1–3 workstreams — each a coherent strand, weeks to months, anchored to a
**Step 3 — Propose workplans to Bernd**
Propose 1–3 workplans — each a coherent strand, weeks to months, anchored to a
roadmap phase. **Wait for approval before creating.**
**Step 4 — Create workplan file first, then DB record (ADR-001)**
**Step 4 — Write the workplan file; fix-consistency registers it (ADR-001)**
```
workplans/RAIL-BS-WP-NNNN-<slug>.md ← write this first
workplans/RAIL-BS-WP-NNNN-<slug>.md ← write this, commit it
```
Then register in the hub:
```
create_workstream(topic_id="ca369340-a64e-442e-98f1-a4fa7dc74a38", title="...", owner="...", description="...")
create_task(workstream_id="<id>", title="...", priority="high|medium|low")
Then register by running the consistency check — do **not** call
`create_workplan`/`create_task` (or legacy `create_workstream`) yourself;
manual registration duplicates what C-06 creates from the file:
```bash
statehub fix-consistency --repo railiance-cluster
```
C-06 creates the hub workplan + tasks and writes `state_hub_workstream_id` /
`state_hub_task_id` back into the file (legacy field names, kept for
compatibility — they hold workplan/task IDs).
**Step 5 — Record the setup**
```
add_progress_event(
summary="First session: structured financials into N workstreams, M tasks",
summary="First session: structured financials into N workplans, M tasks",
event_type="milestone",
topic_id="ca369340-a64e-442e-98f1-a4fa7dc74a38",
detail={"workstreams": [...], "tasks_created": M}
detail={"workplans": [...], "tasks_created": M}
)
```

@ -44,7 +44,7 @@ For each file with `status: ready`, `active`, or `blocked`, note pending
**Step 4 — Present brief**
1. **Active workstreams** for `financials` — title, task counts, blocking decisions
1. **Active workplans** for `financials` — title, task counts, blocking decisions
2. **Pending tasks** from `workplans/` + any `[repo:railiance-cluster]` hub tasks
3. **Goal guidance** — if `goal_guidance` in summary:
- `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"*
@ -52,33 +52,42 @@ For each file with `status: ready`, `active`, or `blocked`, note pending
4. **Suggested next action** — highest-priority open item
5. **SBOM status** — flag if `last_sbom_at` is unset for this repo
If no workstreams: follow First Session Protocol (`first-session.md`).
If no workplans: follow First Session Protocol (`first-session.md`).
**During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()`
> State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`)
> are First Session Protocol only. Work structure belongs in repo files (ADR-001).
> State Hub is a *read model*. **Never register workplans or tasks by hand**
> (`create_workplan`, `create_task`, or the legacy `create_workstream`) — write
> the workplan file in `workplans/` and run `fix-consistency`; its C-06 check
> registers the workplan and its tasks in the hub and writes the IDs back into
> the file. Manual registration creates duplicates the moment fix-consistency
> runs. Work structure belongs in repo files (ADR-001).
>
> Terminology: "workstream" is the legacy name for workplan. Some API/frontmatter
> field names keep it for compatibility (`state_hub_workstream_id`,
> `workstream_id` params) — treat them as workplan IDs.
**Session close:**
With MCP tools:
```
add_progress_event(summary="...", topic_id="ca369340-a64e-442e-98f1-a4fa7dc74a38", workstream_id="<uuid>")
add_progress_event(summary="...", topic_id="ca369340-a64e-442e-98f1-a4fa7dc74a38", workplan_id="<uuid>")
```
Without MCP tools:
```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \
-H "Content-Type: application/json" \
-d '{"topic_id":"ca369340-a64e-442e-98f1-a4fa7dc74a38","workstream_id":"<uuid>","event_type":"note","summary":"what changed","author":"codex"}'
-d '{"topic_id":"ca369340-a64e-442e-98f1-a4fa7dc74a38","workplan_id":"<uuid>","event_type":"note","summary":"what changed","author":"codex"}'
```
If workplan files were modified, ensure the local copy is up to date first:
If workplan files were modified, ensure the local copy is up to date first,
then sync from the repo checkout:
```bash
git -C <repo_path> pull --ff-only
cd ~/state-hub && make fix-consistency REPO=railiance-cluster
git pull --ff-only
statehub fix-consistency
```
For repos where implementation runs on a remote machine (e.g. CoulombCore),
use the combined target which pulls before fixing:
use the pull-before-fix mode from any shell with the State Hub CLI:
```bash
cd ~/state-hub && make fix-consistency-remote REPO=railiance-cluster
statehub fix-consistency --repo railiance-cluster --remote
```
**C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback
will sync the file to match DB. **C-16** (repo behind remote) blocks all writes

@ -5,7 +5,7 @@ ID prefix: `RAIL-BS-WP-`
Work items originate as files in this repo **before** being registered in the hub.
Canonical workplan/workstream frontmatter statuses are:
Canonical workplan frontmatter statuses are:
`proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`.
Use `proposed` for a newly drafted plan, `ready` after review against current
repo state, and `finished` when implementation is complete. `stalled` and
@ -16,14 +16,15 @@ prefix: `YYMMDD-RAIL-BS-WP-NNNN-<slug>.md`. The frontmatter id remains
unchanged; the prefix is only for quick visual reference.
Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
`workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids
`workplans/ADHOC-YYYY-MM-DD.md`, workplan slug `adhoc-YYYY-MM-DD`, and task ids
`ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed
directly. Promote anything requiring analysis, design, approval, dependencies, or
multiple planned phases into a normal workplan.
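The Ad Hoc naming rule above is mechanical enough to sketch in a few lines of Python. This is only an illustration: `adhoc_names` is a hypothetical helper, not something the repo or State Hub provides; the path, slug, and task-id patterns are the ones stated in the paragraph.

```python
from datetime import date


def adhoc_names(day: date, task_count: int) -> dict:
    """Build the Ad Hoc file path, workplan slug, and task ids for one day.

    Hypothetical helper for illustration; only the naming patterns come
    from the documentation above.
    """
    stamp = day.isoformat()  # renders as YYYY-MM-DD
    return {
        "path": f"workplans/ADHOC-{stamp}.md",
        "slug": f"adhoc-{stamp}",
        # Task ids are numbered T01, T02, ... in creation order.
        "task_ids": [f"ADHOC-{stamp}-T{n:02d}" for n in range(1, task_count + 1)],
    }
```

Calling `adhoc_names(date(2026, 7, 2), 2)` would yield the file `workplans/ADHOC-2026-07-02.md` with two task ids ending in `T01` and `T02`.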
Ecosystem todos from other agents arrive as `[repo:railiance-cluster]` hub tasks —
visible at session start. Pick one up by creating the workplan file, then registering
the workstream.
visible at session start. Pick one up by creating the workplan file, committing,
and running `statehub fix-consistency` — C-06 registers the workplan in the hub.
Never register by hand with `create_workplan`/`create_workstream`.
Task blocks use this shape:
@ -37,4 +38,8 @@ state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
Status progression is `todo` → `progress` → `done`; use `wait` for waiting or
blocked work and `cancel` for stopped work.
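As a rough illustration of the task status vocabulary above, the following Python sketch checks statuses and transitions. The five status names come from the text; `is_valid_status` and `is_forward_step` are hypothetical helpers, and treating the main progression as forward-only is an assumption of this sketch, not a rule the source states.

```python
# Status names are taken from the text above.
PROGRESSION = ("todo", "progress", "done")
SIDE_STATES = {"wait", "cancel"}


def is_valid_status(status: str) -> bool:
    """True for any of the five task statuses named in the documentation."""
    return status in PROGRESSION or status in SIDE_STATES


def is_forward_step(old: str, new: str) -> bool:
    """True when both statuses sit on the main progression and new comes later.

    Assumption: moves involving wait/cancel are not judged here, so they
    return False rather than being classified.
    """
    if old not in PROGRESSION or new not in PROGRESSION:
        return False
    return PROGRESSION.index(new) > PROGRESSION.index(old)
```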
Workplan frontmatter carries `state_hub_workstream_id` — a legacy field name
kept for compatibility ("workstream" is the old term for workplan); it holds
the hub workplan id and is written by fix-consistency. Do not edit or rename it.
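Because the legacy name survives in frontmatter and in some API parameters, tooling that reads these records may want to accept either spelling. The sketch below is illustrative only: `workplan_id_from` is a hypothetical helper, not part of State Hub or fix-consistency; the key names are the ones mentioned in this change set, and the lookup order is an assumption.

```python
from typing import Mapping, Optional

# Keys that may carry a workplan id. The names appear in the docs above;
# checking the newest name first is an assumption of this sketch.
WORKPLAN_ID_KEYS = ("workplan_id", "workstream_id", "state_hub_workstream_id")


def workplan_id_from(record: Mapping[str, object]) -> Optional[str]:
    """Return the workplan id from a record, whichever spelling it uses."""
    for key in WORKPLAN_ID_KEYS:
        value = record.get(key)
        if value:  # skips missing keys, None, and empty strings
            return str(value)
    return None
```

The helper only reads; it never renames or rewrites the field, which matches the instruction to leave `state_hub_workstream_id` untouched.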
<!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->

@ -2,7 +2,7 @@
# Custodian Brief — railiance-cluster
**Domain:** financials
**Last synced:** 2026-07-01 22:04 UTC
**Last synced:** 2026-07-02 09:53 UTC
**State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
## Current Goal
@ -11,39 +11,6 @@ Install k3s and Kubernetes Baseline on the HostEurope Server
## Active Workstreams
### activity-core no-restart admin-sync smoke (ACTIVITY-WP-0012-T05)
Progress: 0/1 done | workstream_id: `2c9e8e96-ec6a-433c-9e6d-0efbcd18679e`
**Open tasks:**
- ! Run the no-restart admin-sync smoke `60f3387d`
### activity-core WP-0016 triage-output robustness deploy
Progress: 0/4 done | workstream_id: `7cbbe0d6-fea9-41c6-840c-46d0d8e8edde`
**Open tasks:**
- · Deploy activity-core with coupled schema and executor `079e39a9`
- · Update daily-statehub-wsjf-triage runtime-bundle Instruction `129fb472`
- · Pull raw llm-connect response for the 2026-06-26 run `59559f1d`
- · Acceptance smoke `8096621a`
### activity-core WP-0016 triage-output robustness deploy
Progress: no tasks done | workstream_id: `5032c55c-2ee2-4b7e-b1eb-157f0f8ac647`
### activity-core WP-0016 triage-output robustness deploy
Progress: 0/4 done | workstream_id: `f2ca1a5d-4dd6-42ea-8003-969c7265f891`
**Open tasks:**
- · Update daily-statehub-wsjf-triage runtime-bundle Instruction (RAIL-BS-WP-0008-T02) `2338d061`
- · Deploy activity-core with coupled schema and executor (RAIL-BS-WP-0008-T01) `1ea0945a`
- · Pull raw llm-connect response for 2026-06-26 run (RAIL-BS-WP-0008-T03) `b799917b`
- · Acceptance smoke: daily-triage clean or graceful degrade (RAIL-BS-WP-0008-T04) `e267a366`
### activity-core no-restart admin-sync smoke (ACTIVITY-WP-0012-T05)
Progress: 0/1 done | workstream_id: `366eec46-3139-4810-ace6-ea75750fe821`
**Open tasks:**
- · Run no-restart admin-sync smoke with Temporal schedule verification (RAIL-BS-WP-0009-T01) `ffe665ce`
### ThreePhoenix - HA Cluster Implementation
Progress: 0/7 done | workstream_id: `9e208376-23f1-40c7-9813-fac1f7d6ad3b`

@ -0,0 +1,29 @@
# Canonical CI smoke template (tier 1 routing drill).
# Copy to: .forgejo/workflows/ci-smoke.yaml in consumer repos.
name: CI Smoke
on:
  push:
    branches:
      - main
  workflow_dispatch:
jobs:
  host-smoke:
    runs-on: self-hosted
    steps:
      - name: Routing probe (host runner)
        run: |
          set -eu
          echo "repository=${GITHUB_REPOSITORY:-unknown}"
          echo "sha=${GITHUB_SHA:-unknown}"
          echo "runner=${RUNNER_NAME:-unknown}"
          uname -a
  container-smoke:
    runs-on: ubuntu-latest
    steps:
      - name: Routing probe (container label)
        run: |
          set -eu
          echo "container-smoke ok for ${GITHUB_REPOSITORY:-unknown}"

@ -20,6 +20,12 @@ there is no MCP server for Codex agents.
|---------|-----|
| Local workstation | `http://127.0.0.1:8000` |
| Remote via tunnel | `http://127.0.0.1:18000` |
| Optional local edge relay | `http://127.0.0.1:18080` |
When an operator has enabled the edge relay, set API_BASE to the relay URL.
Queueable writes return an explicit queued receipt if the central hub is
unreachable. Treat that as pending local evidence, then ask the operator to run
statehub outbox status/replay after connectivity returns.
### Orient at session start
@ -27,8 +33,8 @@ there is no MCP server for Codex agents.
# Offline brief — works without hub connection
cat .custodian-brief.md
# Active workstreams for this domain
curl -s "http://127.0.0.1:8000/workstreams/?topic_id=ca369340-a64e-442e-98f1-a4fa7dc74a38&status=active" \
# Active workplans for this domain
curl -s "http://127.0.0.1:8000/workplans/?topic_id=ca369340-a64e-442e-98f1-a4fa7dc74a38&status=active" \
| python3 -m json.tool
# Check inbox
@ -51,12 +57,12 @@ curl -s -X POST http://127.0.0.1:8000/progress/ \
"summary": "what was done",
"event_type": "note",
"author": "codex",
"workstream_id": "<uuid>",
"workplan_id": "<uuid>",
"task_id": "<uuid>"
}'
```
Omit `workstream_id` / `task_id` when not applicable.
Omit `workplan_id` / `task_id` when not applicable.
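The omission rule can be illustrated with a small Python sketch that builds the `POST /progress/` request and leaves out unset ids. The endpoint, field names, and base URL mirror the curl example; `build_progress_request` is a hypothetical helper, and passing `topic_id` as an optional argument is an assumption of this sketch.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "http://127.0.0.1:8000"  # local workstation URL from the table above


def build_progress_request(
    summary: str,
    author: str = "codex",
    event_type: str = "note",
    topic_id: Optional[str] = None,
    workplan_id: Optional[str] = None,
    task_id: Optional[str] = None,
    api_base: str = API_BASE,
) -> urllib.request.Request:
    """Build (but do not send) a progress-event request."""
    payload = {"summary": summary, "event_type": event_type, "author": author}
    # Optional ids are added only when set, so unset ones are omitted entirely.
    for key, value in (("topic_id", topic_id), ("workplan_id", workplan_id), ("task_id", task_id)):
        if value:
            payload[key] = value
    return urllib.request.Request(
        api_base.rstrip("/") + "/progress/",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(req, timeout=20)`; the sketch stops short of that so it can be read and exercised without a running hub.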
### Update task status
@ -80,7 +86,7 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
## Session Protocol
**Start:**
1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe)
1. `cat .custodian-brief.md` — domain goal and open workplans (offline-safe)
2. Check inbox: `GET /messages/?to_agent=railiance-cluster&unread_only=true`; mark read
3. Scan workplans: `ls workplans/` — note `status: ready`, `active`, or `blocked` files and open tasks
4. Check human-needed tasks: `GET /tasks/?needs_human=true`
@ -92,12 +98,12 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
**Close:**
1. Update workplan file task statuses to reflect progress
2. Log: `POST /progress/` with a summary of what changed
3. Note for the custodian operator: after workplan file changes, run from
`~/state-hub`:
3. After workplan file changes, run:
```bash
make fix-consistency REPO=railiance-cluster
statehub fix-consistency
```
This syncs task status from files into the hub DB.
Coding agents should run this directly; ask the operator only if the CLI or
State Hub API is unavailable. This syncs task status from files into the hub DB.
---
@ -123,7 +129,7 @@ Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run wa
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=railiance-cluster` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workplans; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table

@ -30,6 +30,12 @@ verify-activity-core: ## Reconcile activity-core runtime and verify disabled ops
reconcile-activity-core-llm-connect: ## Reconcile activity-core llm-connect URL and run non-secret gate checks
	tools/cmd/railiance-reconcile-activity-core-llm-connect
deploy-activity-core-triage-robustness: ## Deploy ACTIVITY-WP-0016 bundle and prove daily-triage output validation
	tools/cmd/railiance-deploy-activity-core-triage-robustness
admin-sync-smoke: ## Run activity-core no-restart POST /admin/sync smoke
	tools/cmd/railiance-admin-sync-smoke
##@ Help
help: ## Show this help
@ -37,4 +43,4 @@ help: ## Show this help
/^[a-zA-Z_-]+:.*?##/ { printf " \033[36m%-20s\033[0m %s\n", $$1, $$2 } \
/^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) }' $(MAKEFILE_LIST)
.PHONY: backup restore preflight k3s-install smoke test-ha-failover verify-activity-core reconcile-activity-core-llm-connect help
.PHONY: backup restore preflight k3s-install smoke test-ha-failover verify-activity-core reconcile-activity-core-llm-connect deploy-activity-core-triage-robustness admin-sync-smoke help

@ -22,6 +22,10 @@ Commands:
observe Plan/run Stage 2 observation checks
promote Plan/apply Stage 3 stable promotion
rollback Plan/apply rollback to previous stable
deploy-triage-robustness
Deploy ACTIVITY-WP-0016 and prove daily-triage validation
admin-sync-smoke
Run activity-core no-restart POST /admin/sync smoke
build-spore Build a distributable "Spore" bundle
seed-local Run the seed script on this machine
checklist Pre-VM checklist
@ -51,6 +55,8 @@ case "$cmd" in
observe) exec railiance-stage2 observe "$@" ;;
promote) exec railiance-stage3 promote "$@" ;;
rollback) exec railiance-stage3 rollback "$@" ;;
deploy-triage-robustness) exec railiance-deploy-activity-core-triage-robustness "$@" ;;
admin-sync-smoke) exec railiance-admin-sync-smoke "$@" ;;
build-spore) bash "$ROOT/tools/build_spore.sh" ;;
seed-local) bash "$ROOT/tools/seed_node.sh" ;;
checklist)

@ -21,6 +21,8 @@ mode are denied these by the permission classifier — that is intentional.
| `make test-ha-failover` | kills the primary PG pod to assert recovery |
| `make verify-activity-core` | reconciles activity-core runtime on railiance01 |
| `make reconcile-activity-core-llm-connect` | patches ConfigMap, applies llm-connect overlay, runs smoke pod |
| `make deploy-activity-core-triage-robustness` | deploys ACTIVITY-WP-0016 code/schema/runtime as a coupled bundle and triggers daily triage |
| `make admin-sync-smoke` | calls activity-core `POST /admin/sync` and proves worker pod identity/restart count did not change |
## Read-only / safe targets
@ -33,3 +35,8 @@ Reconcile/verify targets post non-secret evidence notes to the State Hub
(`STATE_HUB_EVIDENCE_WORKSTREAM_ID` / `STATE_HUB_EVIDENCE_TASK_ID` env vars
attach them to a workstream/task). Never record Secret values — key counts
and readiness states only.
For `make admin-sync-smoke`, set `ACTIVITY_CORE_ADMIN_SYNC_FIXTURE_COMMAND`
when you need a specific enabled-flip/rename fixture before the sync call. The
command records whether a fixture ran; leaving it unset proves endpoint and
no-restart behavior only.
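The fixture behavior described above can be summarized in a short Python sketch. It mirrors the branch the smoke script in this change set takes on `ACTIVITY_CORE_ADMIN_SYNC_FIXTURE_COMMAND` and `ACTIVITY_CORE_ADMIN_SYNC_REQUIRE_FIXTURE`; `expected_fixture_status` is a hypothetical helper that only predicts the recorded status and runs nothing, and raising `ValueError` stands in for the script's exit code 2.

```python
def expected_fixture_status(fixture_command: str, require_fixture: str = "0") -> str:
    """Predict the fixture status the smoke run would record.

    Returns "ran" when a fixture command is supplied and "skipped" otherwise.
    Raises ValueError when a fixture is required but none was supplied
    (the real script exits with status 2 in that case).
    """
    if fixture_command:
        return "ran"
    if require_fixture == "1":
        raise ValueError("fixture required but no fixture command was supplied")
    return "skipped"
```

A status of `skipped` means the run proves endpoint and no-restart behavior only, as noted above.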

@ -0,0 +1,155 @@
#!/usr/bin/env bash
# Prove POST /admin/sync works without restarting the activity-core worker.
set -euo pipefail
NAMESPACE="${ACTIVITY_CORE_NAMESPACE:-activity-core}"
CLUSTER_HOST="${ACTIVITY_CORE_CLUSTER_HOST:-railiance01}"
STATE_HUB_URL="${STATE_HUB_URL:-http://127.0.0.1:8000}"
ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL="${ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL:-0}"
ACTIVITY_CORE_ADMIN_SYNC_FIXTURE_COMMAND="${ACTIVITY_CORE_ADMIN_SYNC_FIXTURE_COMMAND:-}"
ACTIVITY_CORE_ADMIN_SYNC_REQUIRE_FIXTURE="${ACTIVITY_CORE_ADMIN_SYNC_REQUIRE_FIXTURE:-0}"
EVIDENCE_WORKSTREAM_ID="${STATE_HUB_EVIDENCE_WORKSTREAM_ID:-2c9e8e96-ec6a-433c-9e6d-0efbcd18679e}"
EVIDENCE_TASK_ID="${STATE_HUB_EVIDENCE_TASK_ID:-60f3387d-3d14-42a9-b8a3-725a86468510}"
STARTED_AT="$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
CURRENT_GATE=startup
BEFORE_JSON=""
AFTER_JSON=""
FIXTURE_STATUS=skipped
SYNC_RESPONSE_JSON=""
EVIDENCE_NOTE_JSON=""
export NAMESPACE CLUSTER_HOST STATE_HUB_URL EVIDENCE_WORKSTREAM_ID EVIDENCE_TASK_ID
export STARTED_AT BEFORE_JSON AFTER_JSON FIXTURE_STATUS SYNC_RESPONSE_JSON
log() { printf '[activity-core-admin-sync-smoke] %s\n' "$*"; }
quote() { printf '%q' "$1"; }
cluster_bash() { if [[ -n "$CLUSTER_HOST" ]]; then ssh "$CLUSTER_HOST" "bash -s" <<<"$1"; else bash -s <<<"$1"; fi; }
post_evidence() {
local status="$1" failing_gate="${2:-}"
export EVIDENCE_STATUS="$status" FAILING_GATE="$failing_gate"
python3 - <<'PY'
import json, os, sys, urllib.request
def env_json(name):
    raw = os.environ.get(name, "")
    if not raw:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"raw": raw}
status = os.environ["EVIDENCE_STATUS"]
failing_gate = os.environ.get("FAILING_GATE") or None
detail = {
    "producer": "railiance-cluster",
    "verification": "activity-core no-restart admin sync smoke",
    "status": status,
    "failing_gate": failing_gate,
    "cluster_host": os.environ.get("CLUSTER_HOST") or "local-kubectl",
    "namespace": os.environ.get("NAMESPACE"),
    "worker_before": env_json("BEFORE_JSON"),
    "worker_after": env_json("AFTER_JSON"),
    "fixture_status": os.environ.get("FIXTURE_STATUS"),
    "sync_response": env_json("SYNC_RESPONSE_JSON"),
    "started_at": os.environ.get("STARTED_AT"),
}
summary = (
    "Railiance activity-core no-restart admin-sync smoke passed: POST /admin/sync returned expected counters and worker pod identity/restart count stayed stable."
    if status == "passed"
    else "Railiance activity-core no-restart admin-sync smoke failed" + (f" at {failing_gate}" if failing_gate else "") + "; see non-secret evidence detail."
)
payload = {"summary": summary, "event_type": "note", "author": "railiance-cluster", "detail": detail}
if os.environ.get("EVIDENCE_WORKSTREAM_ID"):
    payload["workstream_id"] = os.environ["EVIDENCE_WORKSTREAM_ID"]
if os.environ.get("EVIDENCE_TASK_ID"):
    payload["task_id"] = os.environ["EVIDENCE_TASK_ID"]
req = urllib.request.Request(os.environ["STATE_HUB_URL"].rstrip("/") + "/progress/", data=json.dumps(payload).encode(), headers={"Content-Type": "application/json"}, method="POST")
with urllib.request.urlopen(req, timeout=20) as resp:
    sys.stdout.write(resp.read().decode())
PY
}
on_error() { local code=$?; trap - ERR; post_evidence failed "$CURRENT_GATE" >/dev/null || true; exit "$code"; }
trap on_error ERR
if [[ "$CLUSTER_HOST" == local ]]; then
[[ "$ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL" == 1 ]] || { echo 'ACTIVITY_CORE_CLUSTER_HOST=local requires ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL=1' >&2; exit 2; }
CLUSTER_HOST=""
fi
export CLUSTER_HOST
CURRENT_GATE='cluster executor preflight'
log "using cluster executor: ${CLUSTER_HOST:-local kubectl}"
cluster_bash 'set -euo pipefail; command -v kubectl >/dev/null; command -v python3 >/dev/null'
worker_snapshot_script='import json,sys
items=json.load(sys.stdin).get("items",[])
if not items: raise SystemExit("no actcore-worker pods found")
pod=sorted(items,key=lambda item:item["metadata"]["name"])[0]
container=pod["status"]["containerStatuses"][0]
print(json.dumps({"name":pod["metadata"]["name"],"uid":pod["metadata"]["uid"],"phase":pod["status"].get("phase"),"restart_count":container.get("restartCount",0),"image":container.get("image"),"image_id":container.get("imageID")}, sort_keys=True))'
CURRENT_GATE='worker baseline capture'
BEFORE_JSON="$(cluster_bash "kubectl -n $(quote "$NAMESPACE") get pod -l app.kubernetes.io/name=actcore-worker -o json | python3 -c $(quote "$worker_snapshot_script")")"
export BEFORE_JSON
CURRENT_GATE='admin sync fixture'
if [[ -n "$ACTIVITY_CORE_ADMIN_SYNC_FIXTURE_COMMAND" ]]; then
log 'running operator-supplied fixture command'
cluster_bash "$ACTIVITY_CORE_ADMIN_SYNC_FIXTURE_COMMAND"
FIXTURE_STATUS=ran
elif [[ "$ACTIVITY_CORE_ADMIN_SYNC_REQUIRE_FIXTURE" == 1 ]]; then
echo 'ACTIVITY_CORE_ADMIN_SYNC_REQUIRE_FIXTURE=1 but no fixture command was supplied' >&2
exit 2
else
FIXTURE_STATUS=skipped
fi
export FIXTURE_STATUS
CURRENT_GATE='POST /admin/sync'
log 'calling POST /admin/sync?definitions=true&schedules=true'
SYNC_RESPONSE_JSON="$(
cluster_bash "$(cat <<EOF
set -euo pipefail
kubectl -n $(quote "$NAMESPACE") exec -i deploy/actcore-api -- python - <<'PY'
import json, urllib.request
req = urllib.request.Request('http://localhost:8010/admin/sync?definitions=true&schedules=true', method='POST')
with urllib.request.urlopen(req, timeout=60) as resp:
    payload = json.loads(resp.read().decode())
required = [('definitions','synced'),('schedules','upserted'),('schedules','paused'),('schedules','deleted_orphans'),('errors',None)]
for section, key in required:
    if section not in payload:
        raise SystemExit(f'missing sync response section {section!r}')
    if key is not None and key not in payload[section]:
        raise SystemExit(f'missing sync response key {section}.{key}')
if payload.get('errors'):
    raise SystemExit('admin sync returned errors: ' + json.dumps(payload['errors']))
print(json.dumps(payload, sort_keys=True))
PY
EOF
)"
)"
export SYNC_RESPONSE_JSON
CURRENT_GATE='worker no-restart verification'
AFTER_JSON="$(cluster_bash "kubectl -n $(quote "$NAMESPACE") get pod -l app.kubernetes.io/name=actcore-worker -o json | python3 -c $(quote "$worker_snapshot_script")")"
python3 - <<'PY'
import json, os
before = json.loads(os.environ['BEFORE_JSON'])
after = json.loads(os.environ['AFTER_JSON'])
if before['uid'] != after['uid']:
    raise SystemExit(f"worker pod changed uid: {before['uid']} -> {after['uid']}")
if before['restart_count'] != after['restart_count']:
    raise SystemExit(f"worker restart count changed: {before['restart_count']} -> {after['restart_count']}")
PY
export AFTER_JSON
CURRENT_GATE='State Hub evidence note'
log 'posting non-secret evidence note to State Hub'
EVIDENCE_NOTE_JSON="$(post_evidence passed '')"
trap - ERR
log 'verification passed'
printf '%s\n' "$EVIDENCE_NOTE_JSON"
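The two gates at the heart of this script, the sync-response shape check and the no-restart comparison, can be read in isolation as plain functions. The sketch below restates them for offline reasoning; `sync_response_problems` and `worker_restarted` are hypothetical names, and collecting problems into a list instead of exiting on the first one is a choice of this sketch, not the script's behavior. The required keys are the ones the script checks.

```python
# Section/key pairs the script requires in the /admin/sync response.
REQUIRED_KEYS = [
    ("definitions", "synced"),
    ("schedules", "upserted"),
    ("schedules", "paused"),
    ("schedules", "deleted_orphans"),
    ("errors", None),  # section must exist; no specific key required
]


def sync_response_problems(payload: dict) -> list:
    """List every shape problem in a sync response (empty list means OK)."""
    problems = []
    for section, key in REQUIRED_KEYS:
        if section not in payload:
            problems.append(f"missing section {section}")
        elif key is not None and key not in payload[section]:
            problems.append(f"missing key {section}.{key}")
    if payload.get("errors"):
        problems.append("sync returned errors")
    return problems


def worker_restarted(before: dict, after: dict) -> bool:
    """True when the worker pod uid or restart count changed between snapshots."""
    return before["uid"] != after["uid"] or before["restart_count"] != after["restart_count"]
```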

@ -0,0 +1,263 @@
#!/usr/bin/env bash
# Deploy ACTIVITY-WP-0016 code/schema/runtime together and prove daily-triage output.
set -euo pipefail
NAMESPACE="${ACTIVITY_CORE_NAMESPACE:-activity-core}"
CLUSTER_HOST="${ACTIVITY_CORE_CLUSTER_HOST:-railiance01}"
STATE_HUB_URL="${STATE_HUB_URL:-http://127.0.0.1:8000}"
ACTIVITY_CORE_REPO="${ACTIVITY_CORE_REPO:-/home/worsch/activity-core}"
ACTIVITY_CORE_REMOTE_REPO="${ACTIVITY_CORE_REMOTE_REPO:-}"
ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL="${ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL:-0}"
ACTIVITY_CORE_SYNC_RUNTIME_BUNDLE="${ACTIVITY_CORE_SYNC_RUNTIME_BUNDLE:-auto}"
ACTIVITY_CORE_RESTART_DEPLOYMENTS="${ACTIVITY_CORE_RESTART_DEPLOYMENTS:-1}"
REQUIRED_ACTIVITY_CORE_REV="${REQUIRED_ACTIVITY_CORE_REV:-bf877b7}"
DAILY_TRIAGE_DEFINITION_SLUG="${DAILY_TRIAGE_DEFINITION_SLUG:-daily-statehub-wsjf-triage}"
STATE_HUB_PROGRESS_TIMEOUT_SECONDS="${STATE_HUB_PROGRESS_TIMEOUT_SECONDS:-240}"
STATE_HUB_PROGRESS_POLL_SECONDS="${STATE_HUB_PROGRESS_POLL_SECONDS:-5}"
EVIDENCE_WORKSTREAM_ID="${STATE_HUB_EVIDENCE_WORKSTREAM_ID:-7cbbe0d6-fea9-41c6-840c-46d0d8e8edde}"
EVIDENCE_TASK_ID="${STATE_HUB_EVIDENCE_TASK_ID:-8096621a-54ee-4be5-943e-5dc2da19ed28}"
STARTED_AT="$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
CURRENT_GATE=startup
REMOTE_REVISION=""
CONTRACT_JSON=""
API_IMAGE=""
API_IMAGE_ID=""
WORKER_IMAGE=""
WORKER_IMAGE_ID=""
SYNC_STATUS_JSON=""
TRIGGER_JSON=""
DEFINITION_ID=""
TRIGGER_KEY=""
EXPECTED_RUN_ID=""
PROGRESS_JSON=""
export NAMESPACE CLUSTER_HOST STATE_HUB_URL ACTIVITY_CORE_REMOTE_REPO REQUIRED_ACTIVITY_CORE_REV
export DAILY_TRIAGE_DEFINITION_SLUG STARTED_AT EVIDENCE_WORKSTREAM_ID EVIDENCE_TASK_ID
export STATE_HUB_PROGRESS_TIMEOUT_SECONDS STATE_HUB_PROGRESS_POLL_SECONDS
export REMOTE_REVISION CONTRACT_JSON API_IMAGE API_IMAGE_ID WORKER_IMAGE WORKER_IMAGE_ID
export SYNC_STATUS_JSON TRIGGER_JSON DEFINITION_ID TRIGGER_KEY EXPECTED_RUN_ID PROGRESS_JSON
log() { printf '[activity-core-triage-robustness] %s\n' "$*"; }
quote() { printf '%q' "$1"; }
cluster_bash() { if [[ -n "$CLUSTER_HOST" ]]; then ssh "$CLUSTER_HOST" "bash -s" <<<"$1"; else bash -s <<<"$1"; fi; }
should_sync_runtime_bundle() {
case "$ACTIVITY_CORE_SYNC_RUNTIME_BUNDLE" in
1|true|yes) return 0 ;;
0|false|no) return 1 ;;
auto) [[ -n "$CLUSTER_HOST" && -d "$ACTIVITY_CORE_REPO/k8s/railiance" ]]; return ;;
*) printf 'invalid ACTIVITY_CORE_SYNC_RUNTIME_BUNDLE=%s\n' "$ACTIVITY_CORE_SYNC_RUNTIME_BUNDLE" >&2; exit 2 ;;
esac
}
post_evidence() {
local status="$1" failing_gate="${2:-}"
export EVIDENCE_STATUS="$status" FAILING_GATE="$failing_gate"
python3 - <<'PY'
import json, os, sys, urllib.request
def env_json(name):
    raw = os.environ.get(name, "")
    if not raw:
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"raw": raw}
status = os.environ["EVIDENCE_STATUS"]
failing_gate = os.environ.get("FAILING_GATE") or None
detail = {
    "producer": "railiance-cluster",
    "verification": "activity-core WP-0016 coupled deploy and daily-triage smoke",
    "status": status,
    "failing_gate": failing_gate,
    "cluster_host": os.environ.get("CLUSTER_HOST") or "local-kubectl",
    "namespace": os.environ.get("NAMESPACE"),
    "activity_core_repo": os.environ.get("ACTIVITY_CORE_REMOTE_REPO"),
    "required_activity_core_revision": os.environ.get("REQUIRED_ACTIVITY_CORE_REV"),
    "activity_core_revision": os.environ.get("REMOTE_REVISION") or None,
    "runtime_bundle": "k8s/railiance/20-runtime.yaml",
    "runtime_contract": env_json("CONTRACT_JSON"),
    "sync_job": env_json("SYNC_STATUS_JSON"),
    "api_image": os.environ.get("API_IMAGE") or None,
    "api_image_id": os.environ.get("API_IMAGE_ID") or None,
    "worker_image": os.environ.get("WORKER_IMAGE") or None,
    "worker_image_id": os.environ.get("WORKER_IMAGE_ID") or None,
    "definition_slug": os.environ.get("DAILY_TRIAGE_DEFINITION_SLUG"),
    "definition_id": os.environ.get("DEFINITION_ID") or None,
    "manual_trigger": env_json("TRIGGER_JSON"),
    "expected_activity_core_run_id": os.environ.get("EXPECTED_RUN_ID") or None,
    "state_hub_progress": env_json("PROGRESS_JSON"),
    "started_at": os.environ.get("STARTED_AT"),
}
summary = (
    "Railiance activity-core WP-0016 deploy/smoke passed: code/schema and bounded runtime contract were reconciled together, daily triage was triggered, and State Hub recorded schema-valid output."
    if status == "passed"
    else "Railiance activity-core WP-0016 deploy/smoke failed" + (f" at {failing_gate}" if failing_gate else "") + "; see non-secret evidence detail."
)
payload = {"summary": summary, "event_type": "note", "author": "railiance-cluster", "detail": detail}
if os.environ.get("EVIDENCE_WORKSTREAM_ID"):
    payload["workstream_id"] = os.environ["EVIDENCE_WORKSTREAM_ID"]
if os.environ.get("EVIDENCE_TASK_ID"):
    payload["task_id"] = os.environ["EVIDENCE_TASK_ID"]
req = urllib.request.Request(os.environ["STATE_HUB_URL"].rstrip("/") + "/progress/", data=json.dumps(payload).encode(), headers={"Content-Type": "application/json"}, method="POST")
with urllib.request.urlopen(req, timeout=20) as resp:
    sys.stdout.write(resp.read().decode())
PY
}
on_error() { local code=$?; trap - ERR; post_evidence failed "$CURRENT_GATE" >/dev/null || true; exit "$code"; }
trap on_error ERR
if [[ "$CLUSTER_HOST" == local ]]; then
[[ "$ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL" == 1 ]] || { echo 'ACTIVITY_CORE_CLUSTER_HOST=local requires ACTIVITY_CORE_ALLOW_LOCAL_KUBECTL=1' >&2; exit 2; }
CLUSTER_HOST=""
fi
if [[ -z "$ACTIVITY_CORE_REMOTE_REPO" ]]; then
if [[ -n "$CLUSTER_HOST" ]]; then ACTIVITY_CORE_REMOTE_REPO="$(ssh "$CLUSTER_HOST" pwd)/activity-core"; else ACTIVITY_CORE_REMOTE_REPO="$ACTIVITY_CORE_REPO"; fi
fi
export CLUSTER_HOST ACTIVITY_CORE_REMOTE_REPO
CURRENT_GATE='cluster executor preflight'
log "using cluster executor: ${CLUSTER_HOST:-local kubectl}"
cluster_bash 'set -euo pipefail; command -v kubectl >/dev/null; command -v python3 >/dev/null'
CURRENT_GATE='runtime bundle sync'
if should_sync_runtime_bundle; then
log "syncing runtime bundle to ${CLUSTER_HOST}:${ACTIVITY_CORE_REMOTE_REPO}/k8s/railiance"
ssh "$CLUSTER_HOST" "mkdir -p $(quote "$ACTIVITY_CORE_REMOTE_REPO")/k8s/railiance"
rsync -a --delete "$ACTIVITY_CORE_REPO/k8s/railiance/" "${CLUSTER_HOST}:${ACTIVITY_CORE_REMOTE_REPO}/k8s/railiance/"
fi
CURRENT_GATE='activity-core revision gate'
REMOTE_REVISION="$(cluster_bash "set -euo pipefail; git -C $(quote "$ACTIVITY_CORE_REMOTE_REPO") rev-parse --short HEAD; git -C $(quote "$ACTIVITY_CORE_REMOTE_REPO") merge-base --is-ancestor $(quote "$REQUIRED_ACTIVITY_CORE_REV") HEAD")"
export REMOTE_REVISION
CURRENT_GATE='runtime contract gate'
CONTRACT_JSON="$(
cluster_bash "$(cat <<EOF
set -euo pipefail
python3 - $(quote "$ACTIVITY_CORE_REMOTE_REPO")/k8s/railiance/20-runtime.yaml <<'PY'
import json, re, sys
# Read the runtime bundle and check the prompt contract before anything is applied.
text = open(sys.argv[1], encoding='utf-8').read()
lower = text.lower()
max_tokens = [int(v) for v in re.findall(r"max_tokens\s*[:=]\s*['\"]?(\d+)", text)]
checks = {
    'mentions_daily_instruction': 'daily-statehub-wsjf-triage' in lower,
    'bounded_top_7': bool(re.search(r'(top[- ]?7|<=\s*7|≤\s*7|at most\s+7|no more than\s+7)', lower)),
    'fewer_well_formed': 'fewer well-formed' in lower,
    'ndjson_or_line_framing': 'ndjson' in lower or 'one recommendation json object per line' in lower,
    'max_tokens_headroom': bool(max_tokens and max(max_tokens) >= 1800),
}
missing = [name for name, ok in checks.items() if not ok]
print(json.dumps({'path': sys.argv[1], 'max_tokens': max_tokens, 'checks': checks, 'missing': missing}, sort_keys=True))
if missing:
    raise SystemExit('runtime bundle contract checks failed: ' + ', '.join(missing))
PY
EOF
)"
)"
export CONTRACT_JSON
CURRENT_GATE='runtime bundle reconcile'
log 'applying runtime bundle and restarting activity-core deployments'
cluster_bash "set -euo pipefail
kubectl apply -f $(quote "$ACTIVITY_CORE_REMOTE_REPO")/k8s/railiance/00-namespace.yaml
kubectl -n $(quote "$NAMESPACE") delete job actcore-migrate actcore-sync --ignore-not-found
kubectl apply -f $(quote "$ACTIVITY_CORE_REMOTE_REPO")/k8s/railiance/20-runtime.yaml
if [[ $(quote "$ACTIVITY_CORE_RESTART_DEPLOYMENTS") == 1 ]]; then kubectl -n $(quote "$NAMESPACE") rollout restart deploy/actcore-api deploy/actcore-worker deploy/actcore-event-router; fi
kubectl -n $(quote "$NAMESPACE") wait --for=condition=complete job/actcore-migrate --timeout=180s
kubectl -n $(quote "$NAMESPACE") rollout status deploy/actcore-api --timeout=180s
kubectl -n $(quote "$NAMESPACE") rollout status deploy/actcore-worker --timeout=180s
kubectl -n $(quote "$NAMESPACE") rollout status deploy/actcore-event-router --timeout=180s
kubectl -n $(quote "$NAMESPACE") wait --for=condition=complete job/actcore-sync --timeout=180s"
CURRENT_GATE='runtime status capture'
API_IMAGE="$(cluster_bash "kubectl -n $(quote "$NAMESPACE") get deploy actcore-api -o jsonpath='{.spec.template.spec.containers[0].image}'")"
API_IMAGE_ID="$(cluster_bash "kubectl -n $(quote "$NAMESPACE") get pod -l app.kubernetes.io/name=actcore-api -o jsonpath='{.items[0].status.containerStatuses[0].imageID}'")"
WORKER_IMAGE="$(cluster_bash "kubectl -n $(quote "$NAMESPACE") get deploy actcore-worker -o jsonpath='{.spec.template.spec.containers[0].image}'")"
WORKER_IMAGE_ID="$(cluster_bash "kubectl -n $(quote "$NAMESPACE") get pod -l app.kubernetes.io/name=actcore-worker -o jsonpath='{.items[0].status.containerStatuses[0].imageID}'")"
SYNC_STATUS_JSON="$(cluster_bash "kubectl -n $(quote "$NAMESPACE") get job actcore-sync -o json" | python3 -c 'import json,sys; j=json.load(sys.stdin); s=j.get("status",{}); print(json.dumps({"name":j["metadata"]["name"],"succeeded":s.get("succeeded",0),"failed":s.get("failed",0),"completion_time":s.get("completionTime")}))')"
export API_IMAGE API_IMAGE_ID WORKER_IMAGE WORKER_IMAGE_ID SYNC_STATUS_JSON
CURRENT_GATE='daily-triage manual trigger'
log "triggering ${DAILY_TRIAGE_DEFINITION_SLUG}"
TRIGGER_JSON="$(
cluster_bash "$(cat <<EOF
set -euo pipefail
kubectl -n $(quote "$NAMESPACE") exec -i deploy/actcore-api -- python - $(quote "$DAILY_TRIAGE_DEFINITION_SLUG") <<'PY'
import json, sys, urllib.request
slug = sys.argv[1]
with urllib.request.urlopen('http://localhost:8010/activity-definitions/', timeout=30) as resp:
    definitions = json.load(resp)
# Pick the first definition whose slug, name, or id contains the requested value.
match = None
for definition in definitions:
    values = [str(definition.get(k) or '') for k in ('slug', 'name', 'id')]
    if slug in values or any(slug in value for value in values):
        match = definition
        break
if not match:
    raise SystemExit(f'definition matching {slug!r} not found')
definition_id = match['id']
req = urllib.request.Request(f'http://localhost:8010/activity-definitions/{definition_id}/trigger', method='POST')
with urllib.request.urlopen(req, timeout=30) as resp:
    payload = json.loads(resp.read().decode())
payload['definition_id'] = definition_id
print(json.dumps(payload, sort_keys=True))
PY
EOF
)"
)"
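# Pass inputs explicitly so these helpers do not depend on an earlier export.
DEFINITION_ID="$(TRIGGER_JSON="$TRIGGER_JSON" python3 -c 'import json,os; print(json.loads(os.environ["TRIGGER_JSON"])["definition_id"])')"
TRIGGER_KEY="$(TRIGGER_JSON="$TRIGGER_JSON" python3 -c 'import json,os; t=json.loads(os.environ["TRIGGER_JSON"]); print(t.get("trigger_key") or t.get("workflow_id") or "")')"
EXPECTED_RUN_ID="$(TRIGGER_KEY="$TRIGGER_KEY" DEFINITION_ID="$DEFINITION_ID" python3 - <<'PY'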
import os, uuid
trigger_key = os.environ.get('TRIGGER_KEY')
definition_id = os.environ.get('DEFINITION_ID')
print(uuid.uuid5(uuid.NAMESPACE_URL, f'{definition_id}:{trigger_key}') if trigger_key else '')
PY
)"
export TRIGGER_JSON DEFINITION_ID TRIGGER_KEY EXPECTED_RUN_ID
CURRENT_GATE='State Hub daily_triage evidence'
log 'polling State Hub for schema-valid daily_triage progress'
PROGRESS_JSON="$(python3 - <<'PY'
from datetime import datetime
import json, os, time, urllib.parse, urllib.request
base = os.environ['STATE_HUB_URL'].rstrip('/')
started = datetime.fromisoformat(os.environ['STARTED_AT'].replace('Z', '+00:00'))
deadline = time.monotonic() + int(os.environ['STATE_HUB_PROGRESS_TIMEOUT_SECONDS'])
interval = int(os.environ['STATE_HUB_PROGRESS_POLL_SECONDS'])
expected_run_id = os.environ.get('EXPECTED_RUN_ID')
url = base + '/progress/?' + urllib.parse.urlencode({'event_type': 'daily_triage'})
while time.monotonic() < deadline:
    with urllib.request.urlopen(url, timeout=20) as resp:
        events = json.load(resp)
    for event in events:
        # Ignore events recorded before this verification started.
        created_at = datetime.fromisoformat(event['created_at'].replace('Z', '+00:00'))
        if created_at < started:
            continue
        detail = event.get('detail') or {}
        # Skip events that name a different run than the one just triggered.
        if expected_run_id and isinstance(detail, dict):
            run_id = detail.get('activity_core_run_id') or detail.get('run_id')
            if run_id and run_id != expected_run_id:
                continue
        if not isinstance(detail, dict) or detail.get('output_validated') is not True:
            continue
        # A partial run only counts when at least one item was quarantined.
        if detail.get('partial') is True and int(detail.get('quarantined_count') or 0) <= 0:
            continue
        print(json.dumps({
            'id': event['id'],
            'event_type': event.get('event_type'),
            'summary': event.get('summary'),
            'author': event.get('author'),
            'created_at': event.get('created_at'),
            'output_validated': detail.get('output_validated'),
            'partial': detail.get('partial'),
            'quarantined_count': detail.get('quarantined_count'),
            'activity_core_run_id': detail.get('activity_core_run_id'),
            'detail_keys': sorted(detail.keys()),
        }))
        raise SystemExit(0)
    time.sleep(interval)
raise SystemExit('no schema-valid daily_triage progress found')
PY
)"
export PROGRESS_JSON
CURRENT_GATE='State Hub evidence note'
log 'posting non-secret evidence note to State Hub'
post_evidence passed ''
trap - ERR
log 'verification passed'

@ -4,11 +4,12 @@ type: workplan
title: "activity-core WP-0016 triage-output robustness deploy"
domain: financials
repo: railiance-cluster
status: ready
status: finished
owner: railiance-cluster
topic_slug: railiance
created: "2026-07-01"
updated: "2026-07-01"
updated: "2026-07-02"
state_hub_workstream_id: "7cbbe0d6-fea9-41c6-840c-46d0d8e8edde"
---
# activity-core WP-0016 triage-output robustness deploy
@ -31,20 +32,41 @@ whole-doc validator. It MUST ship together with the new `executor.py`
```task
id: RAIL-BS-WP-0008-T01
status: todo
status: done
priority: high
state_hub_task_id: "079e39a9-f938-4d03-a5bc-4d3d2f7b1d83"
```
Rebuild/import the activity-core image from main (`bf877b7` or later) into
the railiance01 k3s runtime and reconcile the activity-core deployment so the
new executor and the strict per-item schema ship together.
2026-07-02: Added `make deploy-activity-core-triage-robustness` /
`bin/railiance deploy-triage-robustness` as the repeatable operator path. The
command gates the remote activity-core repo on `bf877b7` or later, checks the
runtime bundle contract before applying it, restarts the activity-core
deployments by default, waits for migrate/sync jobs and rollouts, then records
non-secret State Hub evidence. Live execution on railiance01 remains pending.
2026-07-02 (later session): rebuilt `activity-core:railiance01-prod` locally
from activity-core main `7612112` (includes `bf877b7` and the T02 prompt
contract). Transfer/import to railiance01 was **blocked by the agent
permission policy** (production remote write requires explicit operator
authorization). Two preconditions found and fixed/noted: (a) the remote
`~/activity-core` copy has no `.git`, so the script's revision gate will fail
until the repo is synced with git metadata or `REQUIRED_ACTIVITY_CORE_REV`
verification is adapted; (b) the T02 runtime contract is now satisfied in the
repo bundle (activity-core commit `7612112`). Operator pickup: run the
image save/scp/import from the deploy README, sync the repo with `.git`, then
`make deploy-activity-core-triage-robustness`.
## Update daily-statehub-wsjf-triage runtime-bundle Instruction
```task
id: RAIL-BS-WP-0008-T02
status: todo
status: done
priority: high
state_hub_task_id: "129fb472-41e8-4e5c-bcbb-0995a96e223b"
```
In the runtime projection (not the activity-core repo), update the
@ -58,12 +80,18 @@ In the runtime projection (not the activity-core repo), update the
recommendation JSON object per line) so the T03 parser recovers items
independently.
2026-07-02: The new deploy command enforces this contract against
`k8s/railiance/20-runtime.yaml` before it will touch the cluster: it requires
the daily instruction, a top-7 bound, the "fewer well-formed" fallback, NDJSON
or one-object-per-line framing, and `max_tokens` headroom of at least 1800.
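
The contract gate described above can be sketched as a small standalone function. This is a minimal illustration that reuses the check names and patterns from the deploy script; the function name `check_runtime_contract` and the sample bundle text are assumptions made for the example, not part of the repo.

```python
import re


def check_runtime_contract(text: str, min_tokens: int = 1800) -> list[str]:
    """Return the names of the contract checks that the bundle text fails."""
    lower = text.lower()
    # Collect every max_tokens value declared in the bundle (YAML or key=value style).
    max_tokens = [int(v) for v in re.findall(r"max_tokens\s*[:=]\s*['\"]?(\d+)", text)]
    checks = {
        "mentions_daily_instruction": "daily-statehub-wsjf-triage" in lower,
        "bounded_top_7": bool(
            re.search(r"(top[- ]?7|<=\s*7|≤\s*7|at most\s+7|no more than\s+7)", lower)
        ),
        "fewer_well_formed": "fewer well-formed" in lower,
        "ndjson_or_line_framing": "ndjson" in lower
        or "one recommendation json object per line" in lower,
        "max_tokens_headroom": bool(max_tokens) and max(max_tokens) >= min_tokens,
    }
    return [name for name, ok in checks.items() if not ok]


# Hypothetical bundle excerpt, for illustration only.
sample = """
instruction: daily-statehub-wsjf-triage
prompt: Return the top 7 items as NDJSON; prefer fewer well-formed items.
max_tokens: 2000
"""

print(check_runtime_contract(sample))
print(check_runtime_contract(sample.replace("2000", "1200")))
```

An empty list means the bundle may be applied; any returned name identifies the clause that is missing from the instruction text.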
## Pull raw llm-connect response for the 2026-06-26 run
```task
id: RAIL-BS-WP-0008-T03
status: todo
status: cancel
priority: medium
state_hub_task_id: "59559f1d-821f-4660-8a7d-c623c6631864"
```
From the llm-connect pod logs / response store on railiance01, capture the
@ -76,8 +104,9 @@ secrets.
```task
id: RAIL-BS-WP-0008-T04
status: todo
status: done
priority: high
state_hub_task_id: "8096621a-54ee-4be5-943e-5dc2da19ed28"
```
Trigger one daily-triage run against the reconciled runtime and confirm it
@ -87,3 +116,36 @@ either (i) returns a clean schema-valid report, or (ii) degrades gracefully
shows a matching `daily_triage` progress event. Closes ACTIVITY-WP-0016-T05
and unblocks the three-clean-run streak for ACTIVITY-WP-0010-T04 /
WP-0006-T03.
2026-07-02: The deploy command now triggers the daily-triage definition after
reconcile and polls State Hub for a post-trigger `daily_triage` event with
`output_validated=true`. If the run is partial, it also requires
`quarantined_count>0` before posting pass evidence.
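
The acceptance rule for a polled event can be sketched as a single predicate. This is a minimal illustration of the conditions described above; the function name `accepts_event` and the sample timestamps are assumptions, and the run-ID matching that the script also performs is left out for brevity.

```python
from datetime import datetime


def accepts_event(event: dict, started: datetime) -> bool:
    """Return True if a daily_triage event counts as pass evidence."""
    created_at = datetime.fromisoformat(event["created_at"].replace("Z", "+00:00"))
    if created_at < started:
        return False  # recorded before the trigger, so it proves nothing
    detail = event.get("detail")
    if not isinstance(detail, dict) or detail.get("output_validated") is not True:
        return False
    # A partial run is only acceptable when items were actually quarantined.
    if detail.get("partial") is True and int(detail.get("quarantined_count") or 0) <= 0:
        return False
    return True


# Illustrative values only.
started = datetime.fromisoformat("2026-07-02T09:45:00+00:00")
clean = {"created_at": "2026-07-02T09:50:44Z", "detail": {"output_validated": True}}
partial_empty = {
    "created_at": "2026-07-02T09:50:44Z",
    "detail": {"output_validated": True, "partial": True, "quarantined_count": 0},
}
partial_quarantined = {
    "created_at": "2026-07-02T09:50:44Z",
    "detail": {"output_validated": True, "partial": True, "quarantined_count": 2},
}
stale = {"created_at": "2026-07-02T09:00:00Z", "detail": {"output_validated": True}}

for name, event in [("clean", clean), ("partial_empty", partial_empty),
                    ("partial_quarantined", partial_quarantined), ("stale", stale)]:
    print(name, accepts_event(event, started))
```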
## Completion 2026-07-02
Deployed live with operator authorization. Image `activity-core:railiance01-prod`
rebuilt from main `7612112`, imported into railiance01 k3s
(`sha256:550c5592...`), repo synced with git metadata, and
`make deploy-activity-core-triage-robustness` applied the coupled
schema/executor bundle with all rollouts and migrate/sync jobs green.
- T01/T02 done: revision gate and runtime contract gate both passed
(`bounded_top_7`, `ndjson_or_line_framing`, `fewer_well_formed`,
`max_tokens_headroom` >= 1800 all true).
- T04 done: manually triggered daily-triage run produced a clean schema-valid
report — State Hub event `24d2d321-c761-47f7-bf9e-7950a6253c21`
(2026-07-02T09:50:44Z) with `output_validated=true`, exactly 7 ranked
recommendations, `working_memory_status=written`, no validation error. The
bounded top-7 contract is proven live; the three-clean-run streak for
ACTIVITY-WP-0010-T04 / WP-0006-T03 restarts from this run.
- T03 cancelled: the raw 2026-06-26 llm-connect response is unrecoverable —
the llm-connect pod is stateless (no volumes, no response store) and its
log stream contains only 2 startup lines from 2026-06-19. Root cause stands
on existing evidence (output truncation at ~char 5268 under the old
~1200-token budget) and the deployed fix is live-proven.
- Trigger note: the deployed API exposes definitions by `name`/`id` only (no
slug field), so the trigger step must be given the definition ID:
`DAILY_TRIAGE_DEFINITION_SLUG=6fca51fa-387a-4fd0-bc4e-d62c29eb859a`. The
State Hub evidence poll can also exceed the default 240s window on slow LLM
runs; raise `STATE_HUB_PROGRESS_TIMEOUT_SECONDS` in that case.


@ -4,11 +4,11 @@ type: workplan
title: "activity-core no-restart admin-sync smoke (ACTIVITY-WP-0012-T05)"
domain: financials
repo: railiance-cluster
status: active
status: finished
owner: railiance-cluster
topic_slug: railiance
created: "2026-07-01"
updated: "2026-07-01"
updated: "2026-07-02"
state_hub_workstream_id: "2c9e8e96-ec6a-433c-9e6d-0efbcd18679e"
---
@ -30,7 +30,7 @@ The deploy precondition is covered by RAIL-BS-WP-0008-T01 (main at
```task
id: RAIL-BS-WP-0009-T01
status: wait
status: done
priority: medium
state_hub_task_id: "60f3387d-3d14-42a9-b8a3-725a86468510"
```
@ -46,3 +46,25 @@ After RAIL-BS-WP-0008-T01 is deployed, without restarting the worker:
5. Record non-secret evidence in the State Hub. Response JSON should include
`definitions.synced`, `schedules.upserted`, `schedules.paused`,
`schedules.deleted_orphans`, and `errors[]`.
2026-07-02: Added `make admin-sync-smoke` / `bin/railiance admin-sync-smoke`
as the repeatable operator path. It captures the worker pod UID/restart count,
optionally runs an operator-supplied enabled-flip/rename fixture via
`ACTIVITY_CORE_ADMIN_SYNC_FIXTURE_COMMAND`, calls
`POST /admin/sync?definitions=true&schedules=true`, verifies the expected
response counters and empty `errors[]`, rechecks that the same worker pod did
not restart, and posts non-secret State Hub evidence. T01 stays `wait` until
RAIL-BS-WP-0008-T01 is deployed and the smoke is run on railiance01.
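
The verification step of the smoke can be sketched as follows. This is a minimal illustration, assuming the counters are nested as `definitions.synced`, `schedules.upserted`, and so on, and that the pod snapshot is a small dict; the function name `verify_admin_sync` and the snapshot shape are assumptions, not the tool's actual implementation.

```python
REQUIRED_COUNTERS = (
    "definitions.synced",
    "schedules.upserted",
    "schedules.paused",
    "schedules.deleted_orphans",
)


def verify_admin_sync(response: dict, pod_before: dict, pod_after: dict) -> list[str]:
    """Return a list of problems; an empty list means the smoke passed."""
    problems = []
    for path in REQUIRED_COUNTERS:
        section, key = path.split(".")
        value = (response.get(section) or {}).get(key)
        if not isinstance(value, int):
            problems.append(f"missing counter {path}")
    if response.get("errors") != []:
        problems.append("errors[] is not empty")
    # Same pod name and restart count before and after means no restart occurred.
    if pod_before != pod_after:
        problems.append("worker pod changed during sync")
    return problems


# Illustrative response and pod snapshots.
response = {
    "definitions": {"synced": 6},
    "schedules": {"upserted": 4, "paused": 2, "deleted_orphans": 0},
    "errors": [],
}
pod = {"name": "actcore-worker-example", "restart_count": 0}

print(verify_admin_sync(response, pod, dict(pod)))
print(verify_admin_sync(response, pod, {**pod, "restart_count": 1}))
```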
## Completion 2026-07-02
`make admin-sync-smoke` passed against the freshly deployed
RAIL-BS-WP-0008 runtime: `POST /admin/sync?definitions=true&schedules=true`
returned `ok=true` with `definitions.synced=6`, `schedules.upserted=4`,
`schedules.paused=2`, `deleted_orphans=0`, empty `errors[]`, and the worker
pod identity (`actcore-worker-5b78f85b76-ng54t`, restart_count 0) was
unchanged before and after — proving no-restart admin sync. Non-secret
evidence: State Hub event `4caa288d-830b-4348-9cff-b2d5855cd42d`. The
optional enabled-flip fixture was skipped (no operator fixture supplied);
schedule pause/upsert semantics were exercised by the sync counters. Closes
ACTIVITY-WP-0012-T05.