--- id: RAIL-BS-WP-0008 type: workplan title: "activity-core WP-0016 triage-output robustness deploy" domain: financials repo: railiance-cluster status: active owner: railiance-cluster topic_slug: railiance created: "2026-07-01" updated: "2026-07-02" state_hub_workstream_id: "7cbbe0d6-fea9-41c6-840c-46d0d8e8edde" --- # activity-core WP-0016 triage-output robustness deploy ## Context Inbox message `87952ff1` (activity-core, 2026-06-26): the scheduled daily WSJF triage run on 2026-06-26 failed schema validation and the whole run was discarded, resetting the WP-0006-T03 three-clean-run streak. ACTIVITY-WP-0016 hardened the instruction-executor output contract in-repo (commits `5eb33bd..bf877b7` on activity-core main, 220 tests passed). The remaining work is operator/cluster-owned on railiance01. **Deploy coupling constraint:** `schemas/daily-triage-report.json` is now strict per-item and is consumed by both the llm-connect hint and the whole-doc validator. It MUST ship together with the new `executor.py` (T03 per-item quarantine parser). Never deploy the schema ahead of the code. ## Deploy activity-core with coupled schema and executor ```task id: RAIL-BS-WP-0008-T01 status: todo priority: high state_hub_task_id: "079e39a9-f938-4d03-a5bc-4d3d2f7b1d83" ``` Rebuild/import the activity-core image from main (`bf877b7` or later) into the railiance01 k3s runtime and reconcile the activity-core deployment so the new executor and the strict per-item schema ship together. 2026-07-02: Added `make deploy-activity-core-triage-robustness` / `bin/railiance deploy-triage-robustness` as the repeatable operator path. The command gates the remote activity-core repo on `bf877b7` or later, checks the runtime bundle contract before applying it, restarts the activity-core deployments by default, waits for migrate/sync jobs and rollouts, then records non-secret State Hub evidence. Live execution on railiance01 remains pending. ## Update daily-statehub-wsjf-triage runtime-bundle Instruction ```task id: RAIL-BS-WP-0008-T02 status: todo priority: high state_hub_task_id: "129fb472-41e8-4e5c-bcbb-0995a96e223b" ``` In the runtime projection (not the activity-core repo), update the `daily-statehub-wsjf-triage` Instruction: - raise `max_tokens` (currently ~1200; give clear headroom above the ~1300–1500-token 16-workstream list); - prompt: bounded top-N (≤7) ranked recommendations, "if uncertain emit fewer well-formed items rather than more"; - prompt: per-item NDJSON framing (leading summary object, then one recommendation JSON object per line) so the T03 parser recovers items independently. 2026-07-02: The new deploy command enforces this contract against `k8s/railiance/20-runtime.yaml` before it will touch the cluster: it requires the daily instruction, a top-7 bound, the "fewer well-formed" fallback, NDJSON or one-object-per-line framing, and `max_tokens` headroom of at least 1800. ## Pull raw llm-connect response for the 2026-06-26 run ```task id: RAIL-BS-WP-0008-T03 status: todo priority: medium state_hub_task_id: "59559f1d-821f-4660-8a7d-c623c6631864" ``` From the llm-connect pod logs / response store on railiance01, capture the full raw response and `finish_reason` for the 2026-06-26 05:20:57Z run (activity-core retained only a 4000-char preview; the JSON break is at char 5268). Send to activity-core to close ACTIVITY-WP-0016-T01. Logs only, no secrets. ## Acceptance smoke ```task id: RAIL-BS-WP-0008-T04 status: todo priority: high state_hub_task_id: "8096621a-54ee-4be5-943e-5dc2da19ed28" ``` Trigger one daily-triage run against the reconciled runtime and confirm it either (i) returns a clean schema-valid report, or (ii) degrades gracefully (valid recommendations with `output_validated=true`, `partial=true`, `quarantined_count>0`) instead of discarding the run. Confirm the State Hub shows a matching `daily_triage` progress event. Closes ACTIVITY-WP-0016-T05 and unblocks the three-clean-run streak for ACTIVITY-WP-0010-T04 / WP-0006-T03. 2026-07-02: The deploy command now triggers the daily-triage definition after reconcile and polls State Hub for a post-trigger `daily_triage` event with `output_validated=true`. If the run is partial, it also requires `quarantined_count>0` before posting pass evidence.