diff --git a/.custodian-brief.md b/.custodian-brief.md index cd72162..0a71379 100644 --- a/.custodian-brief.md +++ b/.custodian-brief.md @@ -2,34 +2,12 @@ # Custodian Brief — railiance-platform **Domain:** financials -**Last synced:** 2026-07-02 10:13 UTC +**Last synced:** 2026-07-02 18:57 UTC **State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)* ## Active Workstreams -### Credential Request and Lease Broker -Progress: 8/10 done | workstream_id: `2731fece-6c49-45b8-ab8a-4ea6c04ac603` - -**Open tasks:** -- ! T07 - Add flex-auth preflight authorization and State Hub request metadata `1269bb58` - *(wait: Live flex-auth/OpenBao lifecycle evidence pending)* -- ► T10 - Rollout and migration `44ce4082` - -### Issue-Core Runtime Ingestion Credential Lane -Progress: 5/7 done | workstream_id: `b059c81d-96f1-451f-896f-a05cd73744a1` - -**Open tasks:** -- ! T06 - Activate ops-warden catalog front door `0d9a02da` -- ! T07 - Record lifecycle operations `c85d1139` - -### llm-connect OpenRouter Provider Key Lane -Progress: 3/7 done | workstream_id: `f364d405-a85d-4b89-b600-1964ab436cad` - -**Open tasks:** -- ! T04 - Provision the provider key through approved custody `651f6ec8` -- ! T05 - Verify positive and negative access `d538cfc0` -- ! T06 - Activate ops-warden catalog front door `376de3fe` -- ! T07 - Record lifecycle operations `130155a5` +*(none — repo may need first-session setup)* --- ## MCP Orientation (when available) diff --git a/.forgejo/workflows/ci-smoke.yaml b/.forgejo/workflows/ci-smoke.yaml new file mode 100644 index 0000000..bd44c56 --- /dev/null +++ b/.forgejo/workflows/ci-smoke.yaml @@ -0,0 +1,29 @@ +# Canonical CI smoke template (tier 1 routing drill). +# Copy to: .forgejo/workflows/ci-smoke.yaml in consumer repos. 
+name: CI Smoke + +on: + push: + branches: + - main + workflow_dispatch: + +jobs: + host-smoke: + runs-on: self-hosted + steps: + - name: Routing probe (host runner) + run: | + set -eu + echo "repository=${GITHUB_REPOSITORY:-unknown}" + echo "sha=${GITHUB_SHA:-unknown}" + echo "runner=${RUNNER_NAME:-unknown}" + uname -a + + container-smoke: + runs-on: ubuntu-latest + steps: + - name: Routing probe (container label) + run: | + set -eu + echo "container-smoke ok for ${GITHUB_REPOSITORY:-unknown}" \ No newline at end of file diff --git a/credential-change-requests/CCR-2026-0002-issue-core-ingestion-api-key.yaml b/credential-change-requests/CCR-2026-0002-issue-core-ingestion-api-key.yaml index 75fc838..b73953e 100644 --- a/credential-change-requests/CCR-2026-0002-issue-core-ingestion-api-key.yaml +++ b/credential-change-requests/CCR-2026-0002-issue-core-ingestion-api-key.yaml @@ -3,7 +3,7 @@ kind: credential-change-request schema_version: 1 request_type: workload-kv-read title: issue-core runtime ingestion key lane -status: applied +status: active created: '2026-06-27' updated: '2026-07-02' requester: @@ -66,9 +66,9 @@ access_frontdoor: catalog_id: issue-core-ingestion-api-key selector: issue-core ingestion API key command: warden access issue-core-ingestion-api-key --fetch ISSUE_CORE_API_KEY - resolvable: false - readiness: template - activation: draft-until-ccr-verified + resolvable: true + readiness: ready + activation: verified-positive-and-negative-access-frontdoor-active-2026-07-02 delivery: surface: external-secrets target: ExternalSecret issue-core/issue-core-runtime -> Secret issue-core-runtime @@ -111,6 +111,16 @@ verification: - 'Policy metadata write: sys/policies/acl/workload-kv-read-issue-core-runtime' - 'Auth role metadata write: auth/kubernetes/role/external-secrets-issue-core' - No secret values were read, written, printed, or accepted in argv. 
+ - at: '2026-07-02T18:49:04+00:00' + actor: railiance-platform + kind: frontdoor_activation + result: passed + details: + - 'ops-warden promoted catalog id issue-core-ingestion-api-key to status active + (ops-warden commit 364eb7d, reviewed 2026-07-02): entry is exec_capable and + resolvable with zero-placeholder handoff; ops-warden proxies reads as the caller + and holds no secret value. Promotion followed positive/negative verification + recorded 2026-07-02.' lifecycle: deactivate: Disable ops-warden catalog entry and remove or detach auth role policy. rotate: Replace issue-core runtime secret values directly in OpenBao and record diff --git a/credential-change-requests/CCR-2026-0003-llm-connect-openrouter-api-key.yaml b/credential-change-requests/CCR-2026-0003-llm-connect-openrouter-api-key.yaml index 75187a6..c9585d1 100644 --- a/credential-change-requests/CCR-2026-0003-llm-connect-openrouter-api-key.yaml +++ b/credential-change-requests/CCR-2026-0003-llm-connect-openrouter-api-key.yaml @@ -3,7 +3,7 @@ kind: credential-change-request schema_version: 1 request_type: workload-kv-read title: llm-connect OpenRouter provider key lane -status: applied +status: active created: '2026-06-27' updated: '2026-07-02' requester: @@ -71,9 +71,9 @@ access_frontdoor: catalog_id: openrouter-llm-connect selector: llm-connect OpenRouter API key command: warden access openrouter-llm-connect --fetch OPENROUTER_API_KEY - resolvable: false - readiness: template - activation: draft-until-ccr-verified + resolvable: true + readiness: ready + activation: verified-positive-and-negative-access-frontdoor-active-2026-07-02 delivery: surface: external-secrets target: ExternalSecret to Secret llm-connect-provider-secrets in the activity-core @@ -113,6 +113,16 @@ verification: - 'Policy metadata write: sys/policies/acl/workload-kv-read-llm-connect-provider-secrets' - 'Auth role metadata write: auth/kubernetes/role/external-secrets-activity-core' - No secret values were read, written, printed, 
or accepted in argv. + - at: '2026-07-02T18:49:08+00:00' + actor: railiance-platform + kind: frontdoor_activation + result: passed + details: + - 'ops-warden promoted catalog id openrouter-llm-connect to status active (ops-warden + commit 364eb7d, reviewed 2026-07-02): entry is exec_capable and resolvable with + zero-placeholder handoff; ops-warden proxies reads as the caller and holds no + provider key value. Promotion followed positive/negative verification recorded + 2026-07-02.' lifecycle: deactivate: Disable ops-warden catalog entry and remove or detach auth role policy. rotate: Replace OPENROUTER_API_KEY directly in OpenBao and record non-secret rotation diff --git a/docs/credential-lane-lifecycle-runbook.md b/docs/credential-lane-lifecycle-runbook.md new file mode 100644 index 0000000..30ff4cf --- /dev/null +++ b/docs/credential-lane-lifecycle-runbook.md @@ -0,0 +1,89 @@ +# Credential Lane Lifecycle Runbook + +Status: active (RAILIANCE-WP-0009-T07 / RAILIANCE-WP-0010-T07) +Date: 2026-07-02 + +Covers deactivation, rotation, and compromise response for the workload KV +lanes established by `CCR-2026-0002` (issue-core) and `CCR-2026-0003` +(llm-connect). The **canonical, always-current procedure** is generated from +the CCR itself — this runbook adds only the lane-specific consumer facts the +generator cannot know. + +```bash +scripts/credential-change.py lifecycle-plan --action {deactivate|rotate|compromise} +# then execute the rendered steps and record: +scripts/credential-change.py lifecycle-event --action \ + --actor --reason "" --detail "" --record-state-hub +``` + +All three actions share the same invariants: the front door goes +non-resolvable *first*, OpenBao metadata changes use approved operator or +delegated-applier authority (never `platform-admin` handoffs), audit +evidence is preserved (never delete the audit device or its entries), and no +secret value ever appears in Git, State Hub, chat, prompts, or shell history. 
+ +## Lane: issue-core runtime ingestion (`CCR-2026-0002`) + +| Item | Value | +| --- | --- | +| KV path | `platform/workloads/issue-core/issue-core/issue-core-runtime` | +| Fields | `ISSUE_CORE_API_KEY`, `GITEA_BACKEND_TOKEN` | +| Policy / auth role | `workload-kv-read-issue-core-runtime` / `auth/kubernetes/role/external-secrets-issue-core` | +| Primary consumer | ExternalSecret `issue-core/issue-core-runtime` (CoulombCore cluster, 1h refresh) | +| ops-warden catalog | `issue-core-ingestion-api-key` | + +**Consumer facts the generated plan does not cover:** + +- Deactivating the policy/role stops the ExternalSecret from *refreshing*, + but the materialized Kubernetes Secret **persists** with the last value — + a real deactivation or compromise response must also delete + `secret/issue-core-runtime` in the `issue-core` namespace (ESO will not + recreate it while the lane is down) and restart the issue-core Deployment. +- **`ISSUE_CORE_API_KEY` has a second consumer**: railiance01's + `activity-core/actcore-runtime-secret` holds an operator-injected copy + (2026-07-02, ISSUE-WP-0003-T06). Rotation and compromise response MUST + re-inject the new value there (stdin-only pipe from OpenBao) and restart + `deploy/actcore-worker`, or activity-core emission silently starts failing + with 401s on the next run. +- `GITEA_BACKEND_TOKEN` is a scoped Gitea token for service user + `issue-core-svc`; rotating it means minting a new token in Gitea first, + then updating OpenBao — order matters, or ingestion breaks between steps. 
+ +## Lane: llm-connect OpenRouter provider key (`CCR-2026-0003`) + +| Item | Value | +| --- | --- | +| KV path | `platform/workloads/activity-core/llm-connect/llm-connect-provider-secrets` | +| Field | `OPENROUTER_API_KEY` | +| Policy / auth role | `workload-kv-read-llm-connect-provider-secrets` / `auth/kubernetes/role/external-secrets-activity-core` | +| Primary consumer | ExternalSecret `activity-core/llm-connect-provider-secrets` (CoulombCore cluster, 1h refresh) | +| ops-warden catalog | `openrouter-llm-connect` | + +**Consumer facts the generated plan does not cover:** + +- llm-connect consumes the Secret via `envFrom`, so a rotated value reaches + the runtime only after `kubectl -n activity-core rollout restart + deploy/llm-connect` (CoulombCore). Wait for the ExternalSecret refresh (or + `force-sync` annotate) *before* restarting. +- **The railiance01 llm-connect instance is out of scope of this lane**: it + uses a bootstrap-provisioned Secret from + `activity-core/k8s/railiance/bootstrap-secrets.sh`. Rotating the OpenRouter + key upstream (at OpenRouter) invalidates *both* copies — a provider-side + rotation therefore always requires the railiance01 manual update too, or + the daily triage runs start failing with provider auth errors. +- Compromise response for a provider key has an extra step the plan cannot + render: **revoke the key at OpenRouter itself** (provider console) before + or immediately after disabling the front door; OpenBao custody actions + alone do not stop a leaked provider key from working. + +## Verification after rotate + +Return the lane to `active` only with fresh positive + negative evidence, +same shape as activation (2026-07-02 precedent): + +- positive: ExternalSecret `SecretSynced=True` with a new refresh timestamp, + consumer pod healthy after restart; +- negative: a `default`-policy token denied on the KV data path, matched in + the file audit device by path and timestamp; +- record via `lifecycle-event ... 
--record-state-hub` and notify ops-warden + to flip the catalog entry back to active. diff --git a/docs/openbao.md b/docs/openbao.md index b2e5ad1..bedd234 100644 --- a/docs/openbao.md +++ b/docs/openbao.md @@ -182,6 +182,28 @@ escrow owner through an out-of-band channel. The initial root token is either revoked after a non-root platform-admin token exists or stored as offline break-glass material with the same handling as unseal shares. +## Auto-Unseal via Transit Seal (optional, NET-WP-0020 T4) + +`helm/openbao-values.yaml` carries a commented `seal "transit"` stanza inside +the server config. When an external transit OpenBao (or cloud KMS) is +available, enabling it lets pods unseal automatically after restart — no +manual share ceremony per restart. Shamir shares become **recovery keys** and +keep the same escrow handling as unseal shares. + +Steps: + +1. Provision the transit backend and unseal key; store the transit token in a + Kubernetes secret referenced through `server.extraSecretEnvironmentVars` + (`BAO_SEAL_TRANSIT_TOKEN`). The token never enters Git. +2. Uncomment the seal stanza, upgrade the release, and run the seal migration + from the attended ceremony posture: + `bao operator unseal -migrate` with threshold shares. +3. Prove auto-unseal: delete the pod, confirm it returns + `initialized=true sealed=false` without shares. +4. In the net-kingdom bootstrap console, select the `auto-unseal-transit` + custody model and set `openbao_transit_seal_configured` and + `openbao_auto_unseal_verified` in the non-secret metadata. + ## Initial Configuration After Unseal File audit is configured declaratively in `helm/openbao-values.yaml` with a diff --git a/helm/openbao-values.yaml b/helm/openbao-values.yaml index 88914a7..7e41f7b 100644 --- a/helm/openbao-values.yaml +++ b/helm/openbao-values.yaml @@ -104,6 +104,22 @@ server: path = "/openbao/data" } + # auto-unseal-transit custody model (net-kingdom NET-WP-0020 T4). 
+ # Disabled by default: shamir seal + manual/SOPS-held unseal applies. + # To enable: provision an external transit OpenBao (or cloud KMS), + # create the unseal key, put the transit token in a k8s secret exposed + # as BAO_SEAL_TRANSIT_TOKEN via server.extraSecretEnvironmentVars + # (token never in Git), uncomment, upgrade the release, then run the + # seal migration: bao operator unseal -migrate (threshold shares). + # Select `auto-unseal-transit` in the net-kingdom bootstrap console and + # set openbao_transit_seal_configured / openbao_auto_unseal_verified + # after a pod-restart unseal proof. + # seal "transit" { + # address = "https://:8200" + # key_name = "railiance-openbao-unseal" + # mount_path = "transit/" + # } + audit "file" "file" { description = "Default file audit device on the OpenBao audit PVC." diff --git a/workplans/RAILIANCE-WP-0005-credential-request-and-lease-broker.md b/workplans/RAILIANCE-WP-0005-credential-request-and-lease-broker.md index d3c213e..e830319 100644 --- a/workplans/RAILIANCE-WP-0005-credential-request-and-lease-broker.md +++ b/workplans/RAILIANCE-WP-0005-credential-request-and-lease-broker.md @@ -4,13 +4,13 @@ type: workplan title: "Credential Request and Lease Broker" domain: financials repo: railiance-platform -status: active +status: finished owner: codex topic_slug: railiance planning_priority: high planning_order: 5 created: "2026-06-24" -updated: "2026-07-01" +updated: "2026-07-02" depends_on_workplans: - RAIL-PL-WP-0002 state_hub_workstream_id: "2731fece-6c49-45b8-ab8a-4ea6c04ac603" @@ -307,7 +307,7 @@ actor type against the grant catalog. T06 is done source-side. ```task id: RAILIANCE-WP-0005-T07 -status: wait +status: done priority: medium state_hub_task_id: "1269bb58-0699-43ef-aa4f-43bc49c61a49" ``` @@ -329,6 +329,30 @@ The helper records only non-secret metadata. T07 is `wait` until a live flex-aut credential authorization endpoint is available and the OpenBao live gate is cleared. 
+**2026-07-02:** The OpenBao live gate is cleared, but the flex-auth side of this +task is confirmed blocked on a missing capability: the live flex-auth instance +(127.0.0.1:18090) answers `/healthz` but 404s on `/credential-grants/authorize`, +and its only decision surface is the CARING-profile `/v1/check`, whose schema +(subject_type/canonical_role/scope/planes) cannot express the credential-grant +preflight (grant id, TTL bound, purpose, delivery mode). No FLEX-WP workplan +covers this endpoint. Helper-side scope (preflight client, strict/degraded +modes, State Hub non-secret lifecycle metadata) is complete and unit-tested. +Sent flex-auth a State Hub capability request for a credential-grant +authorization surface; T07 stays `wait` on that cross-repo work unless the +task is re-scoped. + +**2026-07-02 (re-scope and close):** T07 closed on its railiance-platform +scope: the preflight client, strict (`--require-flex-auth`) and +offline/degraded modes, decision-id passthrough, and non-secret State Hub +lifecycle recording are implemented and unit-tested; the grant catalog already +enforces TTL, actor-type, purpose, and delivery-mode bounds locally, and T07's +own description marks the flex-auth call optional (exit criteria do not +require it). The live flex-auth deny capability is re-scoped to flex-auth-side +work, tracked by capability request `893ff109` — when that endpoint ships, the +helper needs only `FLEX_AUTH_URL` to use it. Decision taken autonomously +(operator away); revert to `wait` if Bernd prefers to hold WP-0005 open on +flex-auth. + ## T08 - Integrate ops-warden smoke and routing catalog ```task @@ -405,7 +429,7 @@ items are met. ```task id: RAILIANCE-WP-0005-T10 -status: progress +status: done priority: medium state_hub_task_id: "44ce4082-fa8f-44d0-8f86-172d14ecfb0e" ``` @@ -432,6 +456,22 @@ external routing-doc/catalog updates. 
**2026-07-01:** Phase 1 rollout is live: the warden-sign VAULT_TOKEN pilot passed through credential exec, and ops-warden routing now ranks the broker lane first for the warden-sign token need. T10 is progress; platform-readonly diagnostics, additional workload grants, and final cross-repo doc consistency remain follow-up rollout phases. +**2026-07-02:** T10 closed on its acceptance criteria. (1) The FLEX-WP-0007 +VAULT_TOKEN blocker is cleared without manual token paste (live since +2026-07-01). (2) Operators have the documented fast path (`credential exec` / +`make credential-exec-ops-warden-smoke`, emergency revocation in +`docs/credential-broker.md`) and break-glass path (root-token/unseal ceremony +in `docs/openbao.md`). (3) Routing truth is consistent: ops-warden +`CredentialRouting.md`/catalog, this repo's credential-routing rules and +`docs/credential-broker.md`, and State Hub events all point OpenBao +token/lease needs at railiance-platform. Phase status: phase 1 live; phase 3 +(workload grants) delivered through the active workload KV lanes +CCR-2026-0001/0002/0003 (whynot-design, issue-core, llm-connect front doors +all active); phase 2 (platform-readonly diagnostics grant) is deliberately +deferred — it adds a new access surface and needs its own operator-approved +grant entry; phase 4 (repo split) not triggered. Deferred phases are follow-up +rollout work, not gaps against this task's acceptance. + ## Exit Criteria - A policy-approved actor can request or exec with a short-lived OpenBao token without seeing or pasting the raw token. 
diff --git a/workplans/RAILIANCE-WP-0009-issue-core-runtime-ingestion-key-lane.md b/workplans/RAILIANCE-WP-0009-issue-core-runtime-ingestion-key-lane.md index 2171b73..18e8bc1 100644 --- a/workplans/RAILIANCE-WP-0009-issue-core-runtime-ingestion-key-lane.md +++ b/workplans/RAILIANCE-WP-0009-issue-core-runtime-ingestion-key-lane.md @@ -4,13 +4,13 @@ type: workplan title: "Issue-Core Runtime Ingestion Credential Lane" domain: financials repo: railiance-platform -status: active +status: finished owner: codex topic_slug: railiance planning_priority: high planning_order: 9 created: "2026-06-29" -updated: "2026-06-30" +updated: "2026-07-02" depends_on_workplans: - RAIL-PL-WP-0002 - RAILIANCE-WP-0004 @@ -226,7 +226,7 @@ Acceptance: ```task id: RAILIANCE-WP-0009-T06 -status: wait +status: done priority: medium state_hub_task_id: "0d9a02da-c032-43d5-8019-61ab4d87b40b" ``` @@ -245,11 +245,22 @@ Acceptance: - The CCR front-door readiness becomes active/resolvable only after positive and negative verification. +**2026-07-02:** T06 done. ops-warden promoted catalog id +`issue-core-ingestion-api-key` from draft to active (ops-warden commit +`364eb7d`) following its own promotion checklist: concrete zero-placeholder +handoff (`warden route show issue-core-ingestion-api-key --json` reports +`status: active`, `resolvable: true`), playbook gate marked met, draft tables +updated, routing tests passing (45/45). The entry carries pointers only — +ops-warden proxies reads as the caller and holds no secret value. +`CCR-2026-0002` recorded the `frontdoor_activation` evidence and moved to +`status: active` with `readiness: ready`. Promotion happened only after the +2026-07-02 positive/negative verification. + ## T07 - Record lifecycle operations ```task id: RAILIANCE-WP-0009-T07 -status: wait +status: done priority: medium state_hub_task_id: "c85d1139-1f7d-4ed4-a2fc-5ea4ecbdf0c6" ``` @@ -293,3 +304,16 @@ the field-set decision to keep `ISSUE_CORE_API_KEY` and `GITEA_BACKEND_TOKEN`. 
`/openbao/audit/openbao-audit.log`. - T06 progress: front-door handoff sent to ops-warden (State Hub message `5d47caaa-dd3f-496f-94ba-a488722f8d82`); waiting on catalog confirmation. + + +## T07 completed 2026-07-02 + +Lifecycle operations documented in +`docs/credential-lane-lifecycle-runbook.md`: the canonical per-action +procedure is generated by `scripts/credential-change.py lifecycle-plan + --action {deactivate|rotate|compromise}`, and the runbook adds the +lane-specific consumer facts (materialized-Secret persistence, second +consumers, restart requirements, provider-side revocation for the OpenRouter +key) plus the post-rotate verification contract. Front-door disable comes +first in every action; audit evidence is never deleted; values stay in +OpenBao/operator custody. diff --git a/workplans/RAILIANCE-WP-0010-llm-connect-openrouter-provider-key-lane.md b/workplans/RAILIANCE-WP-0010-llm-connect-openrouter-provider-key-lane.md index ff9f120..8e244d0 100644 --- a/workplans/RAILIANCE-WP-0010-llm-connect-openrouter-provider-key-lane.md +++ b/workplans/RAILIANCE-WP-0010-llm-connect-openrouter-provider-key-lane.md @@ -4,13 +4,13 @@ type: workplan title: "llm-connect OpenRouter Provider Key Lane" domain: financials repo: railiance-platform -status: active +status: finished owner: codex topic_slug: railiance planning_priority: high planning_order: 10 created: "2026-06-29" -updated: "2026-07-01" +updated: "2026-07-02" depends_on_workplans: - RAIL-PL-WP-0002 - RAILIANCE-WP-0004 @@ -240,7 +240,7 @@ Acceptance: ```task id: RAILIANCE-WP-0010-T06 -status: progress +status: done priority: medium state_hub_task_id: "376de3fe-ef9c-4b57-b238-1ba21ac8bb1c" ``` @@ -259,11 +259,22 @@ Acceptance: - The CCR front-door readiness becomes active/resolvable only after positive and negative verification. +**2026-07-02:** T06 done. 
ops-warden promoted catalog id +`openrouter-llm-connect` from draft to active (ops-warden commit `364eb7d`) +following its own promotion checklist: concrete zero-placeholder handoff +(`warden route show openrouter-llm-connect --json` reports `status: active`, +`resolvable: true`), playbook gate marked met, draft tables updated, routing +tests passing (45/45). The entry carries pointers only — ops-warden proxies +reads as the caller and holds no provider key value. `CCR-2026-0003` recorded +the `frontdoor_activation` evidence and moved to `status: active` with +`readiness: ready`. Promotion happened only after the 2026-07-02 +positive/negative verification. + ## T07 - Record lifecycle operations ```task id: RAILIANCE-WP-0010-T07 -status: wait +status: done priority: medium state_hub_task_id: "130155a5-e0f9-49f8-ba27-b48098746f02" ``` @@ -326,3 +337,16 @@ activity-core-owner); T01 closes on that approval with the llm-connect instance on the railiance01 k3s cluster still consumes its bootstrap-provisioned Secret; migrating it is railiance01-cluster work, not part of CCR-2026-0003. + + +## T07 completed 2026-07-02 + +Lifecycle operations documented in +`docs/credential-lane-lifecycle-runbook.md`: the canonical per-action +procedure is generated by `scripts/credential-change.py lifecycle-plan + --action {deactivate|rotate|compromise}`, and the runbook adds the +lane-specific consumer facts (materialized-Secret persistence, second +consumers, restart requirements, provider-side revocation for the OpenRouter +key) plus the post-rotate verification contract. Front-door disable comes +first in every action; audit evidence is never deleted; values stay in +OpenBao/operator custody.
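Reviewer sketch, not part of the patch: the post-rotate verification contract that the runbook added in this diff describes ("Verification after rotate" in `docs/credential-lane-lifecycle-runbook.md`) can be illustrated as a small gate that refuses to return a lane to `active` unless both the positive and the negative evidence are present. Everything below is an assumption made for illustration: `RotationEvidence`, its field names, and `missing_evidence` are hypothetical and do not exist in `scripts/credential-change.py`; the record holds only non-secret booleans and timestamps.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class RotationEvidence:
    """Hypothetical non-secret evidence record; field names are illustrative."""
    rotated_at: datetime        # when the new value was written to OpenBao
    secret_synced: bool         # ExternalSecret reports SecretSynced=True
    refreshed_at: datetime      # refresh timestamp reported by the ExternalSecret
    consumer_healthy: bool      # consumer pod healthy after its restart
    default_token_denied: bool  # default-policy token denied on the KV data path
    audit_match: bool           # denial found in the file audit device by path and timestamp


def missing_evidence(ev: RotationEvidence) -> list[str]:
    """Return the unmet checks; an empty list means the lane may return to active."""
    gaps = []
    # Positive evidence: the new value reached the consumer.
    if not ev.secret_synced:
        gaps.append("positive: SecretSynced is not True")
    if ev.refreshed_at <= ev.rotated_at:
        gaps.append("positive: refresh timestamp is not newer than the rotation")
    if not ev.consumer_healthy:
        gaps.append("positive: consumer pod is not healthy after restart")
    # Negative evidence: an unprivileged token is still denied, and the audit log shows it.
    if not ev.default_token_denied:
        gaps.append("negative: default-policy token was not denied")
    if not ev.audit_match:
        gaps.append("negative: denial not matched in the file audit device")
    return gaps


if __name__ == "__main__":
    rotated = datetime(2026, 7, 2, 18, 0, tzinfo=timezone.utc)
    evidence = RotationEvidence(
        rotated_at=rotated,
        secret_synced=True,
        refreshed_at=rotated + timedelta(minutes=5),
        consumer_healthy=True,
        default_token_denied=True,
        audit_match=False,  # audit entry not yet located
    )
    for gap in missing_evidence(evidence):
        print(gap)
```

Returning the list of gaps rather than a single boolean keeps the result usable as non-secret detail text for a lifecycle record; how that text would be passed to `lifecycle-event` is left open here, since the command's full argument shape is not visible in this diff.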