railiance-cluster/workplans/RAILIANCE-WP-0014-activity-core-llm-connect-live-reconcile.md
tegwick a2f1c1299c Finish RAILIANCE-WP-0014 activity-core llm-connect live reconcile
Provider Secret gate cleared; full reconcile passed with fixture smoke
(health=ok, latency 2.084s). Harden the smoke against NetworkPolicy
allowlist propagation by retrying up to 6x with a 5s warm-up inside the
smoke pod — the netpol added 2026-06-19 rejected the pod's immediate
first request before its IP propagated.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-01 23:52:05 +02:00


id: RAILIANCE-WP-0014
type: workplan
title: activity-core llm-connect live reconcile
domain: financials
repo: railiance-cluster
status: finished
owner: codex
topic_slug: railiance
created: 2026-06-18
updated: 2026-07-01
state_hub_workstream_id: a152ddda-d60a-4a65-9b9c-59e2db9ff2b7

activity-core llm-connect live reconcile

Context

activity-core has updated its Railiance runtime manifest so actcore-runtime-config points at the verified in-cluster llm-connect URL:

LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080
LLM_CONNECT_TIMEOUT_SECONDS=300

The remaining live gate belongs at the cluster/operator layer. Provider credentials must stay outside Git and State Hub, and the fixture smoke should record only non-secret evidence.

Add cluster-owned reconcile/check command

id: RAILIANCE-WP-0014-T01
status: done
priority: high
state_hub_task_id: "49288db7-8102-4ad5-af08-1fe6ab3f1d37"

Add a repeatable Railiance command that:

  • reconciles the non-secret activity-core runtime config keys;
  • checks the provider Secret by key count only;
  • applies the llm-connect overlay only after the provider Secret exists;
  • runs the in-namespace fixture smoke only after deployment readiness;
  • posts a non-secret State Hub evidence note.
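The ordering of the gates above can be sketched as follows. This is a minimal illustration, not the actual railiance-reconcile-activity-core-llm-connect implementation: `run_gated_reconcile`, `ReconcileResult`, and the injected callables are hypothetical names, and the real tool may structure its steps differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReconcileResult:
    """Non-secret record of which steps ran and where the run stopped."""
    steps_run: List[str] = field(default_factory=list)
    blocked_at: str = ""  # empty when every gate passed


def run_gated_reconcile(
    reconcile_config: Callable[[], None],
    secret_key_count: Callable[[], int],
    apply_overlay: Callable[[], None],
    deployment_ready: Callable[[], bool],
    run_smoke: Callable[[], bool],
    post_evidence: Callable[[ReconcileResult], None],
) -> ReconcileResult:
    """Run each step only after the previous gate passes (hypothetical helper)."""
    result = ReconcileResult()

    # Non-secret runtime config is always reconciled first.
    reconcile_config()
    result.steps_run.append("config")

    # The provider Secret is checked by key count only; values are never read.
    if secret_key_count() < 1:
        result.blocked_at = "provider-secret"
    else:
        apply_overlay()
        result.steps_run.append("overlay")
        if not deployment_ready():
            result.blocked_at = "deployment-readiness"
        else:
            result.steps_run.append("ready")
            if run_smoke():
                result.steps_run.append("smoke")
            else:
                result.blocked_at = "smoke"

    # Evidence is posted on every run, including blocked ones.
    post_evidence(result)
    return result
```

Passing the steps in as callables keeps the gate logic testable without a cluster. A run with a missing Secret stops after the config step and still posts evidence.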

2026-06-18: Added tools/cmd/railiance-reconcile-activity-core-llm-connect and Makefile target reconcile-activity-core-llm-connect.

Reconcile live non-secret runtime config

id: RAILIANCE-WP-0014-T02
status: done
priority: high
state_hub_task_id: "61df5bad-535f-4ad1-ac7a-f46ff278c388"

Patch the live activity-core/actcore-runtime-config ConfigMap so it consumes the verified llm-connect service URL and timeout. Do not touch Secret values.

2026-06-18: The reconcile command patches only LLM_CONNECT_URL and LLM_CONNECT_TIMEOUT_SECONDS, then re-reads the live ConfigMap to verify the values. Live evidence note c72c514a-399e-4c54-8d5b-d36405932360 confirms LLM_CONNECT_URL=http://llm-connect.activity-core.svc.cluster.local:8080 and LLM_CONNECT_TIMEOUT_SECONDS=300.
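The patch-then-verify behaviour described above might look roughly like the sketch below. It assumes the patch is sent as a JSON merge patch touching only the two keys; the function names `build_configmap_patch` and `mismatched_keys` are hypothetical, and the real command may build and apply its patch differently.

```python
import json
from typing import List, Mapping

# The two non-secret keys named in this workplan. ConfigMap values are strings.
DESIRED = {
    "LLM_CONNECT_URL": "http://llm-connect.activity-core.svc.cluster.local:8080",
    "LLM_CONNECT_TIMEOUT_SECONDS": "300",
}


def build_configmap_patch(desired: Mapping[str, str]) -> str:
    """Build a JSON merge patch that touches only the given data keys."""
    return json.dumps({"data": dict(desired)}, sort_keys=True)


def mismatched_keys(live_data: Mapping[str, str], desired: Mapping[str, str]) -> List[str]:
    """Return the desired keys whose live value is absent or different."""
    return sorted(k for k, v in desired.items() if live_data.get(k) != v)
```

A merge patch leaves every other key in the ConfigMap unchanged, which matches the "patches only" wording. After re-reading the live ConfigMap, an empty list from `mismatched_keys(live["data"], DESIRED)` would indicate that both values landed.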

Complete provider Secret, deployment, and smoke gate

id: RAILIANCE-WP-0014-T03
status: done
priority: high
state_hub_task_id: "ae8af00a-c14f-4b76-933c-46d06cd360ae"

After an operator stores provider credentials in activity-core/llm-connect-provider-secrets, rerun:

make reconcile-activity-core-llm-connect

The command will apply the llm-connect overlay, wait for deployment readiness, run the in-namespace fixture smoke with imagePullPolicy=Never, and post non-secret evidence: provider Secret key count, deployment readiness, pass/fail, latency/recommendation summary or sanitized failure.
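As an illustration of what "non-secret evidence" could contain, the sketch below builds a note from a Secret object without copying any of its values. The field names and the `build_evidence` helper are assumptions for this example; the State Hub note format is not specified in this workplan.

```python
from typing import Any, Dict, Mapping, Optional


def secret_key_count(secret_obj: Mapping[str, Any]) -> int:
    """Count keys under the Secret's `data` map; values are never read."""
    return len(secret_obj.get("data") or {})


def build_evidence(
    secret_obj: Optional[Mapping[str, Any]],
    ready_replicas: int,
    desired_replicas: int,
    smoke: Optional[Mapping[str, Any]],
) -> Dict[str, Any]:
    """Assemble a non-secret evidence record (hypothetical field names)."""
    evidence: Dict[str, Any] = {
        "provider_secret_status": "present" if secret_obj else "missing",
        "provider_secret_key_count": secret_key_count(secret_obj) if secret_obj else 0,
        "deployment_ready": f"{ready_replicas}/{desired_replicas}",
    }
    if smoke is None:
        evidence["smoke_status"] = "blocked"
    else:
        evidence["smoke_status"] = "pass" if smoke.get("health") == "ok" else "fail"
        # Copy only the summary fields, never the raw response body.
        for key in ("latency_seconds", "recommendations"):
            if key in smoke:
                evidence[key] = smoke[key]
    return evidence
```

Copying named summary fields rather than the whole smoke response is a deliberate choice here: anything not explicitly listed stays out of the note.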

2026-07-01: Gate closed. Provider Secret activity-core/llm-connect-provider-secrets present (key count 1, no values inspected), overlay applied (no drift), deployment llm-connect ready 1/1, in-namespace fixture smoke passed (health=ok latency_seconds=2.084 recommendations=1). Evidence note bddbf5d2-6cbe-4d97-9de6-689147d61be1. The first rerun failed with Connection refused because the allowlist of the llm-connect-activity-core-only NetworkPolicy (added 2026-06-19) had not yet picked up the fresh smoke pod's IP, so the pod's immediate first request was rejected. The reconcile tool now retries the smoke up to 6× with a 5s warm-up inside the pod.
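The retry hardening can be sketched as below. This assumes the 5s warm-up is a pause before each attempt; the workplan does not say whether the wait happens once or per attempt, and `smoke_with_retry` is a hypothetical name rather than the tool's own.

```python
import time
from typing import Callable


def smoke_with_retry(
    attempt: Callable[[], bool],
    max_attempts: int = 6,
    warmup_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the 1-based attempt number that succeeded, or 0 if none did."""
    for attempt_number in range(1, max_attempts + 1):
        # Assumption: wait before every attempt so the NetworkPolicy
        # allowlist has time to include the new pod IP.
        sleep(warmup_seconds)
        try:
            if attempt():
                return attempt_number
        except ConnectionRefusedError:
            # Expected while the pod IP is not yet allowlisted; try again.
            pass
    return 0
```

With these defaults a smoke that never connects gives up after six attempts and about 30 seconds of waiting. The `sleep` parameter is injectable so the loop can be exercised without real delays.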

Historical live gate on 2026-06-18: provider Secret activity-core/llm-connect-provider-secrets was missing, so deployment and smoke were intentionally blocked until operator/OpenBao-to-Kubernetes Secret custody was complete. Evidence note c72c514a-399e-4c54-8d5b-d36405932360 records provider Secret status missing, key count 0, deployment status not checked (provider Secret gate not satisfied), and smoke status blocked.