id: RAIL-BS-WP-0006
type: workplan
title: Staged Promotion Lifecycle
domain: financials
repo: railiance-cluster
status: finished
owner: railiance
topic_slug: railiance
repo_goal_id: 6ea441f7-7fe3-4598-922b-38baf20c0580
state_hub_workstream_id: cb72d3ba-1863-43c2-a2a5-49ac75fc2603
created: 2026-02-24
updated: 2026-06-27

Staged Promotion Lifecycle

Goal

Design and implement the three-stage deployment lifecycle as the core Railiance application promotion pattern:

  1. Stage 1: local development and validation.
  2. Stage 2: canary on production infrastructure.
  3. Stage 3: full production promotion with rollback.

This lifecycle should become the repeatable path for native Railiance apps and third-party upstream applications wrapped by a Railiance overlay repo.
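A minimal sketch of the forward path, assuming a small Python helper; the Stage enum and next_stage function are hypothetical names, not part of any Railiance tool. It only encodes the rule that an app advances one stage at a time and only after that stage's gates pass. Rollback is left to the Stage 3 tooling.

```python
from enum import Enum


class Stage(Enum):
    # Hypothetical encoding of the three lifecycle stages.
    LOCAL = 1       # Stage 1: local development and validation
    CANARY = 2      # Stage 2: canary on production infrastructure
    PRODUCTION = 3  # Stage 3: full production promotion


# Each stage may only advance to the one directly after it.
NEXT = {Stage.LOCAL: Stage.CANARY, Stage.CANARY: Stage.PRODUCTION}


def next_stage(current: Stage, gates_passed: bool) -> Stage:
    """Return the stage after `current`, refusing to skip or bypass gates."""
    if current not in NEXT:
        raise ValueError(f"{current.name} is the final stage")
    if not gates_passed:
        raise PermissionError(f"{current.name} gates have not passed")
    return NEXT[current]


if __name__ == "__main__":
    stage = Stage.LOCAL
    while stage in NEXT:
        stage = next_stage(stage, gates_passed=True)
        print(stage.name)
```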

Why This Belongs Before Forgejo

Forgejo will become critical production infrastructure. Before moving the source forge itself, Railiance needs a well-defined promotion lifecycle so the Forgejo deployment, Actions runners, package registry, and future upgrades can move through the same staged gates as every other important workload.

Boundary

This workplan lives in railiance-cluster because it defines cluster runtime promotion mechanics and the canonical handoff between local validation, canary deployment, and production routing.

Expected cross-repo handoffs:

  • railiance-enablement: developer-facing CLI templates and CI workflow conventions.
  • railiance-platform: shared platform dependencies used by canaries.
  • railiance-apps: application Helm values and workload-specific promotion definitions.

Tasks

T01 - Write deployment lifecycle specification

id: RAIL-BS-WP-0006-T01
status: done
priority: high
state_hub_task_id: "fbfc341f-8ccb-4950-a85d-3e59c4f5b87f"

Write docs/deployment-lifecycle.md.

The spec should define:

  • Stage 1, Stage 2, and Stage 3 semantics.
  • Required checks before each stage.
  • Canary acceptance gates.
  • Rollback expectations.
  • Human approval gates for production-critical workloads.

Done when: the lifecycle is clear enough to apply to Forgejo as a later production workload.
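Canary acceptance gates and the human approval rule can be expressed as a simple check. This is a sketch only: the metric names, threshold defaults, and function names below are hypothetical placeholders, not values taken from docs/deployment-lifecycle.md.

```python
from dataclasses import dataclass


@dataclass
class CanaryObservation:
    # Hypothetical metrics collected while the canary receives traffic.
    observed_minutes: int
    error_rate: float      # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float
    pod_restarts: int


@dataclass
class CanaryGate:
    # Placeholder thresholds; real values belong in the lifecycle spec.
    min_minutes: int = 30
    max_error_rate: float = 0.01
    max_p95_latency_ms: float = 500.0
    max_pod_restarts: int = 0


def evaluate_canary(obs: CanaryObservation, gate: CanaryGate) -> list[str]:
    """Return failed gate descriptions; an empty list means the canary passed."""
    failures = []
    if obs.observed_minutes < gate.min_minutes:
        failures.append("observation window too short")
    if obs.error_rate > gate.max_error_rate:
        failures.append("error rate above threshold")
    if obs.p95_latency_ms > gate.max_p95_latency_ms:
        failures.append("p95 latency above threshold")
    if obs.pod_restarts > gate.max_pod_restarts:
        failures.append("pod restarts during canary")
    return failures


def may_promote(failures: list[str], critical: bool, approved: bool) -> bool:
    """Promotion needs passing gates, plus human approval for critical workloads."""
    return not failures and (approved or not critical)
```

Keeping the gate failures as a list rather than a single boolean lets a promotion report say which gate blocked the canary.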

2026-06-16: Added docs/deployment-lifecycle.md and linked it from docs/README.md. The specification defines Stage 1 local validation, Stage 2 production canary, Stage 3 production promotion, required checks and evidence, canary acceptance gates, rollback expectations, human approval gates for production-critical workloads, and the Forgejo readiness questions that must be answered before cutover.


T02 - Define railiance directory schema and app.toml contract

id: RAIL-BS-WP-0006-T02
status: done
priority: high
state_hub_task_id: "523cf928-bb0e-4109-a172-abf029c62885"

Define the repository-local railiance/ directory schema and app.toml contract for native and third-party applications.

Minimum contract:

  • App identity and ownership.
  • Stage definitions.
  • Required platform dependencies.
  • Health checks and observability endpoints.
  • Promotion and rollback commands.
  • Secret references without plaintext secret values.

Done when: a repo can declare how it moves through the Railiance promotion lifecycle without bespoke instructions.

2026-06-27: Added docs/app-toml-contract.md, schemas/railiance-app.schema.json, and examples/railiance/app.toml. The v1 contract covers app identity, ownership, source/artifact policy, platform dependencies, secret references without plaintext values, health and observability endpoints, stage commands/checks/evidence, canary and promotion modes, rollback strategy, and human approval gates.


T03 - Overlay repo pattern and creation script

id: RAIL-BS-WP-0006-T03
status: done
priority: medium
state_hub_task_id: "7cd378f2-0319-407a-9ce7-2c6d1a6d6d24"

Design the overlay repo pattern for third-party upstream applications and add create_railiance_overlay_repo.sh or equivalent tooling.

The pattern should keep upstream code and Railiance deployment concerns cleanly separated while still allowing reproducible promotion.

Done when: a third-party app can be wrapped without forking deployment logic into the upstream repository.

2026-06-27: Added docs/overlay-repo-pattern.md and tools/create_railiance_overlay_repo.sh, plus the bin/railiance create-overlay dispatcher entry. The scaffold records upstream identity in railiance/upstream.toml and generates a schema-valid railiance/app.toml, stage values, a thin Helm chart, a Stage 1 test script, a rollback runbook, and promotion notes, without vendoring upstream code or touching secrets.


T04 - railiance run command

id: RAIL-BS-WP-0006-T04
status: done
priority: high
state_hub_task_id: "95c3311b-04bb-4c83-bda3-47958217b665"

Implement the Stage 1 railiance run command for local development and validation.

Expected behavior:

  • Read railiance/app.toml.
  • Start or validate the local development target.
  • Run defined local health checks.
  • Emit a machine-readable result suitable for later promotion gates.

Done when: at least one representative app can complete Stage 1 locally.
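A sketch of such a machine-readable result, assuming Python. Only the railiance.run-result.v1 identifier comes from this workplan; the field names, run_check, and run_result are hypothetical. Check output is captured and discarded so command logs never reach the result, and an empty check list is treated as a failure so later gates fail closed.

```python
import json
import subprocess
import sys


def run_check(name: str, argv: list[str]) -> dict:
    """Run one local check and keep only its outcome, never its output."""
    try:
        proc = subprocess.run(argv, capture_output=True, check=False)
    except FileNotFoundError:
        return {"name": name, "passed": False, "exit_code": None}
    return {"name": name, "passed": proc.returncode == 0, "exit_code": proc.returncode}


def run_result(app: str, checks: list[dict]) -> str:
    """Assemble a Stage 1 result document; field names are hypothetical."""
    result = {
        "schema": "railiance.run-result.v1",
        "app": app,
        "stage": 1,
        # No checks at all is treated as a failure so later gates fail closed.
        "passed": bool(checks) and all(c["passed"] for c in checks),
        "checks": checks,
    }
    return json.dumps(result, indent=2, sort_keys=True)


if __name__ == "__main__":
    checks = [run_check("interpreter", [sys.executable, "--version"])]
    print(run_result("example-app", checks))
```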

2026-06-27: Added tools/cmd/railiance-run, the bin/railiance run dispatcher entry, and docs/railiance-run-command.md. The command reads railiance/app.toml, runs Stage 1 commands and local checks, and emits railiance.run-result.v1 JSON without command logs or secret values. Updated the overlay generator so a generated Forgejo overlay completes Stage 1 locally in this environment; Helm rendering is optional when Helm is unavailable.


T05 - Canary Helm chart template

id: RAIL-BS-WP-0006-T05
status: done
priority: high
state_hub_task_id: "47b8cd47-99c7-4f31-a147-ea16afde7217"

Create the Stage 2 canary Helm chart template.

Minimum requirements:

  • Stable and canary release identities.
  • Weighted routing or equivalent traffic split through the chosen ingress path.
  • Prometheus-compatible annotations.
  • Resource limits appropriate for single-node and future ThreePhoenix use.
  • Rollback-safe values layout.

Done when: a canary deployment can be created without hand-editing cluster resources.
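When the optional Traefik weighted routing is used, the traffic split can be expressed as a TraefikService with a weighted service list. This sketch builds that manifest as a Python dict; the <app>-stable and <app>-canary Service names, the label, and weighted_route are assumptions, and the traefik.io/v1alpha1 apiVersion is the group used by recent Traefik releases (older releases used a different group).

```python
import json


def weighted_route(app: str, port: int, canary_percent: int) -> dict:
    """Build a Traefik TraefikService that splits traffic between stable and canary.

    The <app>-stable and <app>-canary Service names are hypothetical.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "traefik.io/v1alpha1",
        "kind": "TraefikService",
        "metadata": {"name": f"{app}-weighted", "labels": {"app.kubernetes.io/name": app}},
        "spec": {
            "weighted": {
                "services": [
                    {"name": f"{app}-stable", "port": port, "weight": 100 - canary_percent},
                    {"name": f"{app}-canary", "port": port, "weight": canary_percent},
                ]
            }
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so this output can be reviewed or applied as a manifest.
    print(json.dumps(weighted_route("example-app", 8080, 10), indent=2))
```

Expressing the split as a percentage keeps the two weights summing to 100, so the canary share can be read directly from the values file.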

2026-06-27: Updated generated overlay charts for Stage 2 canaries. The scaffold now emits stable/canary release identities, isolated canary ingress by default, optional Traefik weighted routing, Prometheus-compatible annotations, HTTP probes, conservative single-node resource limits, rollback labels, separate Stage 2/Stage 3 values, and tests/stage2-template.sh. Verified a fresh Forgejo overlay with schema validation, a Stage 1 run, and Stage 2 scaffold checks; Helm rendering was skipped because Helm is unavailable in this environment.


T06 - railiance deploy --stage 2 and observation tooling

id: RAIL-BS-WP-0006-T06
status: done
priority: medium
state_hub_task_id: "6a5c7422-fcb1-49d1-8153-e891bd1c27fa"

Implement Stage 2 deployment and observation commands.

Expected behavior:

  • Deploy the canary from declared app metadata.
  • Show rollout state, pod health, ingress/routing state, and key metrics.
  • Fail closed when prerequisites or health gates are missing.

Done when: Stage 2 can be run and observed from a repeatable command path.

2026-06-27: Added tools/cmd/railiance-stage2 and dispatcher entries for bin/railiance deploy and bin/railiance observe. Deploy emits a railiance.stage2-deploy-result.v1 plan by default, can run a Helm server-side dry run or an apply when the tools and cluster access are present, and fails closed when required paths, Helm, or approval evidence are missing. Observe emits a railiance.stage2-observe-result.v1 target plan by default and runs live kubectl rollout, pod, ingress, and metrics checks only with --live. Updated generated overlays to declare the repeatable Stage 2 plan commands.


T07 - railiance promote, rollback, and onboarding guide

id: RAIL-BS-WP-0006-T07
status: done
priority: medium
state_hub_task_id: "476198f6-0049-4ac4-9593-6723c86c9602"

Implement Stage 3 promotion and rollback commands, then write the reference onboarding guide.

Expected output:

  • railiance promote for controlled production promotion.
  • railiance rollback for reverting to the previous stable version.
  • A guide showing how a representative app adopts the lifecycle.
  • Explicit human approval points for critical infrastructure workloads.

Done when: a representative app can move Stage 1 -> Stage 2 -> Stage 3 and back through rollback using documented commands.

2026-06-27: Added tools/cmd/railiance-stage3 and dispatcher entries for bin/railiance promote and bin/railiance rollback. Both commands default to non-mutating JSON plans; apply modes require approval evidence and Helm, and rollback apply also requires a Helm revision when the helm-revision strategy is used. Added docs/promote-rollback-onboarding.md with the representative Stage 1 -> Stage 2 -> Stage 3 -> rollback path and explicit human approval points for critical workloads. Updated generated overlays to declare promote/rollback plan commands.

Dependencies

This workplan should be completed before the Forgejo production cutover. It can run in parallel with preparatory ThreePhoenix design, but its Stage 2/3 behavior should be validated against the intended ThreePhoenix cluster model.