Plan — Next-Generation Greentic Deployment
Current direction — one line per axis
- greentic.deployer.k8s@1.0.0). New providers ship as new env-packs.
- BundleDeployment; mutable TrafficSplit per deployment_id. In-process RevisionDispatcher is authoritative.
- BundleDeployment carries customer_id + signed/versioned revenue_share policy for Service Agency reselling.
- env_id param surgically. No single greentic-wizard-engine in this plan.
- gtc op passthrough remains the entry point. gtc start is the local one-shot.
- dev migrates to local.
- GREENTIC-BIZ/greentic-billing.
1. Context: why this restructure
The prior plan was structured around lifting AWS-specific code into deployer packs and shipping an AWS-stable E2E first. After source review and 2026-05-14 user direction, that framing still had four problems:
- It treated "Environment" as a string label, not a deploy target. Partial environment types exist today (greentic-config-types::EnvironmentConfig, greentic-types::store::Environment), but neither is the deployment object this plan needs. Several repos still carry an opaque environment string defaulting to "dev". Telemetry exporters, secrets backends, route tables, and credential providers attach to the bundle or provider, not to the environment that the revision runs in. The new spec must reconcile these existing types instead of pretending the name is unused.
- It treated deploying a new version and shifting traffic as the same act. Greentic today runs exactly one bundle per (tenant, team); deploying v2 overwrites v1 in place. This is incompatible with Cloud Run's industry-standard pattern of Revision (immutable, content-addressed) and TrafficSplit (mutable, separate). Every modern platform that does production traffic separates these concerns; Greentic must too.
- It conflated "what attaches to Env" with "what we deploy." Greentic has two artifacts with different lifecycles: environment-packs (secret manager, deployer, telemetry exporter, sessions backend, state backend) which link to an Environment, and application bundles (fast2flow, llm, RAG, customer.support) which we deploy onto that Environment. The prior framing obscured this axis.
- It coupled credentials and provider type to closed enums and to a single-bundle world. Customers' admins refuse to hand over full admin credentials, and the supported provider list must grow without recompiling core types. Both need first-class treatment, not enum sprawl. Separately, the platform's revenue model requires per-customer usage tracking with revenue share, and that data has to be anchored on the deployed bundle.
deployment_id, not just (env_id, bundle_id). Two customers can deploy the same bundle into one Env and need different revision history, rollout windows, revenue-share policy, and rollback state. The router resolves a deployment_id from tenant/team/customer/host/path routing before choosing a revision.
Four business goals still bound the plan:
- Zain runs on K8s in production; that is the eventual target. AWS is the first cloud proving ground because the existing greentic-deployer/src/aws.rs + Terraform groundwork lets us validate Environment/Revision/TrafficSplit on real infrastructure sooner. K8s design, conformance tests, and manifest generation start in parallel. "Zain ready" = K8s provider shipped, not AWS provider shipped.
- Monthly-hosted consulting needs repeatable deploys we can price.
- Ubuntu / Canonical wants first-class Snap + Juju deployers we can demo without recompiling the host.
- Bima reselling wants a Store that ships platform + bundle updates safely AND a per-customer revenue-share anchor (P6).
2. The six pillars
P1 Environment is the deploy target: extensible by descriptor, with host / setup / runtime config
A new first-class type, persisted at ~/.greentic/environments/<env-id>/environment.json (operator-owned in production; CLI-owned for local). Provider/secrets/telemetry/sessions/state are no longer enums — they are references to attached environment-packs identified by namespaced descriptor.
schema: greentic.environment.v1
environment_id: local            # default; created on first `gtc setup`
name: Local dev
host_config:                     # set by operator/admin at create/update time
  region: null
  cluster_endpoint: null
  public_hostname: localhost
  tenant_org_id: null
  redis_url: null                # null when not yet known; runtime fills it later
packs:                           # env-packs bound to this env, one per capability slot
  - slot: deployer
    kind: greentic.deployer.local-process@1.0.0
    pack_ref: greentic.deployer.local-process
    answers_ref: env-packs/deployer.answers.json
  - slot: secrets
    kind: greentic.secrets.dev-store@1.0.0
    pack_ref: greentic.secrets.dev-store
  - slot: telemetry
    kind: greentic.telemetry.stdout@1.0.0
  - slot: sessions
    kind: greentic.sessions.in-memory@1.0.0
  - slot: state
    kind: greentic.state.in-memory@1.0.0
credentials_ref: env-packs/credentials.json   # see P5
bundles: []                      # application bundles deployed in this env
revisions: []                    # immutable revision records per deployment
traffic_splits: []               # one TrafficSplit per deployment_id
revocation:
  feed_url: null                 # opt-in; defaults off for `local`
retention:
  archived_max_age_days: 14
  staged_max_count: 5
health:
  last_doctor_status: pending
Three sources of config, three owners
| Source | Owner | When set | Examples |
|---|---|---|---|
| Host config | Operator / admin via gtc op env create\|update | Knowable at create time | Region, cluster endpoint, public hostname, tenant org, fixed Redis URL |
| Setup config | Wizard answers per env-pack | When attaching/updating an env-pack via gtc op env-packs add\|update | Telegram bot token, OTLP sampling rate, SMTP From address |
| Runtime config | Deployer env-pack, written post-apply | Discovered after the deployer runs | ALB DNS, Cloud Run URL, K8s Service ClusterIP, generated secret ARNs |
Capability slots: deployer, secrets, telemetry, sessions, state, revocation. v1 binds exactly one env-pack per slot. Fan-out (e.g. telemetry to two exporters) is a composite env-pack in v1; list semantics may come later.
local is implicit; dev migrates only after a preflight gate
First-time gtc setup creates local with default env-packs. Phase A adds gtc op env migrate-dev local --check, which scans dev-store keys, env strings in configs, audit-event env labels, bundle hints, and known runtime defaults. If the check proves zero live/prod usage, the migration performs a one-shot rewrite and removes old dev reads. If not, the release carries a short dev → local compatibility alias that emits warning telemetry and expires after the next train.
LocalFsStore is the developer/default implementation. Any non-local production EnvironmentStore must provide compare-and-swap writes, idempotency replay, RBAC decisions, append-only audit, backups, restore drills, and operator HA before Phase D can claim production readiness. Plain fs::write or best-effort S3 writes are not production state.
P1b CLI surface: gtc op is the operator wizard, gtc start is the local shortcut
Two operator-facing top-level commands cover the lifecycle. gtc op already exists as a passthrough to greentic-operator; keep that routing model. Every gtc op sub-command honors --schema (print the QASpec it would run) and --answers <file> (non-interactive replay).
| Sub-command | What it does |
|---|---|
| gtc op env create\|update\|list\|show\|doctor\|destroy | Environment CRUD and health inspection. create runs the deployer env-pack's QASpec for host_config + bound-slot answers. |
| gtc op env-packs add\|update\|remove\|list | Attach / detach env-packs. Runs that env-pack's QASpec. |
| gtc op bundles add\|update\|remove\|list | Add / update / remove application bundles. add stages a new Revision. |
| gtc op traffic set\|show\|rollback | Manage TrafficSplit per deployment_id; --bundle only when it resolves to exactly one deployment. |
| gtc op credentials requirements\|bootstrap\|rotate | Two-mode credentials (P5). |
| gtc op secrets list\|put\|get\|rotate | Secrets management via the Env's secrets env-pack. |
| gtc op config show\|set | Inspect host_config / setup answers / runtime values; mutate host_config. |
| gtc op revisions stage\|warm\|drain\|archive\|list | Revision lifecycle on a bundle. |
gtc start [--bundle <path>] [--env local] — local one-shot for the developer inner loop. Internally a thin wrapper over gtc op env create local (idempotent, default env-packs) + gtc op bundles add local <bundle> + gtc op revisions warm + gtc op traffic set local <bundle>:<latest>=100.
P2 Environment-pack vs application-bundle (the right axis)
Two artifact axes, not one. Both share the .gtpack format; they differ by role, not archive shape.
| Role | What it is | Lifecycle | Examples |
|---|---|---|---|
| Environment-pack | A capability provider an Environment binds to. Declares a capability slot, a config schema, a wizard QASpec, a credentials schema (deployer only), and an implementation. Native Rust handler today; WASM/extension later. | Attached to Environment.packs by reference. Re-bind without redeploying workload bundles. Versions per env. | greentic.secrets.aws-sm, greentic.secrets.azure-kv, greentic.deployer.aws-ecs, greentic.deployer.k8s, greentic.telemetry.otlp, greentic.sessions.redis. |
| Application bundle | A workload .gtbundle (SquashFS) of one or more app packs. Composes flows, components, routes. The artifact deployed into an Environment. | Pushed via gtc op bundles add. Creates or updates a BundleDeployment; each deployment owns Revisions and a TrafficSplit. | fast2flow, llm-router, RAG, customer.support-bot, webhook-receiver. |
- A customer wants to swap the secrets backend (dev-store → AWS-SM) without redeploying every workload bundle.
- A customer wants one OTLP collector shared across all bundles in prod-eu.
Multi-bundle and multi-customer environments are first-class. An Environment hosts N application bundles concurrently, and the same bundle_id may appear in more than one BundleDeployment for different customers. The HTTP route table resolves a deployment_id from tenant/team/customer/host/path routing, then dispatches to (deployment_id, bundle_id, revision_id).
P3 Deploy version ≠ Shift traffic (Revision + TrafficSplit, in-process router)
Two separate verbs
- gtc op revisions stage: creates a Revision at 0% under one deployment_id (never in TrafficSplit yet).
- gtc op revisions warm: advances inactive → staged → warming → ready after the cold start finishes.
- gtc op traffic set <env> --deployment <id> <revision>=<percent> [...]: updates the split atomically; --bundle <id> is a shorthand only when it resolves to exactly one deployment.
The CLI accepts percentages, but the persisted object stores basis points so canaries smaller than 1% are possible later. Mutating commands carry an idempotency key and an expected generation; stale callers get a conflict instead of overwriting a newer rollout.
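The percent-to-basis-point conversion can be sketched as follows. This is a minimal illustration, not the actual CLI parser: the names `ParsedWeight`, `percent_to_bps`, and `parse_split_args` are hypothetical, and the choice to reject percentages with more than two decimal places (they cannot be stored exactly in basis points) is an assumption.

```rust
/// One parsed `<revision>=<percent>` argument, with the weight in basis points.
/// (Hypothetical type for illustration.)
#[derive(Debug, PartialEq, Eq)]
pub struct ParsedWeight {
    pub revision_id: String,
    pub weight_bps: u32,
}

/// Convert a decimal percentage such as "99", "0.5" or "12.25" to basis points
/// without going through floating point.
pub fn percent_to_bps(pct: &str) -> Result<u32, String> {
    let invalid = || format!("invalid percentage `{pct}`");
    let (whole, frac) = pct.split_once('.').unwrap_or((pct, ""));
    let all_digits = whole.bytes().chain(frac.bytes()).all(|b| b.is_ascii_digit());
    // Assumption: more than two decimals is an error, since 1 bps = 0.01%.
    if whole.is_empty() || frac.len() > 2 || !all_digits {
        return Err(invalid());
    }
    let whole: u32 = whole.parse().map_err(|_| invalid())?;
    // Right-pad the fraction to two digits: "5" -> 50, "05" -> 5, "" -> 0.
    let frac: u32 = format!("{frac:0<2}").parse().map_err(|_| invalid())?;
    let bps = whole
        .checked_mul(100)
        .and_then(|w| w.checked_add(frac))
        .ok_or_else(|| invalid())?;
    if bps > 10_000 {
        return Err(invalid());
    }
    Ok(bps)
}

/// Parse every `<revision>=<percent>` argument and require a total of 10_000 bps.
pub fn parse_split_args(args: &[&str]) -> Result<Vec<ParsedWeight>, String> {
    let mut entries = Vec::new();
    for arg in args {
        let (rev, pct) = arg
            .split_once('=')
            .ok_or_else(|| format!("expected <revision>=<percent>, got `{arg}`"))?;
        if rev.is_empty() {
            return Err(format!("missing revision id in `{arg}`"));
        }
        entries.push(ParsedWeight {
            revision_id: rev.to_string(),
            weight_bps: percent_to_bps(pct)?,
        });
    }
    let total: u32 = entries.iter().map(|e| e.weight_bps).sum();
    if total != 10_000 {
        return Err(format!("weights sum to {total} bps, expected 10000"));
    }
    Ok(entries)
}
```

Parsing the digits directly avoids rounding surprises from floating point, which matters once sub-1% canaries are allowed.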
In-process RevisionDispatcher — per-request dispatch order
- Resolve deployment_id from authenticated tenant/team/customer context plus route binding. Public traffic cannot choose an arbitrary deployment.
- Headers X-Greentic-Deployment + X-Greentic-Revision: exact match, but only for trusted admin/test traffic (mTLS, authenticated admin session, or signed debug token).
- Cookie _gt_rev_<deployment_id>=<signed-revision>: sticky binding, scoped per deployment. HMAC-signed over {env_id, tenant, deployment_id, revision_id, generation, expires_at}. Unsigned, expired, unknown, or not-ready revisions are ignored.
- Weighted random over the deployment's TrafficSplit.entries using cumulative basis points (0..10_000).
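The final step, weighted selection over cumulative basis points, might look like the sketch below. The type `WeightedEntry` and the function `pick_revision` are hypothetical names; the caller is assumed to supply a uniformly random `roll` in `0..10_000`, which keeps the selection itself deterministic and easy to test.

```rust
/// One entry of a deployment's TrafficSplit (hypothetical shape for illustration).
#[derive(Debug, Clone)]
pub struct WeightedEntry {
    pub revision_id: String,
    pub weight_bps: u32,
}

/// Map a roll in `0..10_000` to a revision by walking the cumulative weights.
/// Returns `None` if the roll is out of range or the weights do not cover it
/// (that is, the split does not sum to 10_000).
pub fn pick_revision(entries: &[WeightedEntry], roll: u32) -> Option<&str> {
    if roll >= 10_000 {
        return None;
    }
    let mut cumulative: u32 = 0;
    for entry in entries {
        cumulative = cumulative.saturating_add(entry.weight_bps);
        if roll < cumulative {
            return Some(entry.revision_id.as_str());
        }
    }
    None
}
```

With entries of 100 bps and 9,900 bps, rolls 0 through 99 select the first revision and rolls 100 through 9,999 select the second, so a 1% canary receives one roll in a hundred on average.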
Rollout primitives map onto distributor-client, but the spec owns the lifecycle
| New verb | Distributor-client call | Effect |
|---|---|---|
| revisions stage | stage_bundle | Creates Revision + cache record; inactive → staged; not in TrafficSplit yet. |
| revisions warm | warm_bundle + runner warm API | staged → warming → ready. Allocates runtime resources and builds route tables. |
| traffic set | (none; TrafficSplit object only) | Updates one deployment's weight table atomically; no state-machine change. |
| revisions drain | validated state write | Stops accepting new sessions; finishes in-flight; tears down. |
| revisions archive | validated state write | Removes from TrafficSplit if present; frees cache. |
| traffic rollback | (none) | Snapshot previous TrafficSplit; atomic swap back. |
TrafficSplit as authoritative. If a provider cannot route all requests through a process that has all ready revisions loaded, it must use a provider-native weighted router and report drift instead of pretending per-revision pods can re-route traffic they never receive.
P4 Wizards keep their homes; env-scope them surgically (reduced)
The four wizard runtimes (greentic-bundle/src/wizard, greentic-setup/src/cli_commands/setup.rs, greentic-operator/src/wizard.rs, greentic-dw/crates/greentic-dw-cli/src/wizard.rs) stay. The big-bang unification into a single greentic-wizard-engine is deferred — too much surface to change at once for too little immediate benefit.
What this plan does change:
- Every wizard call gains an env-id parameter. Answers persist under ~/.greentic/environments/<env-id>/{env-packs|bundles}/<id>/answers.json. Secret-marked answers route through the env's secrets env-pack (Environment.packs[secrets]), never written to bundle setup-state.
- PersistedSetupState.secret_values plaintext goes away at the artifact boundary. Phase 0 excludes it from archive paths. Phase B replaces the in-memory BTreeMap<String, Value> with BTreeMap<String, SecretRef>.
- Env-pack contributes its own QASpec via the existing format. When the operator runs gtc op env-packs add <kind>, the operator's existing wizard driver loads the env-pack's QASpec and runs it. No new runtime.
- App bundles keep their own wizard. Env-id is propagated; secret answers route to the Env's secrets env-pack.
What is deferred (Phase E or a separate wizard plan):
- A single composed greentic-wizard-engine crate.
- Per-app-pack wizard.yaml contribution + composition.
- Adaptive Card surface as a first-class wizard renderer.
P5 Credentials are first-class: minimum-requirements OR admin-bootstrap
Deployments need cloud credentials. Two facts shape the design:
- Customers' admins refuse to hand over full admin credentials for long-lived use.
- Assembling the minimum set of permissions by hand is painful enough that "credentials" is where most enterprise deployment friction lives.
| Mode | What the user provides | What Greentic does | Output |
|---|---|---|---|
| Requirements mode (preferred) | Bare-minimum credentials the deployer's policy demands (e.g., a pre-existing IAM role ARN with actions listed by DeployOp::required_iam_actions). | Validate the credentials against declared requirements: list missing actions, missing resources, expired sessions, wrong region/account. Refuse to proceed if validation fails. | Validated Credentials object stored in the env's secrets backend; deployment proceeds. |
| Admin-bootstrap mode | Full admin credentials, one-time, ephemeral. | Run a bootstrap pack that provisions the minimum IAM/RBAC resources (roles, policies, ServiceAccounts, secret-manager paths, KMS aliases). Prefer cloud-native federation / workload identity over static keys. Emit a rules pack — signed bundle of IaC (Terraform / kubectl YAML / Pulumi / Bicep) the admin can review and apply elsewhere. | Either (a) env stores the generated low-privilege credentials, or (b) admin applies the rules pack offline. Admin credentials are never intentionally persisted; buffers zeroized where possible. |
Each deployer env-pack ships credentials.yaml
schema: greentic.deployer-credentials.v1
deployer_kind: greentic.deployer.k8s@1.0.0
requirements:
  - capability: k8s.create-namespace
    check: { kubeconfig_action: namespaces/create }
  - capability: k8s.bind-rbac
    check: { kubeconfig_action: clusterrolebindings/create }
  - capability: aws.assume-role
    check: { sts_action: AssumeRole, resource: arn:aws:iam::*:role/greentic-* }
validation:
  command: greentic-deploy-auth check --kind greentic.deployer.k8s
  expected_exit: 0
bootstrap:
  kind: terraform
  module: bootstrap/k8s
  inputs_schema: bootstrap/inputs.schema.json
rules_export:
  formats: [terraform, helm-values, kubectl-yaml]
  output: rules/<env-id>/
gtc op credentials … is the operator surface
- gtc op credentials requirements <env>: run validation; produce a concrete "missing X, fix with Y" report. Fail fast.
- gtc op credentials bootstrap <env> --admin-profile <name>: run the deployer env-pack's bootstrap pack against ephemeral admin credentials; persist low-privilege output; render rules pack under rules/<env-id>/.
- gtc op credentials rotate <env>: re-validate; rotate session tokens; warn on near-expiry.
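The core of requirements-mode validation is a set difference between declared and granted capabilities. The sketch below shows only that step; `RequirementsReport` and `check_requirements` are hypothetical names, and how the granted set is discovered (the plan delegates that to the deployer's validation command) is out of scope here.

```rust
use std::collections::BTreeSet;

/// Result of comparing declared requirements against granted capabilities.
/// (Hypothetical type for illustration.)
#[derive(Debug, PartialEq, Eq)]
pub struct RequirementsReport {
    /// Required capabilities that were not granted, in declaration order.
    pub missing: Vec<String>,
}

impl RequirementsReport {
    /// Validation passes only when nothing is missing.
    pub fn passed(&self) -> bool {
        self.missing.is_empty()
    }
}

/// List every required capability that is absent from the granted set.
pub fn check_requirements(required: &[&str], granted: &[&str]) -> RequirementsReport {
    let granted: BTreeSet<&str> = granted.iter().copied().collect();
    let missing = required
        .iter()
        .filter(|cap| !granted.contains(*cap))
        .map(|cap| cap.to_string())
        .collect();
    RequirementsReport { missing }
}
```

A deploy would refuse to proceed whenever `passed()` is false, and the `missing` list is what feeds the "missing X, fix with Y" report.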
P6 Deployed bundles are addressable at the usage level; revenue share is per customer
- Usage metering per deployment per customer. Every invocation, every flow run, every minute of warm capacity carries (env_id, deployment_id, bundle_id, revision_id, tenant, team, customer_id).
- Service Agency revenue share. Bima-style reselling and consulting engagements need a per-customer billing percentage attached to a deployed bundle. Example: customer C buys bundle customer.support on env prod-eu; agency A gets 30% of usage revenue, Greentic gets 70%.
schema: greentic.bundle-deployment.v1
deployment_id: 01JTKS_F2F        # ULID; one per (env_id, bundle_id, customer_id)
env_id: prod-eu
bundle_id: customer.support
customer_id: cust-acme           # billing principal; required for non-local envs
route_binding:
  hosts: [support.acme.example.com]
  path_prefixes: [/]
  tenant_selector: { tenant: acme, team: support }
current_revisions: [01JTKR_v2, 01JTKR_v1]
revenue_share:
  - { party_id: agency-a, basis_points: 3000 }
  - { party_id: greentic, basis_points: 7000 }
revenue_policy_ref: billing-policies/customer.support/cust-acme/v1.json.sig
usage:
  meter_endpoint: usage://prod-eu/cust-acme/customer.support
  last_seen_at: 2026-05-14T08:01:00Z
authorization_ref: audit/2026-05-14T08-00-00Z-deploy-create.json
GREENTIC-BIZ/greentic-billing, TBD).
3. Today's reality — only what bears on the restructure
| Area | Finding | Reference |
|---|---|---|
| Partial Environment types | EnvironmentConfig and store Environment exist, but neither models deploy provider/secrets/telemetry/revisions/traffic. | greentic-config-types/src/lib.rs:59 |
| .dev.secrets.env leak | Both squashfs and zip archive paths walk the bundle dir with no exclusion; persist_all_config_as_secrets writes ALL visible answers to the dev store; PersistedSetupState.secret_values carries plaintext into setup-state files. | greentic-setup/src/gtbundle.rs:85-99, :136-170, :394-427 |
| Config currently rides the secrets path | persist_all_config_as_secrets is intentional in several paths because runtime/component config lookup depends on the secrets channel. Phase C must add non-secret runtime config delivery before deleting all non-secret writes. | greentic-setup/src/qa/persist.rs |
| Single-active-bundle model | AdminState { bundle_root, tenant, team } — one bundle per (tenant, team). Reserved /deployments/{stage,warm,activate,rollback,complete-drain} endpoints have NO handlers. | greentic-operator/src/admin_api.rs:20-26 |
| Runner already pack-native | RunnerHost::load_pack() accepts a .gtpack directly; TenantRuntime swaps via ArcSwap. | greentic-runner-host/src/host.rs:155 |
| Env hook in pack manifest | DistributionSection.environment_ref and desired_state_version exist but are unused. | greentic-pack/src/builder.rs:151-162 |
| Distributor-client primitives | stage_bundle:1902, warm_bundle:1960, rollback_bundle:2030, set_bundle_state:2124, evaluate_retention, apply_retention — none called by deployer/operator today. Writes are NOT atomic (bare fs::write at :3315). | greentic-distributor-client/src/dist.rs:470-477 |
| Sessions are env-scoped | Session keys: {env}::{tenant}::{pack_id}::{flow_id}::{session_hint}. | greentic-runner-host/src/engine/host.rs:28-54 |
| NATS subjects | greentic.messaging.ingress.{env}.{tenant}.{team}.{platform} — no version segment. Confirmed safe to leave alone. | greentic-messaging/libs/core/src/messaging_subjects.rs:5 |
| Public route table | Built once per-tenant from active bundle's pack manifests. In-memory. No persistence; no per-revision split today. | greentic-start/src/http_routes.rs:48-109 |
| Cloud coupling in host | ~4,126 LOC of cloud-specific code in greentic-deployer (AWS/Azure/GCP). gtc has 10+ default_value("aws") admin subcommands. | Phase 1 audit |
| Distributor-client signature verifier | Explicit no-op: "signature verification is not implemented in the open-source client." | dist.rs:3956 |
| SquashFS extraction split | greentic-setup already uses backhand. greentic-start still shells out to unsquashfs; C4 targets greentic-start. | greentic-start/src/bundle_ref.rs:341 |
| Wizard runtimes | 4 independent runtimes; none env-aware; all use PersistedSetupState.secret_values plaintext. | Phase 1 audit |
4. Locked-in decisions
- In-process RevisionDispatcher is the Greentic load balancer. Authoritative in local, single-VM, and the K8s/router gateway process. A provider that cannot put all ready revisions behind that dispatcher must use provider-native weighted routing and report drift against deployment-scoped TrafficSplit.
- Environment is the deploy target. It owns env-packs (by slot) and app bundles. App bundles never embed env config. Multiple bundles and multiple customer deployments of the same bundle per Env are first-class. Per-pack rollout inside a bundle remains Phase E.
- Wizards keep their homes; env-scoping is surgical. No greentic-wizard-engine extraction in this plan; no per-app-pack wizard.yaml composition.
- Credentials are a first-class pillar with two modes (P5). Admin credentials are never intentionally persisted; bootstrap paths prefer cloud-native federation / workload identity.
- Provider taxonomy is extensible by namespaced descriptor, not enum. The closed surface is the small fixed list of capability slots.
- Env config is split into three sources: host / setup / runtime.
- Deployed bundles are addressable at usage level with per-customer revenue share (P6). Telemetry stamps deployment_id and customer_id; metrics use curated labels or OpenTelemetry views.
- local is implicit; dev migrates behind a preflight gate.
- gtc op is the operator wizard surface; gtc start is the local one-shot. The dozens of default_value("aws") admin sub-commands go away.
- Restructure dissolves the AWS-first chapter into a generic Revision/TrafficSplit model.
- Cross-cutting workstreams remain provider-independent. C1 → Phase 0 (security fix). C2 covers pack signatures plus DSSE. C3/C4 stay Phase A.
- Minimum trust is in this plan; advanced trust is deferred to plans/greentic-trust-and-airgap.md.
- Deployment extension taxonomy is fixed before Phase D. V1 execution uses native deployer packs loaded by greentic-deployer; .gtxpack deployer extensions remain metadata wrappers with execution.kind = builtin.
- Operator delegates to greentic-start via library link. No subprocess.
- Lifecycle endpoints implement INTO the reserved route prefixes at greentic-operator/src/static_routes.rs:65-69.
- Distributor-client set_bundle_state gains atomic write + transition-validation matrix. Lifecycle includes failed and archived.
- Non-secret config gets its own runtime channel. The current "write all answers to secrets" behavior is a migration bridge, not a target design.
- All mutating deployment writes are optimistic-concurrency controlled and idempotent.
- Mutating deployment operations are audited and authorized. Non-local production paths require RBAC before Phase D.
- Observability is per revision and per deployed bundle, with customer attribution, but metrics avoid unbounded cardinality.
- Production EnvironmentStore is a production system. Local files are for local and development.
- Env-pack changes are revisioned and rollbackable. Rebinding a slot creates an env-pack binding generation with previous-binding metadata.
5. Target object model (spec-level)
5.1 greentic.environment.v1
Decomposes into three persistence units that map to the three config sources:
- environment.json: top-level env identity, bound env-packs, bundles deployed, revisions, traffic splits, host_config. Operator-owned.
- env-packs/<slot>/answers.json: non-secret setup answers per env-pack. Wizard-owned.
- runtime.json: discovered runtime values. Deployer env-pack-owned.
// greentic-deploy-spec
pub struct Environment {
    pub schema_version: SemVer,
    pub environment_id: EnvId,
    pub name: String,
    pub host_config: EnvironmentHostConfig,   // moved from greentic-config-types
    pub packs: Vec<EnvPackBinding>,           // one entry per CapabilitySlot
    pub credentials_ref: Option<SecretRef>,   // P5; points into Environment.packs[secrets]
    pub bundles: Vec<BundleDeployment>,       // §5.4; multiple per env
    pub revisions: Vec<Revision>,             // §5.2; flat list, indexed by id
    pub traffic_splits: Vec<TrafficSplit>,    // §5.3; one per deployment_id
    pub revocation: RevocationConfig,
    pub retention: RetentionPolicy,
    pub health: HealthStatus,
}

pub struct EnvPackBinding {
    pub slot: CapabilitySlot,   // closed enum: Deployer | Secrets | Telemetry | Sessions | State | Revocation
    pub kind: PackDescriptor,   // namespaced descriptor, open, e.g. "greentic.deployer.k8s@1.0.0"
    pub pack_ref: PackId,
    pub answers_ref: Option<PathBuf>,
    pub generation: u64,        // bumped on attach/update/remove/rollback
    pub previous_binding_ref: Option<PathBuf>,
}

pub struct PackDescriptor(pub String); // "<namespace>.<id>@<semver>"; never enum-matched

pub enum CapabilitySlot {
    Deployer, Secrets, Telemetry, Sessions, State, Revocation,
}
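Because `PackDescriptor` is an open string, consumers validate its shape instead of matching on an enum. A minimal sketch of such a check follows; `ParsedDescriptor` and `parse_descriptor` are hypothetical names, and the version handling is a simplification (three numeric components only, no pre-release or build metadata), not a full SemVer parser.

```rust
/// A descriptor split into its name and numeric version.
/// (Hypothetical type for illustration.)
#[derive(Debug, PartialEq, Eq)]
pub struct ParsedDescriptor {
    pub name: String,              // e.g. "greentic.deployer.k8s"
    pub version: (u64, u64, u64),  // (major, minor, patch)
}

/// Parse "<namespace>.<id>@<semver>" without knowing any provider up front.
pub fn parse_descriptor(raw: &str) -> Result<ParsedDescriptor, String> {
    let (name, version) = raw
        .split_once('@')
        .ok_or_else(|| format!("missing `@<semver>` in `{raw}`"))?;

    // Require at least "<namespace>.<id>": two or more non-empty segments.
    let segments: Vec<&str> = name.split('.').collect();
    if segments.len() < 2 || segments.iter().any(|s| s.is_empty()) {
        return Err(format!("`{name}` is not a namespaced id"));
    }

    // Simplification: exactly major.minor.patch, all numeric.
    let parts: Vec<&str> = version.split('.').collect();
    if parts.len() != 3 {
        return Err(format!("`{version}` is not major.minor.patch"));
    }
    let mut nums = [0u64; 3];
    for (slot, part) in nums.iter_mut().zip(&parts) {
        *slot = part
            .parse()
            .map_err(|_| format!("invalid version component `{part}` in `{raw}`"))?;
    }

    Ok(ParsedDescriptor {
        name: name.to_string(),
        version: (nums[0], nums[1], nums[2]),
    })
}
```

New providers then need no change to core types: any string that passes this shape check can be looked up in the env-pack registry.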
5.2 greentic.revision.v1
Revisions are per BundleDeployment. Each customer-scoped deployment has its own revision sequence and its own TrafficSplit.
schema: greentic.revision.v1
revision_id: 01JTKR9X7ZQK3FN5W3TX9CHEAM   # ULID; unique within Env
env_id: prod-eu
bundle_id: customer.support
deployment_id: 01JTKS_CS_ACME             # references BundleDeployment
sequence: 42                              # monotonic per deployment_id
created_at: 2026-05-14T08:00:00Z
bundle_digest: sha256:abc...              # digest of the .gtbundle archive
pack_list:
  - pack_id: customer.support.flows
    version: 1.2.0
    digest: sha256:abc...
    source_uri: oci://ghcr.io/greentic-biz/customer/packs/support-flows:1.2.0
pack_list_lock_ref: revisions/01JTKR.../PackList.lock
config_digest: sha256:ghi...
signature_sidecar_ref: revisions/01JTKR.../revision.sig
lifecycle: ready
staged_at: 2026-05-14T08:00:00Z
warmed_at: 2026-05-14T08:01:14Z
drain_seconds: 60
State-transition matrix
5.3 greentic.traffic-split.v1
One TrafficSplit per deployment_id. Splitting traffic between revisions of customer A's customer.support is independent of customer B's deployment of the same bundle, and independent of llm-router.
schema: greentic.traffic-split.v1
env_id: prod-eu
deployment_id: 01JTKS_CS_ACME
bundle_id: customer.support
generation: 17
entries:
  - { revision_id: 01JTKR9X..., weight_bps: 100 }
  - { revision_id: 01JTKR8W..., weight_bps: 9900 }
updated_at: 2026-05-14T09:30:00Z
updated_by: operator://prod-eu/api
idempotency_key: 01JTKW5B4W4Q5Y1CQW93F7S5VH
authorization_ref: audit/2026-05-14T09-30-00Z-01JTKW5B.json
previous_split_ref: traffic-splits/01JTKS_CS_ACME/2026-05-14T09-25-00.json
Constraints
- Sum of weight_bps MUST equal 10,000.
- All revision_ids MUST resolve to a Revision with matching deployment_id + bundle_id and lifecycle == Ready.
- Update is atomic (write-temp + rename + generation check; ArcSwap for in-memory).
- Mutating requests MUST carry an idempotency key. Stale generation fails with conflict.
- Mutating requests MUST pass authorization and write an audit event before the state file is swapped.
- Previous split is kept under traffic-splits/<deployment_id>/<ts>.json for last-N rollbacks.
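A subset of these constraints (weight sum, revision readiness, deployment match, and the generation check) can be sketched as a pure function. All names below (`SplitState`, `RevisionInfo`, `SplitError`, `apply_update`) are hypothetical, and the sketch deliberately leaves out the bundle_id match, idempotency replay, authorization, audit, and the write-temp + rename step.

```rust
use std::collections::HashMap;

/// Why an update was rejected. (Hypothetical error type for illustration.)
#[derive(Debug, PartialEq, Eq)]
pub enum SplitError {
    StaleGeneration { expected: u64, actual: u64 },
    BadWeightSum(u64),
    UnknownRevision(String),
    WrongDeployment(String),
    NotReady(String),
}

/// The facts about a revision that the check needs.
pub struct RevisionInfo {
    pub deployment_id: String,
    pub ready: bool,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SplitState {
    pub deployment_id: String,
    pub generation: u64,
    pub entries: Vec<(String, u32)>, // (revision_id, weight_bps)
}

/// Validate a proposed split and return the next generation, or reject it.
pub fn apply_update(
    current: &SplitState,
    expected_generation: u64,
    new_entries: Vec<(String, u32)>,
    revisions: &HashMap<String, RevisionInfo>,
) -> Result<SplitState, SplitError> {
    // Optimistic concurrency: a stale caller gets a conflict, never an overwrite.
    if current.generation != expected_generation {
        return Err(SplitError::StaleGeneration {
            expected: expected_generation,
            actual: current.generation,
        });
    }
    let total: u64 = new_entries.iter().map(|(_, bps)| u64::from(*bps)).sum();
    if total != 10_000 {
        return Err(SplitError::BadWeightSum(total));
    }
    for (rev_id, _) in &new_entries {
        let info = revisions
            .get(rev_id)
            .ok_or_else(|| SplitError::UnknownRevision(rev_id.clone()))?;
        if info.deployment_id != current.deployment_id {
            return Err(SplitError::WrongDeployment(rev_id.clone()));
        }
        if !info.ready {
            return Err(SplitError::NotReady(rev_id.clone()));
        }
    }
    Ok(SplitState {
        deployment_id: current.deployment_id.clone(),
        generation: current.generation + 1,
        entries: new_entries,
    })
}
```

Keeping validation free of I/O means the same function could back both the on-disk store and the in-memory swap; the caller persists the returned state only after the audit event is written.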
5.4 greentic.bundle-deployment.v1
The usage-level anchor (P6). One per (env_id, bundle_id, customer_id). Required for non-local envs. (See schema in §P6 pillar.)
Constraints
- sum(revenue_share[*].basis_points) == 10_000.
- customer_id is required for any env where host_config.tenant_org_id is set (i.e., not local). For local, defaults to local-dev.
- route_binding MUST resolve public traffic to exactly one deployment_id. Ambiguous bindings fail at deploy time.
- revenue_policy_ref points to a signed, versioned policy document; mutating revenue_share creates a new policy version.
- Status transitions: active → paused | archived; paused → active | archived; archived is terminal.
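Two of these constraints, the revenue-share sum and the status transitions, are simple enough to sketch directly. The names `DeploymentStatus`, `can_transition_to`, and `revenue_share_is_valid` are hypothetical; the transition table itself is taken from the list above.

```rust
/// BundleDeployment status. (Hypothetical enum name for illustration.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeploymentStatus {
    Active,
    Paused,
    Archived,
}

impl DeploymentStatus {
    /// active → paused | archived; paused → active | archived; archived is terminal.
    pub fn can_transition_to(self, next: DeploymentStatus) -> bool {
        use DeploymentStatus::*;
        matches!(
            (self, next),
            (Active, Paused) | (Active, Archived) | (Paused, Active) | (Paused, Archived)
        )
    }
}

/// The shares of all parties must add up to exactly 10_000 basis points.
/// Each tuple is (party_id, basis_points).
pub fn revenue_share_is_valid(shares: &[(&str, u32)]) -> bool {
    let total: u64 = shares.iter().map(|(_, bps)| u64::from(*bps)).sum();
    total == 10_000
}
```

For the example split in P6, `[("agency-a", 3000), ("greentic", 7000)]` passes, while any split that drops or double-counts a party is rejected before a new policy version is created.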
5.5 greentic.credentials.v1 (P5)
schema: greentic.credentials.v1
env_id: prod-eu
deployer_kind: greentic.deployer.k8s@1.0.0
mode: requirements               # requirements | bootstrap
provided_credentials_ref: secret://prod-eu/credentials/k8s-sa-token
validation:
  last_run_at: 2026-05-14T07:30:00Z
  result: pass                   # pass | fail
  missing_capabilities: []
bootstrap:
  admin_credential_consumed_at: 2026-05-14T07:00:00Z
  rules_pack_ref: rules/prod-eu/k8s-min-permissions.gtpack
  generated_credentials_ref: secret://prod-eu/credentials/k8s-sa-token
expiry:
  expires_at: 2026-06-14T07:00:00Z
  rotate_at: 2026-06-07T07:00:00Z
5.6 greentic.pack-config.v1 (per-revision, per-pack)
schema: greentic.pack-config.v1
pack_id: customer.support.flows
revision_id: 01JTKR9X...
non_secret:
  default_locale: en-GB
  webhook_base_url: https://prod-eu.example.com
secret_refs:
  bot_token: secret://prod-eu/customer.support/telegram/bot_token
runtime_refs:
  alb_dns: runtime://prod-eu/discovered/alb_dns
Three address spaces in one schema: non_secret (inline), secret_refs (resolved through Environment.packs[secrets]), runtime_refs (resolved through Environment.runtime.discovered). Runtime resolves secret_refs and runtime_refs lazily; values never embedded in the bundle.
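The two reference address spaces can be told apart by URI scheme before any lazy resolution happens. The sketch below is illustrative only: `ConfigRef` and `parse_ref` are hypothetical names, and reading the first path segment as the env id is an assumption drawn from the two example values above, not a documented grammar.

```rust
/// A parsed `secret://` or `runtime://` reference.
/// (Hypothetical type; borrows from the input string.)
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigRef<'a> {
    Secret { env_id: &'a str, path: &'a str },
    Runtime { env_id: &'a str, path: &'a str },
}

/// Classify a reference by scheme. Anything else (including inline
/// non-secret values such as plain URLs) yields `None`.
pub fn parse_ref(raw: &str) -> Option<ConfigRef<'_>> {
    let (scheme, rest) = raw.split_once("://")?;
    // Assumption: "<env_id>/<path...>" follows the scheme.
    let (env_id, path) = rest.split_once('/')?;
    if env_id.is_empty() || path.is_empty() {
        return None;
    }
    match scheme {
        "secret" => Some(ConfigRef::Secret { env_id, path }),
        "runtime" => Some(ConfigRef::Runtime { env_id, path }),
        _ => None,
    }
}
```

A resolver would dispatch `Secret` to the env's secrets env-pack and `Runtime` to the discovered runtime values, so neither kind of value ever needs to be written into the bundle.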
6. Phase plan
One security hotfix plus four product phases. Each phase is independently shippable. Cross-cutting workstreams C1-C5 run in parallel and gate Phase ends.
- Keep phases shippable, but avoid 4-6 week mega-PRs. Each table row is a PR-sized gate with its own tests and rollback notes.
- Non-local deploys stay disabled until artifact verification, RBAC, audit, idempotency, and a production-capable EnvironmentStore are in place.
- CLI examples may use --bundle as shorthand only when the deployment is unique. Production runbooks should use --deployment <deployment_id>.
- Any ABI change in greentic-interfaces is a separate compatibility gate with version negotiation and mixed-version tests.
Phase 0 Security hotfix before the model rewrite (1 week)
Goal: stop leaking dev secrets into generated artifacts before large refactors begin.
| PR | Scope | Repo |
|---|---|---|
| P0.1 | Exclude .greentic/dev/.dev.secrets.env, setup-state plaintext secret payloads from both SquashFS and ZIP bundle paths. Prefer explicit archive allowlist. | greentic-setup/src/gtbundle.rs, greentic-bundle/src/setup/mod.rs |
| P0.2 | Add greentic-bundle doctor secrets and CI grep gate for .dev.secrets.env, secret_values, known test plaintext. | greentic-bundle/src/build/doctor_secrets.rs, CI |
| P0.3 | Preserve compatibility by writing non-secret config where current runtime expects it, but mark this path deprecated. | greentic-setup, greentic-operator, greentic-start |
| P0.4 | Harden bundle extraction path validation for ZIP and SquashFS. Reject absolute paths, .., symlinks escaping the extract root, hardlinks, duplicate normalized paths. | greentic-start/src/bundle_ref.rs, greentic-setup/src/gtbundle.rs |
- Built bundles contain zero plaintext dev secret files and no serialized secret_values.
- Existing local setup still runs because non-secret config compatibility remains in place.
- CI fails on a fixture that intentionally tries to archive .greentic/dev/.dev.secrets.env.
- Extraction tests reject a malicious archive with path traversal or root-escaping symlinks.
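The path-level part of P0.4 (absolute paths, `..`, duplicate normalized paths) can be sketched with the standard library alone. `validate_entry_paths` is a hypothetical helper, not the code in greentic-start or greentic-setup; symlink and hardlink checks need archive metadata and are not covered, and the sketch assumes `/`-separated entry names evaluated on a Unix-like host.

```rust
use std::collections::HashSet;
use std::path::{Component, Path, PathBuf};

/// Normalize archive entry names and reject anything that could escape the
/// extract root or collide with an earlier entry.
pub fn validate_entry_paths(entries: &[&str]) -> Result<Vec<PathBuf>, String> {
    let mut seen = HashSet::new();
    let mut normalized_paths = Vec::new();
    for raw in entries {
        let mut normalized = PathBuf::new();
        for component in Path::new(raw).components() {
            match component {
                Component::Normal(part) => normalized.push(part),
                // A leading "./" is harmless and simply dropped.
                Component::CurDir => {}
                Component::ParentDir => return Err(format!("`{raw}` contains `..`")),
                Component::RootDir | Component::Prefix(_) => {
                    return Err(format!("`{raw}` is an absolute path"))
                }
            }
        }
        if normalized.as_os_str().is_empty() {
            return Err(format!("`{raw}` is empty after normalization"));
        }
        // "a/b" and "./a//b" normalize to the same path; the second is rejected.
        if !seen.insert(normalized.clone()) {
            return Err(format!("`{raw}` duplicates an earlier entry"));
        }
        normalized_paths.push(normalized);
    }
    Ok(normalized_paths)
}
```

Rejecting `..` outright, instead of resolving it, is the conservative choice here: a legitimate bundle has no reason to contain parent references, and it keeps the check independent of the filesystem.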
Phase A Foundations (4–6 weeks, delivered as narrow PR gates)
Goal: Environment and Revision types exist with host/setup/runtime config split and env-pack bindings; local Environment auto-created on gtc setup; legacy dev migration has a preflight gate; the existing gtc op passthrough exposes the operator surface; gtc start is the local shortcut; C3 (tool preflight) and C4 (distroless/MUSL) ship.
| PR | Scope | Repo |
|---|---|---|
| A1 | New crate greentic-deploy-spec: schemas for environment, runtime, revision, traffic-split, bundle-deployment, credentials + JSON-schema gen. EnvPackBinding with closed CapabilitySlot enum and open PackDescriptor string. | new greentic-deploy-spec |
| A2 | EnvironmentStore trait with LocalFsStore impl. Atomic write helper. Per-env flock. Local backups before mutation. | greentic-deployer/src/environment/ |
| A3 | Extend existing gtc op passthrough; implement env / env-packs / bundles / revisions / traffic / config / credentials / secrets sub-commands. All honor --schema/--answers. Refactor gtc start. Drop 10+ default_value("aws") admin subcommand args. | greentic/src/bin/gtc/, greentic-deployer/src/cli.rs, greentic-operator/src/cli.rs |
| A4 | local Environment auto-create on first gtc setup. Default env-pack bindings: local-process + dev-store + stdout + in-memory + in-memory. | greentic-setup/src/cli_commands/setup.rs, greentic-config |
| A4b | Guarded dev → local migration. --check scans legacy dev-store keys, env strings, audit labels. --apply rewrites if safe; otherwise alias path with warning telemetry. | greentic-setup, greentic-config, greentic-start, greentic-operator |
| A5 | Distributor-client atomic-write wrapper + state-transition validation matrix. Extend enum for failed + archived. | greentic-distributor-client/src/dist.rs |
| A6 | EnvironmentStore migration from legacy ~/.greentic/state/deploy/<provider>/... to env-pack-bound layout. One-shot, fails loud on residue. | greentic-deployer/src/environment/migration.rs |
| A7 | Append-only audit log and local authorization policy for every mutating gtc op command. Non-local mutations fail closed until RBAC. | greentic-deployer/src/environment/audit.rs |
| A8 | Remote EnvironmentStore HTTP contract: ETag/generation CAS, idempotency replay, RBAC decision, audit-event response shape, backup/restore contract. | greentic-deploy-spec, greentic-operator |
| A9 | Env-pack registry: pack-store lookup by PackDescriptor. Built-in registrations for local-process + dev-store + stdout + in-memory. | greentic-deployer/src/env_packs/ |
| A10 | Wizard env-scoping (surgical, no extraction). Add env_id: EnvId parameter to all four existing wizard runtimes and to greentic-qa-lib::WizardDriver. | wizard runtimes + greentic-qa |
| C3 | Tool preflight (cross-cutting). Verify versions, credentials, region/cluster access, auth scope. | greentic-deployer/src/tool_check.rs |
| C4 | Distroless + MUSL (cross-cutting). gcr.io/distroless/static-debian12:nonroot. backhand replaces unsquashfs shell-out in greentic-start. | greentic-start/Dockerfile.distroless |
- `gtc setup` on a clean machine creates `~/.greentic/environments/local/environment.json` with the 5 default env-packs bound.
- `gtc start ./my-bundle.gtbundle` brings up a local env and deploys the bundle in one command (idempotent on re-run).
- `gtc op env-packs list local` shows the 5 bound packs by `kind`.
- `gtc op env migrate-dev local --check` reports safe/unsafe with concrete references.
- `gtc op bundles add local ./my-bundle.gtbundle` creates a Revision in state `Staged`.
- `gtc op revisions warm local <ULID>` advances to `Ready`.
- `gtc op --schema env-packs add` prints a valid QASpec; `--answers prior.json` runs non-interactively.
- Built bundles contain zero plaintext secret values (CI grep gate).
- Every host container image runs as `uid=65532`, < 30MB, no shell.
- Every mutating `gtc op` writes an audit event with actor, env, target, generations, idempotency key, result.
- Adding a new env-pack `kind` does not require modifying any closed enum in `greentic-deploy-spec`.
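The atomic write helper in A2 and A5 is usually built on the write-to-temp-then-rename pattern. The sketch below shows that pattern with the standard library only; `atomic_write` and the temp-file naming are assumptions for illustration, not the plan's actual helper. It does not fsync the parent directory or take the per-env flock, both of which a production helper would likely need.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `bytes` to `target` through a sibling temp file and a rename, so a
/// reader sees either the old contents or the new contents, never a partial file.
/// (Hypothetical helper; name and temp-file suffix are illustrative.)
pub fn atomic_write(target: &Path, bytes: &[u8]) -> io::Result<()> {
    let dir = target.parent().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "target has no parent directory")
    })?;
    let file_name = target.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "target has no file name")
    })?;

    // The temp file lives in the same directory so the rename stays on one filesystem.
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(format!(".tmp.{}", std::process::id()));
    let tmp_path = dir.join(tmp_name);

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(bytes)?;
    tmp.sync_all()?; // flush file data to disk before the rename publishes it
    drop(tmp);

    if let Err(err) = fs::rename(&tmp_path, target) {
        let _ = fs::remove_file(&tmp_path); // best-effort cleanup of the temp file
        return Err(err);
    }
    Ok(())
}
```

A crash before the rename leaves the previous file untouched, which is the property the crash-injection check on `set_bundle_state` is looking for.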
Phase B Runtime config, multi-bundle env, in-process traffic splitter, usage-stamping (4–6 weeks)
Goal: greentic-start boots from runtime-config.v1 with multiple deployments, bundles, and revisions; RevisionDispatcher ships; gtc op traffic set works per deployment_id; reserved /deployments/* endpoints get handlers; running two revisions of the same customer deployment side-by-side in local with weighted split is demonstrably real; BundleDeployment and revenue-share are stamped on telemetry.
| PR | Scope | Repo |
|---|---|---|
| B0 | runtime-config.v1 loader supporting revisions: Vec<RevisionRuntimeBlock> (multiple deployments and bundles in one env). | greentic-start/src/runtime_config.rs |
| B1 | RevisionDispatcher module — route binding resolves deployment_id; basis-point weight selection per deployment; trusted header override; HMAC-signed cookie stickiness; session pin lookup. | greentic-start/src/revision_dispatcher.rs (new) |
| B2 | ActivePacks extended from HashMap<tenant, TenantRuntime> to HashMap<(tenant, deployment_id, bundle_id, revision_id), TenantRuntime>. Add load_revision(...). | greentic-runner-host/src/runtime.rs, host.rs |
| B3 | HTTP route table extended per (deployment_id, bundle_id, revision_id); ingress dispatch consults RevisionDispatcher. | greentic-start/src/http_routes.rs, http_ingress/mod.rs |
| B4 | Operator handlers for reserved endpoints: POST /deployments/stage, /warm, /activate, /rollback, /complete-drain. All scoped by (env, deployment_id). | greentic-operator/src/admin_api.rs, lifting handlers into reserved prefixes at src/static_routes.rs:65-69 |
| B5 | gtc op traffic {set,show,rollback} per deployment; persists TrafficSplit under ~/.greentic/environments/<env>/traffic-splits/<deployment_id>/. | greentic-deployer/src/cli.rs, new src/environment/traffic.rs |
| B6 | Redis-backed session-pin hash gt:rev_pin:{env}:{deployment_id}:{tenant}. In-memory fallback for local. | greentic-runner-host/src/engine/host.rs |
| B7 | gtc op revisions drain semantics: stop new session pins; wait drain_seconds; let HTTP finish; close remaining WebSockets with retryable close code; tear down TenantRuntime. | greentic-start/src/revision_dispatcher.rs |
| B8 | Static route table extended per (deployment_id, bundle_id, revision_id). | greentic-operator/src/static_routes.rs |
| B9 | Warm/ready health gate: validate route table, runtime config, signature status, provider health before Ready. Manual abort/rollback on failed warm; metric-driven abort = Phase E. | greentic-start, greentic-deployer/src/environment/lifecycle.rs |
| B10 | BundleDeployment lifecycle. gtc op bundles add creates a BundleDeployment per (env, bundle_id, customer_id). customer_id required for non-local. revenue_share defaults to [{greentic, 10000}]. | greentic-deployer/src/environment/bundle_deployment.rs |
| B11 | Telemetry stamping. greentic-telemetry::Context extended with customer_id, deployment_id, bundle_id, revision_id. Metrics use documented low-cardinality subset or OTel views. | greentic-telemetry, greentic-runner-host |
| B12 | Replace secret_values: BTreeMap<String, Value> with secret_refs: BTreeMap<String, SecretRef> in PersistedSetupState. | greentic-bundle/src/setup/mod.rs + all qa_persist callers |
| C2 | Artifact signing + distributor-client verifier (cross-cutting). Add DSSE sidecar; replace distributor-client no-op verifier. Non-local stage rejects unsigned or untrusted bundles. | greentic-bundle/src/build/signing.rs (new), greentic-distributor-client/src/dist.rs |
| C5 | Revision + bundle + customer observability (cross-cutting). Attach all rollout identifiers on logs/traces/events. OTel views control metric cardinality. | greentic-telemetry, greentic-start, greentic-runner-host |
- `gtc start fast2flow.gtbundle && gtc op bundles add local llm.gtbundle && gtc op bundles add local RAG.gtbundle` results in three deployments running in `local`, each with its own Revision and TrafficSplit.
- For one deployment: stage v2, warm, then `gtc op traffic set local --deployment <dep> <r1>=99 <r2>=1` results in 1% of HTTP traffic to that deployment landing on v2 and 99% on v1, with other deployments unaffected.
- Two customers can deploy the same `bundle_id` in one env; each gets a distinct `deployment_id`, revision sequence, route binding, and TrafficSplit.
- `X-Greentic-Deployment` + `X-Greentic-Revision` pins authenticated admin/test requests to that exact deployment+revision; the same headers from unauthenticated public traffic are ignored.
- Cookie-stickiness keeps a session on the revision it first hit per deployment, uses an HMAC-signed value, and sets `Secure`, `HttpOnly`, `SameSite=Lax`, scoped path, and bounded `Max-Age`.
- Concurrent `gtc op traffic set` with the same idempotency key returns the original result; stale generation fails with conflict.
- E2E test: deploy two revisions of one deployment with different `/health` payloads, send 1000 requests, observe 990±20 on v1 and 10±20 on v2 (chi-squared at p=0.95).
- `greentic-bundle sign --key-ref pkcs8://release.pem` produces a valid sidecar; stage rejects bundles without a valid signature.
- Telemetry carries `env_id`, `customer_id`, `deployment_id`, `bundle_id`, `revision_id`, `pack_id`, and rollout generation on spans/logs.
- `PersistedSetupState.secret_values` field is gone from the artifact; only `secret_refs` remain.
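B1 calls for basis-point weight selection per deployment. A minimal sketch of one way to do that follows: weights are cumulative ranges over 10,000 buckets, and a uniformly distributed sample picks the bucket. `WeightedRevision`, `validate_split`, and `select_revision` are hypothetical names; how the real `RevisionDispatcher` sources its sample (random draw, hash of a session hint) is not specified here.

```rust
/// One weighted target in a traffic split. Weights are basis points,
/// so a `99 / 1` percent split is `9_900 / 100`. (Hypothetical type.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WeightedRevision {
    pub revision_id: String,
    pub weight_bps: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SplitError {
    Empty,
    WeightsMustSumTo10000(u64),
}

/// Checks that the split is non-empty and that weights sum to exactly 10_000 bps.
pub fn validate_split(split: &[WeightedRevision]) -> Result<(), SplitError> {
    if split.is_empty() {
        return Err(SplitError::Empty);
    }
    let total: u64 = split.iter().map(|w| u64::from(w.weight_bps)).sum();
    if total != 10_000 {
        return Err(SplitError::WeightsMustSumTo10000(total));
    }
    Ok(())
}

/// Picks a revision for one request. `sample` is any uniformly distributed
/// u64; it is reduced to a bucket in 0..10_000 and matched against the
/// cumulative weight ranges in order.
pub fn select_revision(split: &[WeightedRevision], sample: u64) -> Result<&str, SplitError> {
    validate_split(split)?;
    let bucket = sample % 10_000;
    let mut upper = 0u64;
    for entry in split {
        upper += u64::from(entry.weight_bps);
        if bucket < upper {
            return Ok(&entry.revision_id);
        }
    }
    unreachable!("validated weights cover every bucket")
}
```

With a `9_900 / 100` split, exactly 100 of the 10,000 buckets map to the second revision, which is the 1% share the acceptance test measures statistically.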
Phase C Credentials (P5), runtime config channel, env-pack QASpec attachment (3–4 weeks)
Goal: gtc op credentials … two-mode flow works for at least one deployer env-pack (local-process + a stub for k8s); pack-config.v1.non_secret is the canonical channel for non-secret runtime config; env-packs contribute their own QASpec to the existing operator wizard.
| PR | Scope | Repo |
|---|---|---|
| C1 | greentic.credentials.v1 schema + gtc op credentials CLI. Implements requirements and bootstrap flows. Admin credentials not written to disk; buffers zeroized where possible. | greentic-deploy-spec, greentic-deployer/src/credentials/ |
| C2 | Local-process deployer credentials — trivial requirements (writable ~/.greentic, available port range). Reference implementation. | greentic-deployer/src/env_packs/local_process/credentials.rs |
| C3 | K8s deployer credentials stub — full credentials.yaml; requirements validates via SelfSubjectAccessReview where possible; bootstrap renders a Terraform module + kubectl YAML rules pack. | greentic-deployer/src/env_packs/k8s/credentials.rs |
| C4 | pack-config.v1.non_secret runtime channel. Components read non-secret config through new RuntimeConfigReader host import; secrets through existing SecretsManager. ABI change: version negotiation + mixed-version tests + one-release compatibility shim. | greentic-runner-host/src/engine/host.rs, greentic-interfaces |
| C5 | runtime:// resolver — pack-config.v1.runtime_refs resolved through EnvironmentRuntime.discovered. Hot-reload on runtime.json write. | greentic-runner-host, greentic-start/src/runtime_config.rs |
| C6 | Env-pack QASpec attachment. Each env-pack ships wizard.qaspec.yaml. gtc op env-packs add <kind> loads that QASpec through the operator's existing wizard driver. No new engine. | greentic-operator/src/wizard.rs |
| C7 | App-bundle wizard env-scoping (extended from A10) writes secret_refs and non_secret per the new channel. | greentic-bundle/src/wizard, qa_persist callers |
- `gtc op credentials requirements local` returns green for a default `local` env.
- `gtc op credentials bootstrap stg-k8s --admin-profile zain-admin` (against a kind cluster) produces a low-privilege ServiceAccount token + a rules-pack under `rules/stg-k8s/`; admin token is not in any persisted file.
- A reference app pack reads a non-secret URL via `RuntimeConfigReader` and a secret token via `SecretsManager`; bundle contains no plaintext.
- Mixed-version ABI tests pass: old components still run through the compatibility shim.
- A reference component reads `runtime://prod-eu/discovered/alb_dns` and resolves via `runtime.json`.
- `gtc op env-packs add local --kind greentic.telemetry.otlp` runs that env-pack's QASpec via the operator's existing wizard driver (no new engine).
- The four legacy wizards still work (env-scoped from A10) and now write `secret_refs` + `non_secret` cleanly.
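The `runtime://prod-eu/discovered/alb_dns` reference above has a visible shape: environment id, the literal segment `discovered`, then a key. The sketch below parses that shape and looks the key up in a map standing in for the discovered values in `runtime.json`. Everything beyond that one example is an assumption: the type and function names are hypothetical, and rejecting references to a different environment is an illustrative policy the plan does not state.

```rust
use std::collections::BTreeMap;

/// A parsed `runtime://<env>/discovered/<key>` reference. (Hypothetical type.)
#[derive(Debug, PartialEq, Eq)]
pub struct RuntimeRef {
    pub env_id: String,
    pub key: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RuntimeRefError {
    NotARuntimeRef,
    Malformed,
    WrongEnvironment,
    UnknownKey,
}

/// Parses a reference of the form shown in the plan's acceptance criteria.
pub fn parse_runtime_ref(raw: &str) -> Result<RuntimeRef, RuntimeRefError> {
    let rest = raw
        .strip_prefix("runtime://")
        .ok_or(RuntimeRefError::NotARuntimeRef)?;
    let mut parts = rest.split('/');
    match (parts.next(), parts.next(), parts.next(), parts.next()) {
        (Some(env), Some("discovered"), Some(key), None)
            if !env.is_empty() && !key.is_empty() =>
        {
            Ok(RuntimeRef { env_id: env.to_string(), key: key.to_string() })
        }
        _ => Err(RuntimeRefError::Malformed),
    }
}

/// Resolves a reference against the discovered values of the current environment.
/// Assumption: cross-environment references are refused.
pub fn resolve<'a>(
    raw: &str,
    current_env: &str,
    discovered: &'a BTreeMap<String, String>,
) -> Result<&'a str, RuntimeRefError> {
    let reference = parse_runtime_ref(raw)?;
    if reference.env_id != current_env {
        return Err(RuntimeRefError::WrongEnvironment);
    }
    discovered
        .get(&reference.key)
        .map(String::as_str)
        .ok_or(RuntimeRefError::UnknownKey)
}
```

Hot-reload on `runtime.json` write (C5) would swap the `discovered` map underneath this lookup; that part is not shown.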
Phase D Deployer env-pack rollout (AWS proving ground, K8s parallel design)
Goal: each supported deployer is shipped as a real env-pack registered through the env-pack registry (A9). The closed surface is CapabilitySlot::Deployer; the open surface is each pack's PackDescriptor. Rollout order: AWS → K8s (Zain target) → GCP → Azure → Snap/Juju, but K8s contract tests and manifest output are not blocked on AWS E2E completion.
For each deployer env-pack, the same shape:
- Production EnvironmentStore before production deploys. AWS/K8s acceptance requires a non-local store with HA, CAS, idempotency replay, RBAC, audit, backup/restore, corruption detection.
- Conformance suite first. Deployer contract: `apply`, `stage / warm / drain / archive`, `apply_traffic_split` per `deployment_id`, `report_runtime_config`, `validate_credentials`, `bootstrap_credentials`, `render_manifests`, `preflight`, `audit_event`.
- Extract cloud-specific Terraform / scripts / SDK calls into the deployer env-pack.
- Implement `revisions stage / warm / drain` against the provider's primitives:
  - AWS = ECS task-set per revision behind a single ECS service.
  - K8s = one stable Greentic router Deployment receives ingress and runs `RevisionDispatcher`; one worker Deployment per revision exposes a ClusterIP Service. Revision label `greentic.ai/revision: <ULID>`. Avoids the invalid "one Service randomly selects a revision pod, then that pod re-routes" model.
  - GCP = Cloud Run revision (native).
  - Azure = Container Apps revision (native).
  - Single-VM = systemd service per revision on per-revision port.
- Implement `apply_traffic_split` mirror against the provider's LB:
  - AWS = `aws_lb_listener_rule` weighted target groups (one TG per revision).
  - K8s = router Deployment is authoritative for Zain v1. Optional provider-native mirror via Gateway API / Istio / NGINX canary.
  - GCP Cloud Run = `traffic_targets` (native).
  - Azure = traffic-label revisions.
  - Single-VM = in-process dispatcher only.
- Implement P5 credentials contract. `deploy-auth-<provider>` against the `DeployAuthProvider` trait. AWS first (STS, declarative IAM gen). K8s second (kubeconfig + SA token; IRSA / Workload Identity).
- Telemetry env-pack per provider (`greentic.telemetry.aws-xray`, `greentic.telemetry.otlp`, etc.).
- Secrets env-pack per provider. K8s = External Secrets Operator (ESO) env-pack wiring to AWS-SM (via IRSA), Azure-KV, GCP-SM, or Vault.
- Production deployers avoid arbitrary shell handoff for mutable operations. Terraform/OpenTofu plan/apply may remain a controlled subprocess with checksummed generated files.
- K8s deployer pack renders declarative manifests as an artifact (`gtc op env render zain-prod --output ./rendered/`) in addition to applying them. Full GitOps reconciliation = Phase E.
- K8s hardening: Restricted Pod Security, non-root SecurityContext, `allowPrivilegeEscalation: false`, read-only root filesystem, seccomp, resource requests/limits, NetworkPolicy, digest-pinned images, topology spread, router HPA, at least two router replicas, PDB, fail-closed on no valid TrafficSplit generation.
- `report_runtime_config` writes `runtime.json` with discovered values (ALB DNS, generated secret ARNs, K8s Service ClusterIPs).
- Non-local Environment state is stored in the production EnvironmentStore; backup and restore are exercised in the E2E run.
- RBAC denies an unauthorized `traffic set`; audit records the policy decision.
- `gtc op env create prod-eu` + env-packs add (aws-ecs + aws-sm + aws-xray) provisions IAM + ECR + ALB after credentials pass.
- `gtc op credentials requirements prod-eu` returns green OR `bootstrap` emits `rules/prod-eu/aws-min-iam.tf`.
- `gtc op bundles add prod-eu customer.support_v1.2.0.gtbundle --customer-id cust-acme --revenue-share agency-a:30%,greentic:70%` creates the `BundleDeployment`.
- `traffic set prod-eu --deployment <id> <ULID>=1 <prev>=99` shifts 1% of ALB traffic to v1.2.0 within 2 minutes.
- Secret rotation in AWS-SM triggers a graceful revision restart within 60s.
- Invocation telemetry carries `customer_id=cust-acme`, `deployment_id=<ULID>`, revenue-share-eligible fields on spans/logs/usage events.
- `runtime.json` is populated with `alb_dns`, `ecs_cluster_arn`, `task_set_ids`; `runtime://prod-eu/discovered/alb_dns` resolves correctly.
- Zero `dev.secrets.env` in any uploaded S3 artifact. Bundle signature verified at stage time. Image runs as `uid=65532`.
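The `--revenue-share agency-a:30%,greentic:70%` flag and the `[{greentic, 10000}]` default in B10 suggest that shares are entered as percentages and stored in basis points. The sketch below parses the flag value on that assumption. The names are hypothetical, only whole percentages are accepted, and the rule that shares must sum to exactly 10,000 bps is inferred from the default rather than stated by the plan.

```rust
/// One party's cut of revenue in basis points (10_000 bps = 100%).
/// (Hypothetical type for illustration.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RevenueShare {
    pub party: String,
    pub share_bps: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RevenueShareError {
    Malformed(String),
    DuplicateParty(String),
    MustSumTo10000(u32),
}

fn malformed(item: &str) -> RevenueShareError {
    RevenueShareError::Malformed(item.to_string())
}

/// Parses a value such as `agency-a:30%,greentic:70%` into basis-point shares.
pub fn parse_revenue_share(raw: &str) -> Result<Vec<RevenueShare>, RevenueShareError> {
    let mut shares: Vec<RevenueShare> = Vec::new();
    for item in raw.split(',') {
        let item = item.trim();
        let (party, pct) = item.split_once(':').ok_or_else(|| malformed(item))?;
        let pct = pct.trim().strip_suffix('%').ok_or_else(|| malformed(item))?;
        let pct: u32 = pct.trim().parse().map_err(|_| malformed(item))?;
        let party = party.trim();
        if party.is_empty() || pct > 100 {
            return Err(malformed(item));
        }
        if shares.iter().any(|s| s.party == party) {
            return Err(RevenueShareError::DuplicateParty(party.to_string()));
        }
        shares.push(RevenueShare { party: party.to_string(), share_bps: pct * 100 });
    }
    // Every basis point must be assigned to some party.
    let total: u32 = shares.iter().map(|s| s.share_bps).sum();
    if total != 10_000 {
        return Err(RevenueShareError::MustSumTo10000(total));
    }
    Ok(shares)
}
```

Rejecting a split that does not sum to 100% at parse time keeps an incomplete policy from ever being signed into a `BundleDeployment`.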
- Non-local Environment state is stored in the production EnvironmentStore; restore from backup produces the same active generation.
- `gtc op env create zain-prod` + `env-packs add zain-prod --slot deployer --kind greentic.deployer.k8s@1.0.0` creates namespace, RBAC, router Deployment, per-revision worker Deployment template, Services, Ingress/Gateway, ESO `ClusterSecretStore`, ServiceAccount with workload identity.
- `gtc op env render zain-prod --output ./rendered/` emits the same manifests without applying them.
- `bundles add zain-prod customer.support_v1.2.0.gtbundle --customer-id cust-zain && revisions warm <ULID>` creates a worker Deployment labeled `greentic.ai/revision=<ULID>`; router runtime config includes it only after warm succeeds.
- `traffic set zain-prod --deployment <id> <new>=1 <old>=99`: 1% of inbound requests land on the new revision within 30s. 5000-request stratified test passes at p=0.95.
- A second bundle `bundles add zain-prod llm-router_v1.0.0.gtbundle --customer-id cust-zain` deploys alongside without disturbing the first bundle's TrafficSplit.
- ESO projects AWS-SM (or Azure-KV / Vault) into K8s Secrets; rotating a secret propagates to running pods within 60s.
- `kubectl rollout undo` is NOT the rollback path. `gtc op traffic rollback zain-prod --deployment <id>` reverts atomically; old Deployment still running and accepts traffic immediately.
- Router readiness probes verify active TrafficSplit generation and route-table load. Fails closed if no valid generation is loaded.
- HPA scales each revision's worker Deployment independently; router Service remains stable.
- Pod specs pass the Restricted Pod Security profile (non-root, no privilege escalation, read-only root FS, seccomp, NetworkPolicy, digest-pinned images, topology spread, router PDB).
- Adding a hypothetical `greentic.deployer.k8s-istio@1.0.0` is purely an env-pack publication: no closed enum in core changes.
7. Cross-cutting concerns (apply to all phases)
| ID | Description | Phase |
|---|---|---|
| C1 | Dev-secrets leak fix. Both greentic-setup archive paths, greentic-bundle/src/bundle_fs/*, setup-state plaintext redaction, malicious path/symlink tests, CI grep gate. Non-secret config compatibility remains until Phase C's runtime config channel lands. | Phase 0 |
| C2 | Artifact signing + verifier. Keep existing pack signing; add DSSE sidecar with in-toto Statement (predicate https://slsa.dev/provenance/v1, tlog_entry_id reserved). Plain PKCS8 keys for v1 only with explicit trust roots and key IDs. KMS, Rekor, full provenance policy → Trust plan. Distributor-client verifier gains real DSSE+Ed25519 verification. Non-local stage rejects unsigned or untrusted bundles. | Phase B |
| C3 | Tool preflight. Check versions, auth, region/cluster reachability, and required scopes for Terraform/OpenTofu, cloud CLIs, kubectl, helm, Docker/Podman. Prefer OpenTofu where possible. | Phase A |
| C4 | Distroless + MUSL + non-root. All host images use gcr.io/distroless/static-debian12:nonroot. Chainguard optional. MUSL static, USER 65532:65532. greentic-start switches from unsquashfs shell-out to backhand Rust crate. | Phase A |
| C5 | Revision + bundle + customer observability. Every rollout event, log, span records env_id, tenant, team, customer_id, deployment_id, bundle_id, revision_id, pack_id, env-pack kind, rollout generation. Metrics use documented lower-cardinality label set or OTel views. Emit rollout lifecycle events. | Phase B |
8. Critical files to create or modify
New files
- New crate `greentic-deploy-spec`: owns `Environment`, `EnvironmentRuntime`, `Revision`, `TrafficSplit`, `BundleDeployment`, `Credentials`, `PackDescriptor`, `CapabilitySlot` (Phase A)
- `greentic-deployer/src/environment/{mod.rs, model.rs, store.rs, lifecycle.rs, atomic_write.rs, file_lock.rs, migration.rs, traffic.rs, bundle_deployment.rs}` (Phase A/B)
- `greentic-deployer/src/environment/{remote_store.rs, backup.rs}` or operator-owned equivalent production EnvironmentStore (Phase D)
- `greentic-deployer/src/environment/audit.rs` (Phase A)
- `greentic-deployer/src/env_packs/{registry.rs, slot.rs}` (Phase A, A9)
- `greentic-deployer/src/env_packs/local_process/{mod.rs, credentials.rs}` (Phase A/C)
- `greentic-deployer/src/env_packs/k8s/{mod.rs, credentials.rs, bootstrap/}` (Phase C stubs → Phase D real)
- `greentic-deployer/src/credentials/{mod.rs, validate.rs, bootstrap.rs, rules_export.rs}` (Phase C, P5)
- `greentic-start/src/{runtime_config.rs, revision_dispatcher.rs, revision_pin.rs}` (Phase B)
- `greentic-deployer/src/tool_check.rs` (Phase A, C3)
- `greentic-bundle/src/build/{signing.rs, doctor_secrets.rs}` (Phase 0/B, C1/C2)
- `greentic-setup/src/gtbundle/exclude.rs` (Phase 0, C1)
- New repo `greentic-deploy-auth` (Phase D)
- Deployer-env-pack conformance test suite (Phase D)
- Dockerfiles for `greentic-deployer`, `greentic-operator`, `greentic-bundle` (Phase A, C4)
- (Not in this plan) `greentic-wizard-engine` extraction: deferred. The four existing wizard runtimes stay; they gain an `env_id` parameter only.
Modified files (highlights)
- `greentic-distributor-client/src/dist.rs:3315`: atomic write
- `greentic-distributor-client/src/dist.rs:3956`: real DSSE verifier
- `greentic-config-types/src/lib.rs`: split `EnvironmentConfig`; `EnvironmentHostConfig` stays, setup/runtime slices move to `greentic-deploy-spec`
- `greentic-types/src/store.rs`: `Environment` becomes a read-only compose view
- `greentic-runner-host/src/runtime.rs:37`: `ActivePacks` keyed by `(tenant, deployment_id, bundle_id, revision_id)`
- `greentic-runner-host/src/host.rs:155`: add `load_revision(...)`; keep `load_pack(...)` compatibility helper
- `greentic-runner-host/src/engine/host.rs:28-54`: session key shape gains deployment-scoped revision pins; telemetry context gains `customer_id`, `deployment_id`
- `greentic-operator/src/admin_api.rs`: handlers for reserved `/deployments/*` endpoints, scoped by `(env, deployment_id)`
- `greentic-operator/src/static_routes.rs:65-73`: wire handlers into reserved prefixes
- `greentic-bundle/src/setup/{mod.rs:63-71, backend.rs}`: replace `secret_values` with `secret_refs`
- `greentic-start/src/bundle_ref.rs:341-361`: `backhand` not `unsquashfs`
- `greentic/src/bin/gtc/deploy.rs:47-66`: replaced by `gtc op`; cloud strings deleted
- `greentic-telemetry`: context schema adds `customer_id`, `deployment_id`, `bundle_id`, `revision_id`, env-pack `kind`
Reuse, don't reinvent
- `greentic-distributor-client::{stage_bundle, warm_bundle, rollback_bundle, set_bundle_state, evaluate_retention, apply_retention}`: wire in, don't duplicate.
- `greentic-pack/crates/packc/src/signing/{signer.rs,verify.rs,canon.rs}`: Ed25519 primitives.
- `greentic-secrets-lib` providers (AWS-SM, Azure-KV, GCP-SM, Vault): already implemented; unused at deploy time today. Plug in as secrets env-packs in Phase D.
- `backhand` SquashFS crate: already used in greentic-setup.
- `arc_swap`: already used for `TenantRuntime` swap; reuse for TrafficSplit.
- `greentic-qa-lib::WizardDriver`: kept in place; reused by all four existing wizards. No extraction in this plan.
9. Verification plan
For each phase, every check below passes on real hardware before that phase ships.
Phase 0 Verification
- Build a bundle whose answers include known secrets. `greentic-bundle doctor secrets built.gtbundle` reports clean.
- Archive inspection shows no `.greentic/dev/.dev.secrets.env` and no plaintext `secret_values`.
- Malicious ZIP/SquashFS fixtures with absolute paths, `..`, duplicate normalized paths, or root-escaping symlinks are rejected before extraction writes files.
- Existing local setup and run flows still work through the compatibility config path.
Phase A Verification
- `gtc setup` on a clean Linux box auto-creates `~/.greentic/environments/local/environment.json` with the 5 default env-pack bindings.
- `gtc start ./bundle.gtbundle` brings up a local env and deploys the bundle in one command. Re-running is idempotent.
- `gtc op env-packs list local` shows all 5 bindings by `kind`.
- `gtc op env migrate-dev local --check` reports safe/unsafe with concrete file/key references.
- `gtc op env doctor local` returns green.
- `gtc op env destroy local --force` cleans up; re-running `gtc setup` re-creates it.
- `gtc op bundles add local ./bundle.gtbundle && gtc op revisions list local` shows the revision in state `Staged`.
- `gtc op revisions warm local <ULID>` advances state to `Ready`.
- `gtc op --schema env-packs add` prints a valid QASpec; `--answers prior.json` runs non-interactively.
- Crash injection during `set_bundle_state`: file remains parseable JSON (atomic write).
- 10,000-ULID property test: sorts lexicographically by creation timestamp; no collisions.
- `docker exec <running greentic-start> id` returns `uid=65532`. Image < 30MB.
- Missing `terraform` on the host → `gtc op env doctor` exits 2 with install message.
- Registering a new env-pack `kind` does not touch any closed enum in `greentic-deploy-spec`.
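The last check above depends on the A1 split between a closed `CapabilitySlot` enum and an open `PackDescriptor` string. A minimal sketch of that split follows. Only `CapabilitySlot::Deployer` and the descriptor strings appear in the plan; the other variant names are guesses taken from the five env-pack roles it lists, and `EnvPackRegistry` with its methods is hypothetical.

```rust
use std::collections::BTreeMap;

/// Closed set of capability slots. Only `Deployer` is named in the plan;
/// the remaining variants are assumed from the listed env-pack roles.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum CapabilitySlot {
    Deployer,
    Secrets,
    Telemetry,
    Sessions,
    State,
}

/// Open identifier such as "greentic.deployer.k8s@1.0.0". Any string is
/// allowed, so publishing a new pack never touches the enum above.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct PackDescriptor(pub String);

/// Hypothetical in-memory registry mapping descriptors to the slot they fill.
#[derive(Default)]
pub struct EnvPackRegistry {
    packs: BTreeMap<PackDescriptor, CapabilitySlot>,
}

impl EnvPackRegistry {
    /// Registers a pack; returns false if the descriptor is already present.
    pub fn register(&mut self, descriptor: &str, slot: CapabilitySlot) -> bool {
        let key = PackDescriptor(descriptor.to_string());
        if self.packs.contains_key(&key) {
            return false;
        }
        self.packs.insert(key, slot);
        true
    }

    /// Looks up which slot a descriptor fills, if it is registered.
    pub fn slot_of(&self, descriptor: &str) -> Option<CapabilitySlot> {
        self.packs.get(&PackDescriptor(descriptor.to_string())).copied()
    }

    /// Lists registered descriptors for one slot, in sorted order.
    pub fn kinds_for(&self, slot: CapabilitySlot) -> Vec<&str> {
        self.packs
            .iter()
            .filter(|(_, s)| **s == slot)
            .map(|(d, _)| d.0.as_str())
            .collect()
    }
}
```

Registering `greentic.deployer.k8s-istio@1.0.0` here is a data change only, which is the property the check is meant to protect.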
Phase B Verification
- Stage two revisions of one deployment, `gtc op traffic set local --deployment D <r1>=99 <r2>=1`, send 1000 HTTP requests. 990±20 land on r1; 10±20 on r2 (chi-squared at p=0.95).
- Deploy three bundles (`fast2flow`, `llm`, `RAG`); each deployment gets its own TrafficSplit.
- Deploy the same bundle twice for two different customers; both keep independent revision sequences and TrafficSplits.
- Header `X-Greentic-Deployment + X-Greentic-Revision` pins authenticated admin/test requests; ignored or rejected for unauthenticated public traffic.
- Cookie-stickiness: same session_hint always lands on the same revision across 100 requests, scoped per deployment. A tampered signed value is ignored and the session is re-pinned.
- `gtc op traffic rollback local --deployment D` returns to the previous split atomically.
- `gtc op revisions drain local <ULID>`: ingress stops within 1s; existing requests complete within `drain_seconds`; runtime tears down cleanly.
- Two concurrent `gtc op traffic set` invocations: one wins (flock), the other gets "another operator holds the lock."
- Tampered bundle (modified payload, modified sig, missing sidecar) fails `gtc op revisions stage` with a one-line DSSE-error explanation.
- `BundleDeployment` created on `gtc op bundles add` with `customer_id` (required for non-local), route binding, signed revenue policy.
- Runtime telemetry carries `customer_id`, `deployment_id`, `bundle_id`, `revision_id`, env-pack `kind`, rollout generation. Metrics expose only documented low-cardinality subset.
- `PersistedSetupState.secret_values` plaintext field is gone from artifacts; CI grep gate passes.
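The Phase B acceptance criteria also require that a repeated `gtc op traffic set` with the same idempotency key returns the original result and that a stale generation fails with a conflict. The in-memory sketch below shows how those two rules can interact. All names are hypothetical, locking and persistence are left out, and replaying only successful results (not conflicts) is a design assumption, not something the plan specifies.

```rust
use std::collections::HashMap;

/// Revision id and weight in basis points.
pub type Weights = Vec<(String, u32)>;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SplitState {
    pub generation: u64,
    pub weights_bps: Weights,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SetOutcome {
    Applied { generation: u64 },
    Conflict { current_generation: u64 },
}

/// Hypothetical store for one deployment's TrafficSplit.
pub struct TrafficSplitStore {
    current: SplitState,
    replay: HashMap<String, SetOutcome>,
}

impl TrafficSplitStore {
    pub fn new(initial: Weights) -> Self {
        Self {
            current: SplitState { generation: 0, weights_bps: initial },
            replay: HashMap::new(),
        }
    }

    pub fn current(&self) -> &SplitState {
        &self.current
    }

    /// Compare-and-swap on `expected_generation`, with idempotency replay.
    pub fn set(
        &mut self,
        idempotency_key: &str,
        expected_generation: u64,
        weights_bps: Weights,
    ) -> SetOutcome {
        // A key that already succeeded returns its original result unchanged.
        if let Some(previous) = self.replay.get(idempotency_key) {
            return previous.clone();
        }
        // A caller that read an older generation must re-read and retry.
        if expected_generation != self.current.generation {
            return SetOutcome::Conflict { current_generation: self.current.generation };
        }
        let generation = self.current.generation + 1;
        self.current = SplitState { generation, weights_bps };
        let outcome = SetOutcome::Applied { generation };
        self.replay.insert(idempotency_key.to_string(), outcome.clone());
        outcome
    }
}
```

Checking the replay table before the generation check is what lets a retried request succeed even though its expected generation is stale by the time the retry arrives.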
Phase C Verification
- `gtc op credentials requirements local` returns green.
- `gtc op credentials bootstrap stg-k8s --admin-profile zain-admin` (kind cluster) produces a low-privilege SA token + rules-pack; admin token not in any persisted file; zeroization applied where possible.
- `gtc op credentials rotate local` re-validates and emits an audit event.
- A reference app pack reads a non-secret URL via `RuntimeConfigReader` and a secret token via `SecretsManager`; bundle contains no plaintext.
- Mixed-version ABI tests pass: old components run through compatibility shim; new components can require `pack-config.v1`.
- A reference component reads `runtime://prod-eu/discovered/alb_dns` and resolves correctly.
- `gtc op env-packs add local --kind greentic.telemetry.otlp` runs that env-pack's QASpec via the operator's existing wizard driver. No new engine.
- The four legacy wizards still work end-to-end (env-scoped) and write `secret_refs` + `non_secret` cleanly.
Phase D — AWS slice Verification (first proving ground)
- Production EnvironmentStore backup/restore test passes before any real AWS traffic shift.
- RBAC denies an unauthorized `traffic set`; audit records the policy decision.
- `gtc op env create prod-eu` + env-packs add provisions ECS, ECR, ALB, IAM, S3 via deployer Terraform after credentials pass.
- `gtc op credentials requirements prod-eu` returns green OR `bootstrap` emits a rules pack.
- `gtc op bundles add prod-eu support_v1.2.0.gtbundle --customer-id cust-acme --revenue-share agency-a:30%,greentic:70%` creates a `BundleDeployment`.
- `revisions warm` creates an ECS task-set; in-process dispatcher AND ALB weighted TG both see it.
- `traffic set prod-eu --deployment <id> <new>=1 <old>=99`: 1% of HTTP requests land on the new task set within 30s.
- Telemetry export shows `customer_id=cust-acme`, `deployment_id=<ULID>`, `revenue_share=…` on every invocation span/log.
- Secret rotation in AWS-SM triggers graceful revision restart within 60s.
- `runtime.json` populated with `alb_dns`, `ecs_cluster_arn`, `task_set_ids`.
- E2E: stage → warm → 50/50 split → 100% on new → rollback → archive. Wall-clock < 15 minutes on a sandbox account.
Phase D — K8s slice Verification (Zain ship gate)
- Production EnvironmentStore backup/restore test passes before any real K8s traffic shift.
- RBAC denies an unauthorized `traffic set`; audit records the policy decision.
- `env create zain-prod` + `env-packs add zain-prod --slot deployer --kind greentic.deployer.k8s@1.0.0` creates namespace, RBAC, router Deployment, per-revision worker Deployment template, Services, Ingress/Gateway, ESO `ClusterSecretStore`, ServiceAccount with workload identity.
- `env render zain-prod --output ./rendered/` emits declarative manifests matching the applied resources.
- `bundles add zain-prod customer.support_v1.2.0.gtbundle --customer-id cust-zain && revisions warm <ULID>` creates a worker Deployment labeled `greentic.ai/revision=<ULID>`; router runtime config includes it only after warm succeeds.
- `traffic set zain-prod --deployment <id> <new>=1 <old>=99`: 1% of requests land on the new revision within 30s. Stratified 5000-request test passes at p=0.95.
- A second bundle `llm-router_v1.0.0.gtbundle` deploys alongside without disturbing the first bundle's TrafficSplit.
- `traffic rollback zain-prod --deployment <id>` reverts atomically; old Deployment still running.
- ESO projects AWS-SM (Zain's choice) values; rotating a secret propagates to running pods within 60s.
- HPA scales each revision's worker Deployment independently; router generation remains stable during scale events.
- Image runs as `uid=65532`, non-root SecurityContext, no privilege escalation, read-only root FS, seccomp, resource requests/limits, digest-pinned image, NetworkPolicy, topology spread/anti-affinity, router PDB.
- E2E on a real K8s cluster (kind / minikube / EKS sandbox): create env → bind env-packs → credentials pass → bundles add (two bundles, different customers) → warm → 50/50 split per deployment → 100% on new → rollback → drain old → archive. Wall-clock < 15 minutes.
10. Open questions / risks
- AWS sandbox account. Phase D nightly E2E needs a Greentic-owned AWS account or LocalStack. Same blocker as the prior plan; not resolved.
- K8s test cluster. The K8s slice needs a cluster for CI — kind/minikube for per-PR; a real EKS (or Zain-provided) cluster for nightly. Decision needed: long-running Greentic EKS sandbox, or ephemeral kind in GitHub Actions?
- Ingress / Gateway choice on K8s. NGINX Ingress, Traefik, Istio, or Gateway API. Must support at least NGINX Ingress and Gateway API; Istio opt-in.
- External Secrets Operator presence. K8s slice assumes ESO is installed. If not, either deployer env-pack installs it (cluster-admin required), or fallback that mounts secrets via a sidecar.
- K8s router availability. The corrected K8s model adds a stable router Deployment. Must be HA from day one: ≥2 replicas, PDB, readiness on loaded split generation, graceful config reload.
- Deployment route resolution. `deployment_id` is now the rollout key. Router must deterministically resolve it from tenant/team/customer/host/path before revision selection. Ambiguous bindings must fail at deploy time, not at request time.
- Production EnvironmentStore choice. Operator DB, Kubernetes CRDs, or object storage plus a lock service before AWS/K8s production acceptance.
- Existing Environment type split mechanics. 100+ call sites referencing the current monolithic type today. Phase A must land the split with adapter functions and CI to keep callers compiling.
- Aggregator / invoicer for P6 usage metering. This plan delivers the data model and the telemetry stamping; the aggregator is out of scope. Needs a separate `GREENTIC-BIZ/greentic-billing` workstream.
- Customer_id assignment for `local`. Defaulting to `local-dev` is safe; `gtc op bundles add prod-eu …` MUST refuse to default.
- Telemetry cardinality. Customer and deployment labels are useful for traces/logs/usage events but dangerous on every metric. Phase B must define metric views and budgets before exporters ship.
- Admin-bootstrap rules-pack acceptance. Customers' admins must trust the IaC rules pack we generate. Risk: generated Terraform/YAML doesn't match house style. Mitigation: templating override hook.
- Secrets trait integration. Provider crates exist, but runner/deployer integration still needs feature flags, auth material, sync/async adapters, policy decisions.
- component-deployer-{hetzner,digitalocean,oracle} fate. Stubs only. Recommend deprecation in Phase D unless a customer asks.
- backhand compression parity. Must cover whatever mksquashfs defaults greentic-setup emits today (gzip / zstd).
- Chainguard licensing. Default to distroless free image. Upgrade to Chainguard when subscription lands.
- Toolchain-manifest field. Phase D's image-digest lookup needs runtime_image.greentic-start.digest in ghcr.io/greenticai/greentic-versions/gtc:<channel>. Add as part of Phase D.
- set_bundle_state is per-bundle, not per-pack. Per-pack lifecycle is Phase E-or-later; v1 ships bundle-as-revision.
- Lifecycle enum drift. Current distributor enum lacks failed and archived. Phase A must extend or wrap.
- WebSocket sessions across revisions. Cookie/header-pinned long-lived WebSocket connections must survive a traffic rollback. v1 keeps connections on whichever revision they hit first; revision drain waits the full drain_seconds. Mid-connection migration not supported.
- Pack-config drift across revisions. Two revisions of the same bundle may have different pack-config.v1 answers. Dispatcher pins sessions to revisions, so a session sees consistent config.
- Local env multi-bundle + multi-revision routing. Dispatcher runs in-process; multiple deployments × multiple revisions share the same port + greentic-start process. Memory scales with active revisions.
- Cloud-side LB sync drift. Phase D mirrors the in-process split to ALB/Cloud Run as a perf optimization. If they drift, the in-process dispatcher is authoritative; deployer env-pack logs a warning. Phase E adds drift detection.
- Env-pack discovery + publication. Greentic Store integration? Local-only registration via gtc op env-packs install <oci-ref>? Phase A ships built-in registrations; Phase D needs publication semantics.
- Migration of dev → local for live customers. No-dual-read is allowed only if the Phase A preflight proves zero production usage of env = "dev". Otherwise compatibility alias is mandatory.
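The deployment route resolution item asks that ambiguous bindings fail at deploy time rather than at request time. A minimal sketch of that check follows. It assumes a hypothetical RouteKey / RouteBinding shape (this section does not define the real binding schema) and only detects two different deployment_id values claiming an identical key; overlapping path prefixes are not handled here.

```rust
use std::collections::HashMap;

/// Hypothetical match key; field names are illustrative, not the plan's schema.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct RouteKey {
    pub tenant: String,
    pub team: String,
    pub host: String,
    pub path_prefix: String,
}

/// Hypothetical binding of a match key to a deployment.
#[derive(Debug, Clone)]
pub struct RouteBinding {
    pub key: RouteKey,
    pub deployment_id: String,
}

/// Returned when two different deployments claim the same key.
#[derive(Debug, PartialEq, Eq)]
pub struct AmbiguousBinding {
    pub key: RouteKey,
    pub first: String,
    pub second: String,
}

/// Build the lookup table at deploy time. The first key claimed by two
/// different deployment_ids aborts the build, so the router never has to
/// guess at request time.
pub fn build_route_table(
    bindings: &[RouteBinding],
) -> Result<HashMap<RouteKey, String>, AmbiguousBinding> {
    let mut table: HashMap<RouteKey, String> = HashMap::new();
    for b in bindings {
        if let Some(existing) = table.get(&b.key) {
            if existing != &b.deployment_id {
                return Err(AmbiguousBinding {
                    key: b.key.clone(),
                    first: existing.clone(),
                    second: b.deployment_id.clone(),
                });
            }
            // Same deployment declared twice: treat as idempotent.
            continue;
        }
        table.insert(b.key.clone(), b.deployment_id.clone());
    }
    Ok(table)
}
```

Because the table is a plain map lookup, resolution at request time is deterministic by construction; any richer matching (longest-prefix, wildcard hosts) would need its own overlap check at the same point.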
11. Deferred (Trust + Wizard + Billing + Phase E)
These move out to separate workstreams so this plan stays scoped:
plans/greentic-trust-and-airgap.md (new, to be opened)
- KMS-backed signing keys (kms://aws/<arn>, kms://gcp/<resource>, kms://azure/<vault>/<keyname>).
- SLSA Provenance v1.2 predicate full content + slsa-verifier integration.
- Rekor public-transparency-log integration (cosign attest --rekor-url).
- Revocation feed polling on Environment.revocation.feed_url.
- Drift detection (terraform plan -detailed-exitcode from greentic-deployer doctor --drift).
- DID/VC: tenant/team/env DIDs, six VC types, status-list snapshots.
- .gtairgap package format for offline / on-prem installs.
- Reproducible builds (SLSA L3 hermetic + isolated).
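The drift-detection item relies on the exit codes of terraform plan -detailed-exitcode: 0 means the plan succeeded with no changes, 2 means it succeeded with changes present, and 1 means an error. A minimal sketch of how a doctor command might interpret them is below. The DriftStatus enum and both function names are hypothetical; only the terraform flags and exit codes come from Terraform itself.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Hypothetical result type for a drift check.
#[derive(Debug, PartialEq, Eq)]
pub enum DriftStatus {
    /// Exit code 0: plan succeeded, no changes.
    InSync,
    /// Exit code 2: plan succeeded, changes present.
    Drifted,
    /// Exit code 1 or anything unexpected: the plan itself failed.
    PlanFailed(i32),
}

/// Map a `terraform plan -detailed-exitcode` exit code to a drift status.
pub fn classify_plan_exit(code: i32) -> DriftStatus {
    match code {
        0 => DriftStatus::InSync,
        2 => DriftStatus::Drifted,
        other => DriftStatus::PlanFailed(other),
    }
}

/// Run the plan in `workdir` and classify the result.
/// Assumes a `terraform` binary on PATH and an initialized working directory.
pub fn run_plan(workdir: &Path) -> io::Result<DriftStatus> {
    let status = Command::new("terraform")
        .args(["plan", "-detailed-exitcode", "-input=false"])
        .current_dir(workdir)
        .status()?;
    // A missing code means the process was ended by a signal; report it as a failure.
    Ok(classify_plan_exit(status.code().unwrap_or(-1)))
}
```

Keeping the classification separate from process execution lets the mapping be unit-tested without Terraform installed.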
plans/greentic-wizard-unification.md (new — carved out 2026-05-14)
- Extract a single greentic-wizard-engine crate from the four existing wizard runtimes.
- greentic.pack-wizard.v1 spec: per-app-pack wizard.yaml contribution.
- Composition: load all packs in target env's resolved pack-list; dedupe; resolve dependencies; run unified session.
- First-class Adaptive Card surface (CLI + AC renderers from one composed runtime).
- Shared-question mechanism for cross-pack dedupe.
- Deprecation + removal of the four legacy wizard runtimes.
GREENTIC-BIZ/greentic-billing (new workstream)
- Usage record aggregator: consumes telemetry stream with (customer_id, deployment_id, bundle_id, revision_id).
- Revenue-share apportionment against BundleDeployment.revenue_share.
- Invoicing + payout reconciliation.
- Rate limits / quotas / SLA enforcement keyed on customer_id + bundle_id.
- Storage layer (DB schema for usage records, monthly rollups, dispute trail).
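To make the first two items concrete, here is a rough sketch of rolling up usage by the stamped tuple and splitting an amount by a share. Everything beyond the four key fields is an assumption: the UsageEvent shape, the units counter, and the basis-point representation of the share are illustrative only, since the actual revenue_share policy format is not described in this section.

```rust
use std::collections::BTreeMap;

/// The telemetry tuple named in the plan.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct UsageKey {
    pub customer_id: String,
    pub deployment_id: String,
    pub bundle_id: String,
    pub revision_id: String,
}

/// Hypothetical usage event; `units` stands in for whatever is metered.
#[derive(Debug, Clone)]
pub struct UsageEvent {
    pub key: UsageKey,
    pub units: u64,
}

/// Sum metered units per (customer, deployment, bundle, revision).
pub fn aggregate(events: &[UsageEvent]) -> BTreeMap<UsageKey, u64> {
    let mut totals = BTreeMap::new();
    for e in events {
        *totals.entry(e.key.clone()).or_insert(0u64) += e.units;
    }
    totals
}

/// Split an amount in minor currency units between agency and platform.
/// ASSUMPTION: the share is expressed in basis points (10_000 = 100%).
/// Rounding remainder goes to the platform so the two parts always sum
/// to the input. Returns None for a share above 100%.
pub fn apportion(total_minor_units: u64, agency_share_bps: u32) -> Option<(u64, u64)> {
    if agency_share_bps > 10_000 {
        return None;
    }
    let agency = (total_minor_units as u128 * agency_share_bps as u128 / 10_000) as u64;
    Some((agency, total_minor_units - agency))
}
```

Integer arithmetic with an explicit remainder rule avoids the payout totals drifting from the invoiced totals, which matters for the reconciliation item in the same list.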
Phase E (after Phase D ships AWS + K8s)
- Per-pack independent lifecycle inside a bundle (true pack-first granular updates).
- Cloud-side LB drift detection.
- Mid-connection WebSocket migration.
- Metric-driven progressive delivery (abort_metrics on Revision).
- Provider-native K8s weighted routing as first-class option if Zain chooses Gateway API or Istio.
- GitOps round-trip for Environment (declarative env apply -f env.yaml).
- Additional multi-operator store backends.
- Env-pack hot-swap without bundle re-warm.
12. Critical anchors for execution
When implementation begins, these are the load-bearing pointers to read first:
- greentic-runner/crates/greentic-runner-host/src/runtime.rs:37: ActivePacks and ArcSwap are the atomic-swap primitive for the dispatcher. Phase B extends the key from tenant to (tenant, deployment_id, bundle_id, revision_id).
- greentic-runner/crates/greentic-runner-host/src/host.rs:155: RunnerHost::load_pack(). Phase B adds load_revision(..., deployment_id, ...).
- greentic-config/crates/greentic-config-types/src/lib.rs:59: EnvironmentConfig, which is split. Host stays here; setup + runtime move to greentic-deploy-spec (A1).
- greentic-types/src/store.rs: store Environment becomes a read-only compose view.
- greentic-operator/src/static_routes.rs:65-73: the reserved /deployments/* prefixes that B4 must wire handlers into.
- greentic-operator/src/admin_api.rs:20-26: AdminState is single-bundle today; B4 extends it.
- greentic-operator/src/wizard.rs: existing wizard driver that A10/C6 reuse for env-pack QASpec rendering (no extraction).
- greentic-pack/crates/greentic-pack/src/builder.rs:151-162: DistributionSection.environment_ref becomes the env-pack ↔ env binding hook (P2).
- greentic-distributor-client/src/dist.rs:470-477, :1902, :1960, :2030, :2124, :3315, :3956: state machine, lifecycle methods, non-atomic write, no-op verifier.
- greentic-setup/src/gtbundle.rs:85-99, :136-170, :394-427: the two archive paths C1 must both gate.
- greentic-bundle/src/bundle_fs/*: the bundle writer/reader paths C1 must also gate.
- greentic-bundle/src/setup/mod.rs:63-71: PersistedSetupState.secret_values plaintext field Phase 0 removes from artifacts and Phase B replaces with secret_refs.
- greentic-setup/src/qa/persist.rs, greentic-operator/src/qa_persist.rs, greentic-start/src/qa_persist.rs: current all-config-through-secrets behavior that Phase C migrates.
- greentic-runner/crates/greentic-runner-host/src/engine/host.rs:28-54: session key shape that B6's deployment-scoped session-pin Redis hash augments. B11 also extends telemetry context.
- greentic/src/bin/gtc/{router.rs,cli.rs} and deploy/start_stop.rs: top-level CLI dispatch and existing gtc start. A3 keeps op passthrough and deletes 10+ default_value("aws") args (Appendix A counts 11 in gtc/cli.rs).
- greentic-qa/crates/qa-lib: crate name greentic-qa-lib; WizardDriver gains env_id parameter (A10).
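The first anchor describes widening the dispatcher key from tenant to (tenant, deployment_id, bundle_id, revision_id) while keeping an atomic swap. The sketch below illustrates the idea with a copy-on-write snapshot. It is not the runner's code: it uses std's RwLock<Arc<...>> as a stand-in for ArcSwap so that it compiles without external crates, and RevisionKey, RevisionRuntime, and ActiveRevisions are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical widened key; today's map is keyed by tenant only.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct RevisionKey {
    pub tenant: String,
    pub deployment_id: String,
    pub bundle_id: String,
    pub revision_id: String,
}

/// Placeholder for the per-revision runtime the runner would hold.
pub struct RevisionRuntime {
    pub pack_path: String,
}

type Snapshot = Arc<HashMap<RevisionKey, Arc<RevisionRuntime>>>;

/// Stand-in for the ArcSwap-backed map, built on std only.
pub struct ActiveRevisions {
    current: RwLock<Snapshot>,
}

impl ActiveRevisions {
    pub fn new() -> Self {
        Self { current: RwLock::new(Arc::new(HashMap::new())) }
    }

    /// Readers take a cheap Arc clone of the whole map, so an in-flight
    /// request never observes a half-applied update.
    pub fn snapshot(&self) -> Snapshot {
        let guard = self.current.read().expect("lock poisoned");
        Arc::clone(&*guard)
    }

    /// Copy-on-write insert: clone the map, add the revision, then publish
    /// the new map in a single pointer replacement.
    pub fn load_revision(&self, key: RevisionKey, runtime: RevisionRuntime) {
        let mut guard = self.current.write().expect("lock poisoned");
        let mut next: HashMap<RevisionKey, Arc<RevisionRuntime>> = (**guard).clone();
        next.insert(key, Arc::new(runtime));
        *guard = Arc::new(next);
    }
}
```

Cloning the map on every load is acceptable when loads are rare and lookups are frequent, which matches a dispatcher that stages revisions occasionally but routes every request.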
Appendix A — Implementation audit (anchor verification)
Source review on 2026-05-15 checked the plan's load-bearing assertions against the current workspace. The table records what is true today, before this plan is implemented.
| Anchor | Status | Finding |
|---|---|---|
| Single-active-bundle | Confirmed | AdminState holds a single bundle_root: PathBuf. static_routes.rs reserves /deployments/{stage,warm,activate,rollback,complete-drain} and the /deployments prefix, but no lifecycle handlers are wired there. |
| Pack-native runner | Confirmed | RunnerHost::load_pack(&self, tenant, pack_path) loads a .gtpack and inserts the runtime into ActivePacks. ActivePacks is ArcSwap<HashMap<String, Arc<TenantRuntime>>>, keyed by tenant only. |
| Distributor-client primitives | Confirmed | stage_bundle, warm_bundle, rollback_bundle, and set_bundle_state exist. write_bundle_record uses bare fs::write with no temp-file rename. Signature verification returns "signature verification is not implemented in the open-source client." |
| Environment types | Confirmed | EnvironmentConfig is a flat struct with env_id, deployment, connection, region. Store Environment is a registry entry with metadata, distributor ref, connection kind — not a deploy target with capability slots. |
| Pack manifest env hook | Minimally used | DistributionSection.environment_ref and desired_state_version exist and are validated for non-empty values. No downstream code binds them today. |
| Secret leakage | Confirmed | PersistedSetupState contains secret_values: BTreeMap<String, Value>. Setup bundle writers walk the source tree and need the planned allowlist and redaction gate. |
| Cloud coupling | Confirmed | gtc/cli.rs has 11 default_value("aws") occurrences. admin tunnel rejects non-AWS targets with "admin tunnel currently supports only --target aws". |
| Deployer crate size | Confirmed | greentic-deployer/src is 25,836 lines. The earlier ~4,126 LOC estimate refers to the cloud-specific subset, not the whole crate. |
| Wizard runtimes | Confirmed | Four homes exist. Shared crate path is greentic-qa/crates/qa-lib; package name is greentic-qa-lib. |
| Session key shape | Partially confirmed | SessionKey has tenant_key, pack_id, flow_id, session_hint. tenant_key embeds env as "{env}::{tenant}"; Phase B6 should preserve this nuance. |
| NATS subjects | Confirmed | Ingress subjects built under greentic.messaging.ingress.{env}.{tenant}.{team}.{platform}. No revision segment, matching the plan's decision. |
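The distributor-client row notes that write_bundle_record uses a bare fs::write with no temp-file rename. A minimal sketch of the usual remedy follows: write to a sibling temporary file, flush it to disk, then rename over the target. The function name write_atomic and the temp-file naming are assumptions for illustration; the real record format and call site are not shown in this plan.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to `path` so that readers see either the previous file
/// or the complete new one, never a partial write.
pub fn write_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // The temp file must live in the same directory so the rename stays
    // on one filesystem.
    let dir = path
        .parent()
        .filter(|p| !p.as_os_str().is_empty())
        .unwrap_or(Path::new("."));
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;

    // Illustrative temp name: "<file>.tmp.<pid>".
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(format!(".tmp.{}", std::process::id()));
    let tmp_path = dir.join(tmp_name);

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(bytes)?;
    // Flush file contents to disk before the rename makes them visible.
    tmp.sync_all()?;
    drop(tmp);

    // Replaces any existing file at `path`.
    fs::rename(&tmp_path, path)
}
```

This sketch does not clean up the temporary file if the write fails partway, and it does not fsync the parent directory; a production version would likely want both.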
What the audit means for this plan
- The single-bundle operator state, unused environment hook, plaintext setup secret field, and no-op signature verifier are current and must be fixed by the planned phases.
- Two details need exact wording during implementation: the AWS default count is 11, and env is embedded inside tenant_key rather than stored as a separate SessionKey field.
- The deployer extraction should scope against the cloud-specific implementation surface, not the whole greentic-deployer crate.
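The audit's two string formats, the "{env}::{tenant}" tenant key and the greentic.messaging.ingress.{env}.{tenant}.{team}.{platform} subject, can be pinned down with small helpers. The sketch below uses hypothetical function names and does no validation: it assumes the inputs contain neither "::" nor characters that are invalid in a NATS subject token (such as "." or whitespace).

```rust
/// Compose the tenant key in the "{env}::{tenant}" form the audit found.
pub fn tenant_key(env: &str, tenant: &str) -> String {
    format!("{env}::{tenant}")
}

/// Recover (env, tenant) from a tenant key, splitting at the first "::".
/// Returns None if the separator is missing.
pub fn split_tenant_key(key: &str) -> Option<(&str, &str)> {
    key.split_once("::")
}

/// Build the ingress subject. There is deliberately no revision segment,
/// matching the plan's decision recorded in the audit table.
pub fn ingress_subject(env: &str, tenant: &str, team: &str, platform: &str) -> String {
    format!("greentic.messaging.ingress.{env}.{tenant}.{team}.{platform}")
}
```

Having a single round-trip pair for the tenant key makes it harder for Phase B6 to accidentally treat env as a separate SessionKey field.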
— End of plan —
Rendered from plans/next-gen-deployment.md · 2026-05-15