17. LLM Spec-Driven Development Pipeline (on the App Mesh workflow-engine)

A reference pipeline that turns a plain requirement into a reviewed, spec-backed, tested change in a git repo — orchestrated as an App Mesh workflow (DAG of steps). It is the worked example behind the design discussion of “which spec-dev framework fits the engine.”

Two files:

| File | What it is | Status |
| --- | --- | --- |
| `spec-pipeline-demo.yaml` | Runnable demo: shell-only, no external tools; same DAG as production | ✅ runs green on a live daemon (verified, both paths) |
| `llm-spec-pipeline.yaml` | Production template: same DAG, real `claude`/`openspec`/build/test commands | ✅ parses against the engine; commands are placeholders to fill in |

17.1. The three layers

This pipeline deliberately separates concerns. The engine is only the orchestrator; specs are the contract; the agent discipline lives inside a step.

| Layer | Responsibility | What plays this role |
| --- | --- | --- |
| Orchestration | sequence stages, fan-out, gates, retries, per-tenant isolation, audit | App Mesh workflow-engine (this DAG) |
| Artifact / spec | durable, diffable, git-committed source-of-truth between stages | OpenSpec change proposals (`specs/changes/<id>/`) |
| Agent behavior | how a single coding/review step thinks (TDD, review roles) | `claude`/`codex` inside a step (+ superpowers / gstack skills) |

Why this split: the engine is language-agnostic and runs each step as an isolated process, so the thing that must flow between steps has to be a durable artifact (an OpenSpec change committed to the repo), not in-agent state. gstack/superpowers are Claude-Code-internal methodologies — they belong inside the implement/review steps, not as the backbone.

17.2. The DAG

checkout
  └─ clarify ── spec ──┬─ review_eng  ─┐
                       ├─ review_design ┤   (parallel fan-out)
                       └─ review_devex ─┘
                            ├─ rework_spec        (if: failure()  — any reviewer REJECT)
                            └─ plan ── implement ── test ── code_review ── ship ── finalize (if: always())

| Job | Purpose | Notable engine feature |
| --- | --- | --- |
| `checkout` | clone repo into a per-run workspace | run-scoped shared dir (`$RUNDIR/<run_id>`) |
| `clarify` | normalize the requirement into a testable brief | - |
| `spec` | create the OpenSpec change proposal (the artifact) | `retry` until it validates; change-id emitted on stdout |
| `review_eng` / `review_design` / `review_devex` | specialist review gates | parallel fan-out; a REJECT = non-zero exit = job failure |
| `rework_spec` | revise spec if any review rejected | `if: "failure()"` |
| `plan` | implementation plan from the approved spec | runs only if all reviews passed (deps-failed ⇒ auto-skip) |
| `implement` | agent codes against the spec, then a build/acceptance gate | `retry` = bounded "until the gate passes" loop |
| `test` | the repo's own test suite | plain command step |
| `code_review` | pre-landing review gate | - |
| `ship` | `openspec archive` → commit → push/PR | - |
| `finalize` | status summary, always | `if: "always()"` + step-level `finally` |
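
The gating rules in the table can be sketched as a toy model in Python. This is not the engine's implementation: the dependency edges are read off the diagram (in particular, `finalize` depending on `ship` is an assumption), and `simulate` and `final_status` are hypothetical helpers. The model treats `failure()` as "at least one direct dependency failed", `always()` as "run regardless", and the default as "run only if every dependency succeeded".

```python
from graphlib import TopologicalSorter

REVIEWS = ["review_eng", "review_design", "review_devex"]

# job -> (dependencies, condition); edges are an assumption read off the diagram.
DAG = {
    "checkout": ([], None),
    "clarify": (["checkout"], None),
    "spec": (["clarify"], None),
    **{review: (["spec"], None) for review in REVIEWS},
    "rework_spec": (REVIEWS, "failure()"),
    "plan": (REVIEWS, None),
    "implement": (["plan"], None),
    "test": (["implement"], None),
    "code_review": (["test"], None),
    "ship": (["code_review"], None),
    "finalize": (["ship"], "always()"),
}


def simulate(failing=frozenset()):
    """Return {job: status} for a run in which the jobs in `failing` exit non-zero."""
    graph = {job: deps for job, (deps, _) in DAG.items()}
    status = {}
    for job in TopologicalSorter(graph).static_order():
        deps, cond = DAG[job]
        dep_status = [status[d] for d in deps]
        if cond == "always()":
            runs = True
        elif cond == "failure()":
            runs = "failure" in dep_status
        else:
            # Default gate: a failed or skipped dependency skips this job.
            runs = all(s == "success" for s in dep_status)
        if not runs:
            status[job] = "skipped"
        else:
            status[job] = "failure" if job in failing else "success"
    return status


def final_status(status):
    """A run counts as failed if any job failed; skipped jobs do not count."""
    return "failure" if "failure" in status.values() else "success"
```

Calling `simulate()` models the green path, and `simulate({"review_eng"})` models a run in which the engineering reviewer rejects the spec.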

17.2.1. Cross-step data flow

Artifacts (requirements.md, the OpenSpec change, plan.md) live in a shared, run-scoped directory, so later jobs can read what earlier jobs wrote.
Small values (the change-id, review verdicts, job status) flow via expressions: ${{ jobs.spec.steps.propose.stdout }}, ${{ jobs.review_eng.status }}, ${{ workflow.run_id }}, ${{ inputs.feature }}.
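
To make the expression mechanism concrete, here is a minimal sketch of how a `${{ ... }}` reference could be resolved against a nested context. It only illustrates the idea of dotted-path lookup; the engine's real evaluator may behave differently, and `resolve`, the `context` layout, and the sample values are all hypothetical.

```python
import re

# Matches ${{ dotted.path }} with optional inner whitespace.
_EXPR = re.compile(r"\$\{\{\s*([A-Za-z0-9_.]+)\s*\}\}")


def resolve(template, context):
    """Replace each ${{ a.b.c }} in `template` with context["a"]["b"]["c"]."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError on an unknown path
        return str(value)
    return _EXPR.sub(lookup, template)


# Hypothetical run context; the values are illustrative only.
context = {
    "inputs": {"feature": "dark-mode"},
    "workflow": {"run_id": "run-0001"},
    "jobs": {
        "spec": {"steps": {"propose": {"stdout": "add-dark-mode"}}},
        "review_eng": {"status": "success"},
    },
}

change_dir = resolve("specs/changes/${{ jobs.spec.steps.propose.stdout }}/", context)
```

Failing loudly on an unknown path (a `KeyError` here) is a deliberate choice in this sketch: a silently empty change-id would send later jobs to the wrong directory.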

17.3. Run it

The engine is driven through the run_task Task API (the appm CLI and SDKs wrap this). Every call carries the caller’s JWT; the engine authenticates it, enforces per-workflow ownership, and runs the steps as the caller (recorded as actor).

17.3.1. Via the CLI

appm workflow add  -f src/workflow/docs/spec-pipeline-demo.yaml
appm workflow run  spec-pipeline-demo                       # green path
appm workflow run  spec-pipeline-demo -i demo_reject=true   # exercise the rework path
appm workflow runs spec-pipeline-demo                       # list runs
appm workflow logs spec-pipeline-demo <run_id>              # flow log

17.3.2. Via an SDK (Python)

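import json
from appmesh import AppMeshClient

# Demo only: ssl_verify=False skips TLS verification for the local daemon.
c = AppMeshClient(base_url="https://127.0.0.1:6060", ssl_verify=False)
c.login("admin", "***")
tok = c._get_access_token()  # private SDK helper; the JWT is sent with every call

def call(action, **kw):
    """Send one workflow action through run_task (90 s timeout) and decode the JSON reply."""
    payload = json.dumps({"action": action, "token": tok, **kw})
    return json.loads(c.run_task("workflow", payload, 90))

# Register the workflow, start a run, then print that run's status.
with open("src/workflow/docs/spec-pipeline-demo.yaml") as f:
    call("workflow_add", workflow="spec-pipeline-demo", content=f.read())
rid = call("run", workflow="spec-pipeline-demo", inputs={})["data"]["run_id"]
print(call("run_detail", workflow="spec-pipeline-demo", run_id=rid)["data"]["status"])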
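
The snippet above reads the status once, right after triggering the run. A caller that needs the final result would have to poll; the following is a minimal sketch of that, assuming the terminal statuses are the `success` and `failure` values shown in this document. `wait_for_run` is a hypothetical helper, not part of the SDK, and it takes the `call` function as a parameter so it can be exercised without a daemon.

```python
import time

# Assumption: these are the terminal statuses; adjust if the engine reports others.
TERMINAL = {"success", "failure"}


def wait_for_run(call, workflow, run_id, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll run_detail until the run reaches a terminal status or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        detail = call("run_detail", workflow=workflow, run_id=run_id)
        status = detail["data"]["status"]
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still {status!r} after {timeout}s")
        sleep(interval)
```

With the `call` helper defined above, usage would be `wait_for_run(call, "spec-pipeline-demo", rid)`.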

17.3.3. Verified behavior

Default (all reviews approve):

FINAL: success   (actor=admin)
  rework_spec   skipped         # failure-path not taken
  implement     success         # step log shows "attempt 2"  -> retry fired
  ... all other jobs success

demo_reject=true (eng rejects):

FINAL: failure
  review_eng    failure
  plan/implement/test/code_review/ship   skipped   # dependency-failure gating
  rework_spec   success          # if: failure()
  finalize      success          # if: always()

17.4. From demo to production

Swap the shell bodies in llm-spec-pipeline.yaml for your real commands: claude -p/codex for the agent steps, your real openspec CLI flags, and your repo’s build/test scripts. (The demo proves the orchestration; production just changes the command bodies.)
Secrets: put ANTHROPIC_API_KEY etc. on the workflow App’s sec_env (encrypted at rest; surfaced to steps as env vars). Never inline keys in the YAML.
Tenant permissions: a workflow runs steps as the triggering caller, so that user needs the permissions the engine uses per step: app-run-task, app-run-async, app-run-sync, app-subscribe, app-output-view, app-delete (plus label-view if you use node selectors). A missing app-subscribe permission is the classic cause of the “every command step fails to start” symptom.
Ownership/roles: the registrant owns the workflow; only the owner or a workflow admin (APPMESH_WORKFLOW_ADMINS) may run/manage it. Each run is isolated and audited (actor).
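
The permission list above lends itself to a preflight check before triggering a run. The sketch below only compares sets; how the caller's granted permissions are obtained is deliberately left out, and `missing_permissions` is a hypothetical helper rather than an SDK function.

```python
# Permissions the engine uses per step, as listed above.
REQUIRED = frozenset({
    "app-run-task",
    "app-run-async",
    "app-run-sync",
    "app-subscribe",
    "app-output-view",
    "app-delete",
})
# Only needed when the workflow uses node selectors.
NODE_SELECTOR_EXTRA = frozenset({"label-view"})


def missing_permissions(granted, uses_node_selectors=False):
    """Return the sorted list of required permissions absent from `granted`."""
    required = REQUIRED
    if uses_node_selectors:
        required = required | NODE_SELECTOR_EXTRA
    return sorted(required - set(granted))
```

An empty result means the caller holds everything in the list; a result of `["app-subscribe"]` points at the symptom described above before any step has been scheduled.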

17.5. Caveats (by design)

Acyclic DAG: there is no literal review↔fix loop. Bounded iteration is either retry (within a single step) or re-triggering the workflow (rerun); a true multi-round loop must run inside a step’s agent.
Long runs vs token validity — steps run with the caller’s token; a run that outlives the token will fail closed mid-flight. Trigger with a token whose lifetime covers the run, and use a non-renewing (one-shot) token so a session refresh doesn’t revoke it mid-run.
Automatic (cron/event) triggers run under the engine identity, not a caller — keep those pipelines to engine-appropriate work, or wait for the act-as capability (ADR 0004 / Phase 3).
Placeholder commands (claude/openspec/gh/./scripts/*) must exist in the daemon’s environment; otherwise those steps fail at the command (a useful orchestration smoke test).
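
For the token-validity caveat, one way to check before triggering is to read the token's expiry and compare it with the expected run duration. The sketch below assumes the token is a standard JWT carrying an `exp` claim (seconds since the epoch); that is an assumption about the token format, not something this document confirms. It does not verify the signature, and both helpers are hypothetical.

```python
import base64
import json
import time


def token_seconds_left(jwt, now=None):
    """Seconds until the JWT's `exp` claim; reads the payload without verifying it."""
    payload_b64 = jwt.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    current = time.time() if now is None else now
    return claims["exp"] - current


def covers_run(jwt, expected_run_seconds, margin=60, now=None):
    """True if the token outlives the expected run by at least `margin` seconds."""
    return token_seconds_left(jwt, now) >= expected_run_seconds + margin
```

The `margin` parameter is there because an expected run duration is only an estimate; a token that expires exactly at the estimated end would still risk failing the last steps.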

17.6. See also

docs/adr/0006-workflow-multi-tenant-authz.md — ownership, caller-scoped execution, audit.
docs/adr/0002, 0004, 0005 — workflow storage, run model, the run_task transport.