The Spec Layer
Why Spec-Driven Development (SDD) Works
An AI agent implements a feature. The code compiles. The tests pass. It still misses the point.
The wrong kind of correct.
Most of our software tooling is optimized for the failures humans used to make. Agents fail differently.
They usually don’t break the build. They disable the failing test. They reuse the nearest pattern. They preserve the old path and add a new one beside it. Everything looks reasonable until the codebase starts filling with locally valid mistakes.
The failure modes are familiar:
“I just disabled the failing tests.”
“I just reused the existing service.”
“I did not change the existing behavior.”
“You’re right. I assumed that...”
When a decision isn’t written down, the agent has to make it again. Context windows are finite, and recall is imperfect even inside them. The deeper issue is too much freedom at execution time.
Compilers, linters, and tests help. They catch syntax errors, broken imports, and failing behavior. They are worse at telling you whether the agent made the right call. Even a large test catalog is weak against additive change: a new path added beside the old one leaves every existing test green.
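A minimal sketch of that weakness, using made-up names (`compute_total`, `compute_total_v2`, and `ALLOWED_SURFACE` are all hypothetical). The existing test still passes after a second path is added beside the first. Only a check against a written-down public surface notices the drift.

```python
import types


def compute_total(prices):
    """Original behavior, covered by the existing test."""
    return sum(prices)


def compute_total_v2(prices, discount=0.0):
    """Hypothetical parallel path an agent adds instead of changing the old one."""
    return sum(prices) * (1 - discount)


# Stand-ins for a module before and after the additive change.
before = types.SimpleNamespace(compute_total=compute_total)
after = types.SimpleNamespace(
    compute_total=compute_total,
    compute_total_v2=compute_total_v2,
)


def existing_tests_pass(module) -> bool:
    """The pre-existing test only exercises the old path."""
    return module.compute_total([1, 2, 3]) == 6


def public_functions(module) -> set:
    """Names of public callables on a module-like object."""
    return {
        name
        for name, value in vars(module).items()
        if callable(value) and not name.startswith("_")
    }


# Assumption: the intended public surface is written down somewhere durable.
ALLOWED_SURFACE = {"compute_total"}


def surface_drift(module, allowed) -> set:
    """Public names that exist in the code but not in the written intent."""
    return public_functions(module) - allowed


print(existing_tests_pass(after))             # the suite stays green
print(surface_drift(after, ALLOWED_SURFACE))  # the extra path is reported
```

The test suite cannot tell `before` from `after`. The surface check can, but only because someone recorded what the surface was supposed to be.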
Code generation improved faster than the systems that constrain it. The result is underconstrained execution. Written intent is one way to constrain it, and specs are one layer that can carry that intent.
Protocol engineering is the cleanest historical evidence that such a layer works. Not because protocols capture every rejected alternative, but because they define interfaces that many implementations can target. RFC 791 standardized the Internet Protocol in 1981. HTTP semantics live in RFC 9110. TLS 1.3 lives in RFC 8446. HTML is maintained as a living standard by WHATWG. In each case, the spec lets many implementations evolve over time.
But specs do not remove the hard part. Dijkstra’s narrow-interfaces critique shows that precision work does not disappear when you move from code to prose. Lamport and TLA+ show why explicit invariants still matter before implementation. Model-driven development shows the risk of pushing the abstraction too far and turning the spec into the thing you have to edit.
So the goal is not to replace code with prose. It is to reduce execution freedom.
Spec-driven development means writing durable intent down before implementation, then using it to plan, build, check, and revise the work.
The word “spec” is overloaded. It helps to separate four things: what the system must do, how this codebase will do it, the task list, and the rules that should survive later changes.
Each one narrows a different choice. Specs constrain intent. Plans constrain approach. Tasks constrain sequencing. Tests, schemas, and lint constrain behavior. Harnesses constrain execution.
The real disagreement is where to put the constraints. GitHub Spec Kit and Kiro keep them near the change workflow: requirements, design, and tasks for one piece of work. OpenSpec moves them into the repo as a decision record that survives the change.
Tessl pushes further and asks whether the spec itself should become the thing you edit, which is where the Dijkstra objection lands hardest: “a sufficiently detailed spec is code.” Intent treats the spec as shared state. Symphony treats it as an orchestration contract for autonomous runs.
Each one tries to pin the agent down at a different point.
Underneath the product differences, they keep rebuilding the same skeleton: durable context, feature intent, a technical plan, explicit tasks, and verification. The goal is to give the agent less room to improvise.
So what would the ideal model look like today? Smaller than most current tools imply, with a cleaner handoff between intent and execution.
The spec should be declarative, so the agent matches the code to the intent instead of replaying a brittle patch script. It should be layered, so product requirements do not quietly turn into architecture and technical plans do not quietly add product scope. And it has to be cheap to revise. If a spec is expensive to update, replace, or delete, the process hardens into ceremony and the ceremony becomes the work.
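One way to picture the declarative property, as a sketch with hypothetical names (`Spec`, `gap`, and the endpoint strings are illustrative and not taken from any tool mentioned here). The spec states the desired end state, and a check reports how far the current code is from it, whatever the starting point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Desired end state, not a sequence of edits (hypothetical shape)."""
    required: frozenset
    forbidden: frozenset = frozenset()


def gap(spec: Spec, actual: set) -> dict:
    """Report what separates the current state from the declared intent."""
    return {
        "missing": sorted(spec.required - actual),
        "forbidden_present": sorted(spec.forbidden & actual),
    }


spec = Spec(
    required=frozenset({"POST /orders", "GET /orders/{id}"}),
    forbidden=frozenset({"POST /orders_v2"}),
)

# Two different starting points, one spec: the check works from either.
fresh_repo = set()
drifted_repo = {"POST /orders", "POST /orders_v2"}

print(gap(spec, fresh_repo))
print(gap(spec, drifted_repo))
```

Once the code matches, `gap` returns empty lists and rerunning it changes nothing. A patch script written against one starting state cannot promise that.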
Where a rule can be enforced mechanically, move it out of the spec and into lint, schemas, tests, or the harness. Use less prose. Enforce more. Specs matter, but they are only one layer. Full SDD should stay optional for small bug fixes, fast prototypes, and exploratory UX.
The winning model puts a narrow interface between human intent and machine execution. Intent narrows the search space. Code, tests, and harnesses govern behavior. Smaller specs, harder checks, less guessing.
