Skip to content

Specification

The bottleneck shifted from execution to specification.

AI writes code fast. The constraint is no longer “can we build it?” but “can we define what to build?” Specification — the act of stating precisely what a system must do — is the skill that compounds in an era of automated execution.

“If you’re thinking without writing, you only think you’re thinking.” — Leslie Lamport

Every specification sits on a spectrum from informal to formal. Match the level to the risk.

LevelFormPrecisionToolingWhen to use
InformalUser stories, proseLowJira, docsEarly exploration, business alignment
StructuredBDD/Gherkin, decision tablesMedium-lowCucumber, spreadsheetsAcceptance criteria, stakeholder comms
SchemaOpenAPI, JSON Schema, ProtobufMediumValidators, code generatorsAPI boundaries, service contracts
Type systemTypeScript strict, Rust typesMedium-highCompilerStructural invariants, data shapes
ContractsDbC (pre/post/invariant)Highdeal, contracts crateInterface guarantees, runtime enforcement
Property-basedHypothesis, QuickCheckHighTest frameworksBehavioral invariants, edge case discovery
Formal modelTLA+, AlloyVery highModel checkersConcurrency, distributed systems

Most teams live at the schema and type system levels. The leverage is at the ends: structured specifications catch ambiguity early; formal models catch concurrency bugs that testing misses entirely.

The simplest formal method. Write all combinations of conditions and their outcomes in a grid. Missing rows reveal missing specifications.

| Premium? | Cart > $100? | Holiday? | Discount |
|----------|-------------|----------|----------|
| Y | Y | Y | 25% |
| Y | Y | N | 15% |
| Y | N | Y | 10% |
| Y | N | N | 5% |
| N | Y | Y | 10% |
| N | Y | N | 0% |
| N | N | Y | 5% |
| N | N | N | 0% |

Eight rows for three booleans. No ambiguity. No “it depends.” The table is the specification — hand it to any implementation and get the same result.

Bertrand Meyer’s framework: every function is a contract between supplier and client.

ElementWho guarantees itMeaning
PreconditionClient (caller)“I promise the input looks like…”
PostconditionSupplier (callee)“I promise the output will…”
InvariantBoth”This property always holds”
# Python with the `deal` library
import deal
@deal.pre(lambda amount: amount > 0, message="amount must be positive")
@deal.post(lambda result: result >= 0, message="balance cannot go negative")
def withdraw(balance: float, amount: float) -> float:
return balance - amount

The contract documents and enforces the specification. When deal integrates with Hypothesis, it generates tests automatically from contract decorators — contracts become executable specifications.

The key insight: Contracts separate what from how. The precondition says “give me positive numbers.” The postcondition says “I’ll return a non-negative balance.” The implementation is free to change as long as the contract holds.

Example-based tests prove specific cases. Property-based tests prove invariants across thousands of generated inputs.

Example-based: sort([3,1,2]) == [1,2,3] ← one case
Property-based: for any list L, len(sort(L)) == len(L) ← all cases
for any list L, sort(L) is non-decreasing
for any list L, elements(sort(L)) == elements(L)

Properties define what a function must do without specifying how. The test framework generates inputs — including edge cases humans overlook: empty lists, single elements, duplicates, very large inputs.

ToolLanguageKey feature
HypothesisPythonShrinks failing cases to minimal
QuickCheckHaskell (original)Ported to most languages
icontract-hypothesisPythonGenerates tests from DbC decorators

TLA+ and Alloy are not academic exercises. AWS started with seven teams using TLA+ in 2015. By 2024, formal methods — TLA+, P, Dafny, Kani — had spread across teams building S3, DynamoDB, EBS, Aurora, EC2, and more. Engineers learn TLA+ from scratch and get useful results in 2–3 weeks. AWS also adopted the P language as a more programmer-friendly alternative for engineers who found TLA+‘s mathematical notation steep.

ToolCreatorBest forEntry point
TLA+Leslie LamportConcurrency, distributed systems, state machinesPlusCal (sequential DSL)
PMicrosoft/AWSDistributed systems, state machinesImperative syntax
AlloyDaniel JacksonData models, structural constraints, small scopeAlloy 6 (adds temporal)

Why testing is insufficient for concurrency

Section titled “Why testing is insufficient for concurrency”

A concurrent system with n threads and m states has m^n possible interleavings. Tests exercise a handful. A model checker explores all of them.

Hillel Wayne modeled Tom Cargill’s BoundedBuffer challenge in TLA+ and found the deadlock bug in 31 minutes. Don Wells solved the same challenge with unit tests in 16 hours. The bug existed in a notify() interleaving that tests rarely hit.

---- MODULE TransferMoney ----
VARIABLES alice_balance, bob_balance
Init ==
/\ alice_balance = 100
/\ bob_balance = 50
Transfer(amount) ==
/\ alice_balance >= amount
/\ alice_balance' = alice_balance - amount
/\ bob_balance' = bob_balance + amount
Invariant ==
alice_balance + bob_balance = 150
====

The model checker verifies the invariant holds across every reachable state. If it fails, you get a counterexample trace — the exact sequence of steps that breaks it.

CLAUDE.md files, agent prompts, and constraint lists are specifications. They define what an agent must do, what it must not do, and what success looks like.

Specification elementTraditional softwareAgent systems
Input contractType signaturesContext boundaries
Output contractReturn typesOutput format schemas
InvariantsDatabase constraintsBehavioral constraints
Resource boundsMemory/CPU limitsToken budgets, timeouts
Success criteriaAssertionsEvaluation rubrics

The gap between traditional and agent specification: traditional contracts are deterministic (the compiler enforces them), agent contracts are probabilistic (the agent might violate them). This makes specification more important, not less — clear constraints reduce the probability of violation.

  • Multiple developers interpret the same requirement differently
  • Bugs appear in interleavings or edge cases testing misses
  • The cost of a production bug exceeds the cost of specification
  • You’re defining boundaries between autonomous systems (services, agents)
  • Prototyping where requirements will change tomorrow
  • Solo projects where you are both specifier and implementer
  • Throwaway scripts with a lifespan of hours

Ask three people to implement the same specification independently. If they produce meaningfully different implementations, the specification is insufficient.

“There are two ways of constructing software: One way is to make it so simple that there are obviously no deficiencies. The other way is to make it so complicated that there are no obvious deficiencies.” — C.A.R. Hoare

“The hardest single part of building a software system is deciding precisely what to build.” — Fred Brooks

“Specifications are thinking tools, not proof systems.” — Hillel Wayne

“Prompt engineering is tactical execution; specification is strategic intent.”

  • Testing — Property-based testing as executable specification
  • Complexity — Specification defines abstraction boundaries
  • API Design — Schemas as specification for service boundaries
  • Orchestration — Agent contracts and constraint specification
  • Thinking — Specification as a thinking discipline
  • Specification Lesson Plan — Eight lessons from decision tables to TLA+
  • Leslie Lamport — Specifying Systems (2002, free PDF)
  • Hillel Wayne — Practical TLA+ (2018)
  • Daniel Jackson — The Essence of Software (2021)
  • Bertrand Meyer — Object-Oriented Software Construction (1997)
  • Hillel Wayne — learntla.com (free tutorial)
  • AWS — How Amazon Web Services Uses Formal Methods (CACM 2015)
  • AWS — Systems Correctness Practices at AWS (CACM 2024)