
AI Adoption

Principles for introducing AI coding tools to an engineering team without mandating them, leaving anyone behind, or undermining code ownership.

AI tools lower the cost of generating code. That does not mean every engineer must use them. Offer access, document how to get started, and stop there. Code how you want.

No workflow, review process, or sprint metric should reward AI-generated code over hand-written code. The goal is good software, not AI usage.

| Action | Not this |
|---|---|
| Publish setup docs in the wiki | Require completion of an AI onboarding course |
| Fund licenses for anyone who asks | Gate access behind manager approval |
| Share tips in a Slack channel | Add “AI skills” to the promotion rubric |
| Let people opt out silently | Track who uses AI and who does not |

The distinction matters: access removes barriers; mandates create pressure. Pressure distorts workflows — engineers optimize for the metric instead of the outcome.

AI tools change how fast you can generate code. They do not change what you own.

| Responsibility | Why it stays human | Example |
|---|---|---|
| Architecture trade-offs | Org history, team capacity, and roadmap context are not in any context window | Choosing a message queue means weighing ops burden against team skill |
| Mentorship in review | PR threads are where juniors learn; the war story behind that odd regex is yours to tell | Explaining why you chose eventual consistency over strong consistency |
| The merge decision | “The AI wrote it” is not a defense when it pages you at 2 a.m. — merge code you can evaluate | A generated migration that silently drops a column default |
| Incident response | Triage requires system history, customer context, and judgment under pressure | Deciding to revert vs. forward-fix at 3 a.m. during a payment outage |
| Architecture review | Trade-offs between teams, timelines, and organizational constraints live outside the codebase | Choosing boring technology when the team has six months of runway |
| Hiring decisions | Technical interviews assess communication, collaboration, and growth — not just correct output | Evaluating whether a candidate reasons through ambiguity |
| Ethical judgment | “Should we build this?” requires values, not optimization | Declining to ship a feature that exploits user psychology for retention |
| Values | “Should we build this at all?” is a question no model can answer for your team | Deciding whether to collect location data you technically could use |

Start at the level that matches your curiosity. There is no expected progression.

| Level | Mode | What it means | Concrete examples |
|---|---|---|---|
| 0 | Completions | Accept, reject, or ignore inline suggestions as you type | Copilot in VS Code, Supermaven, Codeium — tab-complete while typing |
| 1 | Drafting | Ask for a first draft; read it before you use it | ChatGPT for a regex, Claude for a shell script, inline chat in IDE |
| 2 | Delegation | Hand off a bounded, well-specified task; verify the output | Claude Code for a refactor, Cursor agent mode, Codex for test gen |

Level 0 is a legitimate long-term choice. Some engineers find completions useful and never want more. That is fine. Adoption levels are not a maturity model.

  • Level 0: Treat suggestions as spell-check, not autopilot. Reject more than you accept. The value is speed on patterns you already know.
  • Level 1: Read the draft as if a junior wrote it. Check error handling, edge cases, and naming. The first draft saves time; the review is where quality lives.
  • Level 2: Write the spec before delegating. A vague prompt produces vague code. Specify inputs, outputs, error behavior, and constraints. Verify the result against the spec, not against your intuition.
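The Level 2 habit can be made concrete: write the spec as a handful of executable checks before delegating, then verify the generated code against those checks rather than against intuition. A minimal sketch; the `parse_duration` task and its contract are invented for illustration:

```python
# Hypothetical spec for a delegated task: a parser turning strings like
# "1h30m" into seconds. The spec pins down inputs, outputs, and error
# behavior BEFORE any code is generated.

def spec_parse_duration(parse_duration):
    # Happy path: known units compose additively.
    assert parse_duration("90s") == 90
    assert parse_duration("1h30m") == 5400
    # Edge case: zero is a valid duration.
    assert parse_duration("0s") == 0
    # Error behavior: bad input raises, never returns a guess.
    for bad in ("", "abc", "1x"):
        try:
            parse_duration(bad)
        except ValueError:
            pass
        else:
            raise AssertionError(f"expected ValueError for {bad!r}")

# A hand-written reference implementation showing the spec is satisfiable;
# a generated version would be checked against the same spec function.
import re

def parse_duration(text: str) -> int:
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not match or not text or not any(match.groups()):
        raise ValueError(f"unparseable duration: {text!r}")
    h, m, s = (int(g) if g else 0 for g in match.groups())
    return h * 3600 + m * 60 + s

spec_parse_duration(parse_duration)
```

The spec doubles as the review checklist: if the generated code passes it and reads cleanly, the delegation worked; if the spec is hard to write, the task was not bounded enough to delegate.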

AI tools lower the cost of opening PRs. That cost transfers to reviewers — especially codeowners with high surface area.

Before opening a large AI-assisted PR against a system you do not own:

  • Ask whether codeowners have a preferred approach or prior art
  • Review the diff yourself first and cut what is not needed — do not hand reviewers 800 lines of generated code to triage
  • Do not mistake model confidence for correctness

Codeowners have standing to ask for rewrites. That is the deal.

AI changes the economics of code review. Generation becomes cheap; review stays expensive. This imbalance creates predictable failure modes.

Before AI: Write 100 lines/hr → Review 100 lines/hr → balanced
After AI: Write 500 lines/hr → Review 100 lines/hr → bottleneck at review

Review throughput does not scale with generation throughput. Every unreviewed line is latent risk.
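The imbalance is simple arithmetic. A toy model using the illustrative rates from this section (100 and 500 lines per hour are examples, not measurements):

```python
# Back-of-envelope model of the review bottleneck: generation jumps 5x,
# review throughput stays flat. Rates are the illustrative figures from
# this section, not measured data.
GEN_BEFORE = 100   # lines written per hour, pre-AI
GEN_AFTER = 500    # lines written per hour, with AI assistance
REVIEW_RATE = 100  # lines reviewed per hour, unchanged

def review_hours_per_writing_hour(gen_rate: int, review_rate: int) -> float:
    """Hours of review work created by one hour of code generation."""
    return gen_rate / review_rate

before = review_hours_per_writing_hour(GEN_BEFORE, REVIEW_RATE)  # 1.0
after = review_hours_per_writing_hour(GEN_AFTER, REVIEW_RATE)    # 5.0

# Each writing hour now creates 5 review hours; the surplus either
# consumes reviewer time or accumulates as unreviewed, latent risk.
surplus_per_day = (after - before) * 8  # review-hours of backlog per 8h day
print(before, after, surplus_per_day)   # 1.0 5.0 32.0
```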

| Heuristic | Rationale |
|---|---|
| Smaller PRs, even if AI can go big | Review quality degrades past ~400 lines |
| Read the diff, not the prompt | The prompt describes intent; the diff describes reality |
| Check error paths first | Models optimize for the happy path; errors get generic handling |
| Verify names match domain language | Generated identifiers drift from team conventions |
| Run the code, not just read it | Plausible-looking code passes visual review but fails at runtime |
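The first heuristic can be checked mechanically before requesting review. A sketch that sums the output of `git diff --numstat`; the ~400-line cutoff is this document's heuristic, and the script is an illustration, not a standard tool:

```python
# Sketch of a pre-review size gate. Feed it the text output of
# `git diff --numstat main...HEAD`; it sums added + deleted lines
# and flags diffs past the review-quality threshold.
REVIEW_LIMIT = 400  # heuristic from this document, not a universal rule

def diff_size(numstat_output: str) -> int:
    """Total changed lines from `git diff --numstat` text."""
    total = 0
    for line in numstat_output.strip().splitlines():
        added, deleted, _path = line.split("\t")
        # Binary files show "-" instead of counts; skip them.
        if added != "-":
            total += int(added) + int(deleted)
    return total

def should_split(numstat_output: str, limit: int = REVIEW_LIMIT) -> bool:
    return diff_size(numstat_output) > limit

sample = "312\t45\tsrc/app.py\n120\t8\tsrc/util.py\n-\t-\tlogo.png\n"
print(diff_size(sample), should_split(sample))  # 485 True
```

Wired into CI or a pre-push hook, a check like this turns "smaller PRs" from a norm people remember into a default people have to override.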

DORA 2025 measured AI adoption across thousands of engineering organizations and found a result that cuts both ways: AI correlates positively with throughput and negatively with stability. Teams using AI ship faster and break more things — simultaneously.

This is not a simple good/bad finding. It reveals where the bottleneck moved. Before AI, the constraint was generation speed: how fast can we write code? After AI, the constraint is validation speed: how fast can we verify code?

AI accelerates the easy part of software development (producing code) without accelerating the hard part (confirming it works in production). The result depends on the validation floor:

| Foundation | AI outcome |
|---|---|
| Strong CI, observability, clear specs | Speed converts to delivery |
| Weak CI, sparse tests | Speed converts to more deployments that break |
| No staging, manual QA | Speed converts to debt and incident load |

AI does not make teams better or worse. It amplifies existing process quality.

Team with strong validation:
More code → caught by CI → stable delivery → throughput win
Team with weak validation:
More code → bypasses gaps → unstable delivery → DORA stability hit

The practical implication: invest in the validation floor before expanding generation capacity. A team that adds AI tooling before fixing flaky tests and missing observability will ship its existing problems faster.

| Layer | Minimum bar |
|---|---|
| Tests | CI runs on every PR; failures block merge |
| Type checking | Compiler or type checker catches shape errors early |
| Observability | Errors, latency, and saturation visible in dashboards |
| Staging | Changes deploy to a non-production environment first |
| Rollback | Any deploy can revert within minutes |
| Ownership | Every service has a named on-call rotation |
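As a thought experiment, the floor can be expressed as a predicate a team evaluates before expanding generation capacity. The layer names mirror the table above; the gating logic is an illustration, not a standard:

```python
# Illustrative gate: before expanding AI generation capacity, check
# whether the validation floor described above is actually in place.
VALIDATION_FLOOR = [
    "tests",          # CI runs on every PR; failures block merge
    "type_checking",  # compiler or type checker catches shape errors
    "observability",  # errors, latency, saturation on dashboards
    "staging",        # changes hit a non-production environment first
    "rollback",       # any deploy can revert within minutes
    "ownership",      # every service has a named on-call rotation
]

def missing_layers(team: dict[str, bool]) -> list[str]:
    """Layers of the floor the team has not met yet."""
    return [layer for layer in VALIDATION_FLOOR if not team.get(layer, False)]

def ready_for_more_generation(team: dict[str, bool]) -> bool:
    # The section's claim as a predicate: fix the floor first, or the
    # team will ship its existing problems faster.
    return not missing_layers(team)

team = {"tests": True, "type_checking": True, "observability": False,
        "staging": True, "rollback": False, "ownership": True}
print(missing_layers(team))  # ['observability', 'rollback']
```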

Rules of thumb for when AI helps and when it hinders.

| Situation | Why it works |
|---|---|
| Boilerplate with known patterns | Models excel at repeating well-documented structures |
| First draft of tests | Generates coverage scaffolding; you refine assertions |
| Language or API you rarely use | Faster than reading docs for a one-off task |
| Regex, jq filters, shell one-liners | Syntax-dense tools where the model recalls patterns you forgot |
| Explaining unfamiliar code | Summarizes intent faster than reading a 500-line file cold |
| Migrating between formats | CSV-to-JSON, YAML restructuring, config format changes |
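Format migration is a good delegation candidate precisely because verification is mechanical: each output record is checkable against its input row. A minimal CSV-to-JSON example of the kind of transform meant here:

```python
# CSV-to-JSON conversion: a mechanical transform where a model's draft
# is easy to verify, because correctness is checkable row by row.
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (with a header row) to a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)

csv_text = "name,role\nada,eng\ngrace,eng\n"
print(csv_to_json(csv_text))
```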
| Situation | Why AI hinders |
|---|---|
| Security-sensitive logic | Models hallucinate safe-looking code that fails edge cases silently |
| Performance-critical hot paths | Generated code optimizes for readability, not throughput |
| Novel algorithms | Models recombine training data; genuinely new logic needs thought |
| Architectural decisions | Context window holds code, not org politics and team dynamics |
| Code you cannot read | If you cannot evaluate the output, you cannot own it |
| Debugging production incidents | Triage needs system history and real-time signals, not generation |

Before merging AI-generated code, answer three questions:

  1. Can I explain what this does? — If not, do not merge it.
  2. Can I debug this at 2 a.m.? — If not, rewrite until you can.
  3. Would I write a test for this? — If the answer is “the AI should write the test too,” you have delegated judgment, not just labor.

| Trap | What it looks like | Counter |
|---|---|---|
| Mandatory AI usage | Sprint retros track “AI-assisted PRs” as a KPI; devs pad the metric | Measure outcomes (cycle time, defect rate), not tool adoption |
| Skipping review | Reviewer skims a 400-line generated diff, approves in 2 minutes | Treat generated code with more scrutiny — it has no author to ask |
| Ignoring codeowners | Dev opens 6 large PRs against services they don’t own in one sprint | Batch changes; ask codeowners for preferred approach before starting |
| Measuring AI usage | Dashboard shows “% of code written by AI” as if that number should go up | Track delivery metrics; AI usage is an input, not an outcome |
| Assuming faster = better | METR RCT: experienced devs took 19% longer with AI but believed they were 20% faster | Time actual tasks; subjective speed perception is unreliable |
| Automation bias | Engineer accepts a generated answer without checking because “it usually works” | Require the same review bar for generated and handwritten code |
| Context window worship | Team stuffs every doc into the prompt, assumes the model read and understood all of it | Models degrade on long context; provide focused, relevant input only |
| Deskilling | Junior devs generate code they cannot explain, skipping the learning curve | Require juniors to write core logic by hand first, then compare |

The METR finding deserves emphasis. In a randomized controlled trial on real open-source tasks, experienced developers using AI completed tasks 19% slower than without AI — yet self-reported feeling 20% faster. The gap between perceived and actual speed was nearly 40 percentage points.

This happens because AI changes the texture of work. Generating code feels productive. Reading, verifying, and debugging generated code feels like overhead. Engineers undercount the verification time because it does not feel like “real work.” The result: teams believe AI accelerated them when it did not, and make resourcing decisions based on the illusion.

Signals that AI adoption is going wrong:

  • “We should require AI tools to hit our velocity targets”
  • Incident rate climbs but deployment frequency also climbs
  • PR review times increase while PR sizes grow
  • Junior engineers ship code they cannot explain in review
  • The team debates prompt engineering more than system design
  • Codeowners become bottlenecks because PR volume doubled
  • Generated tests pass but do not catch real bugs
  • “The AI wrote it” appears in incident postmortems

One or two of these in isolation mean little. Three or more together indicate the validation floor needs investment before generation capacity grows further.

  • AI CLI — Claude Code usage, context files, prompting
  • Agent Orchestration — Multi-agent patterns and failure modes
  • Complexity — Essential vs accidental complexity; applies directly to AI-generated code
  • Testing — The validation floor that determines whether AI speed converts to delivery or debt
  • Agentic Workflows — Lesson plan for working with AI agents in engineering