AI Code Looks Right. That’s the Problem.
Teams that treat AI as a code printer will drown in slop and debt. Those who become the best at specifying what they want and verifying that they got it will get the promised 10x productivity boost.

AI coding tools are now embedded into the day-to-day workflows of most software engineers. Teams are merging more PRs, shipping faster, and velocity charts look great.
But something feels off. New code is created faster than humans can review it. Engineers talk openly about shipping code they don’t understand. Some say that we’re entering the era of ‘write-only’ code.
In that era, we have to think hard about how to avoid the era of ‘AI slop’ code: code that compiles, passes basic checks, and looks plausible, but is subtly wrong, bloated, or misaligned with what was actually needed.
Taxonomy of AI Code Slop
Plausible but wrong: Correct syntax, wrong logic. It looks like it works until edge cases bite you.
Over-engineered: The AI builds an abstraction layer for a problem that needed 10 lines.
Convention-blind: Ignores your repo’s patterns, naming, or architecture. The output is generically good code, not code that fits your system.
Hallucinated APIs: Confidently calls APIs that don’t exist, uses deprecated methods, or invents config options. These are sometimes caught, but not always, especially when wrapped in otherwise legitimate structure.
Over-defensive programming: Wraps everything in “try-catch” blocks, absorbs errors silently, and logs excessively.
Cargo-cult code: Copies patterns without understanding why, like retry logic where it’s not needed or error handling that swallows everything.
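To make the first category concrete, here is a hypothetical sketch (the function and its bug are invented for illustration): a pagination helper that compiles, reads cleanly, and passes the eye test, while an off-by-one page offset silently drops the first page.

```python
# Hypothetical "plausible but wrong" AI output: looks like a correct
# pagination helper, but treats 1-indexed pages as 0-indexed.
def paginate_sloppy(items, page, page_size):
    """Return the given page (1-indexed) of items."""
    start = page * page_size              # BUG: skips the first page
    return items[start:start + page_size]

# What the spec actually asked for:
def paginate(items, page, page_size):
    """Return the given page (1-indexed) of items."""
    start = (page - 1) * page_size
    return items[start:start + page_size]
```

Both versions type-check, run without errors, and return a plausible slice; only a reviewer who knows the intended indexing convention, or a test derived from the spec, catches the difference.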
The common thread is that slop is not obvious. It passes the eye test; it looks like real code.
That’s what makes it dangerous.
Code Review Was Never Designed for This
Traditional code review assumes a human author with intent. Even when the diff is large, the reviewer can ask the author about their approach, and they can explain the reasoning or the constraints.
AI-generated PRs are so large, fast, and frequent that reviewers can’t keep up. More importantly, reviewers are checking “Does this look right?” when they should be checking “Does this do what we intended?”
But that intent usually isn’t in the PR. The prompt that generated the code is discarded, the spec isn’t written down, and the reviewer is left guessing.
Code review was built to check craftsmanship and correctness within a shared context. It wasn’t built to recover missing intent.
The Missing Artifact: Intent
When a human writes code, intent travels with them. During review, they explain tradeoffs, rejected alternatives, and constraints. Even if these details are not written down, they are accessible.
When AI writes code, intent may exist in a prompt, a ticket, a Slack thread, or only in someone’s head. The implementation is preserved; the reasoning is not.
Without shared intent, reviewing AI-generated code is just pattern matching against vibes. That is the root cause of AI slop getting through: we have no formal way to verify what the code was supposed to do.
Intent-Driven Development as a Guardrail
The idea of formalizing intent before implementation is not new. Behavior-Driven Development, Test-Driven Development, and Design-by-Contract approaches all tried to define behavior in structured, human-readable specs before writing code.
These approaches were often perceived as overhead. Writing formal behavior descriptions and contracts demanded discipline and time. Under delivery pressure, teams frequently skipped them. Now, AI makes them more practical, not less.
AI can help generate structured acceptance criteria, behavioral specs, or even contract-like descriptions. It can also help enforce them.
The workflow shift is straightforward: agree on what the system must do in a precise, reviewable form. Then let AI handle how it does it. Finally, verify the output against the agreed intent.
In this model, the spec becomes the primary artifact under review. The code is validated against it. Review shifts from “Does this look okay?” to “Does this satisfy the contract we approved?”
We have to review not just that “it works”, but also carefully check (1) that it satisfies every criterion in the spec and (2) that it adds nothing beyond what the spec requires. Together, these two checks form a strong guardrail against slop.
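A minimal sketch of this workflow, with a hypothetical discount rule standing in for a real feature (the function name and the criteria are invented for illustration): the criteria are reviewed and approved first, the generated code is reviewed against them, and verification checks every criterion explicitly.

```python
# --- Agreed spec, reviewed and approved before any code is generated ---
# 1. Orders of 100 or more get a 10% discount.
# 2. Orders below 100 get no discount.
# 3. Negative totals are rejected.

# --- AI-generated implementation, now reviewed against that spec ---
def apply_discount(total: float) -> float:
    if total < 0:
        raise ValueError("total must be non-negative")
    return total * 0.9 if total >= 100 else total

# --- Verification: every criterion checked, one assertion per line of spec ---
def verify_spec():
    assert apply_discount(100) == 90.0     # criterion 1
    assert apply_discount(50) == 50        # criterion 2
    try:
        apply_discount(-1)                 # criterion 3
        assert False, "negative total must be rejected"
    except ValueError:
        pass
```

The second half of the review, checking that nothing beyond the spec was added, stays human: a reviewer scans the diff for extra parameters, extra behavior, or extra abstraction that no criterion asked for.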
Practical Guardrails to Reduce AI Code Slop
Scope the AI tightly: Small, well-defined tasks with clear acceptance criteria produce dramatically less slop than “build this feature”.
Capture intent as a first-class artifact: Whether it’s a BDD spec, a contract, a structured ticket, or something else, write down the what before generating the how.
Review intent, not just implementation: Approve the spec/contract first, then verify the code against it. Catching if AI went beyond what was in the spec should also be part of this review.
Automate what you can: Tests, linting, and type checks catch surface-level slop. But acknowledge they don’t catch intent drift.
Treat AI output like unverified output: It’s not junior dev-level code; it’s unreviewed contractor code. You wouldn’t merge it without checking it against requirements.
Build a team slop list: Patterns and anti-patterns specific to your codebase that AI consistently gets wrong. Feed these back into prompts or CI checks.
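One way to wire a team slop list into CI, sketched in Python (the flagged patterns here are hypothetical examples; a real list would come from your own review findings):

```python
# Sketch of a "team slop list" check: scan source text for patterns the
# team has flagged as recurring AI mistakes. Patterns are illustrative,
# not a universal rule.
import re

SLOP_PATTERNS = [
    (re.compile(r"except\s*:\s*(pass)?\s*$", re.MULTILINE),
     "bare except / silently swallowed error"),
    (re.compile(r"TODO:?\s*implement", re.IGNORECASE),
     "stubbed-out implementation left behind"),
]

def find_slop(source: str):
    """Return (line_number, description) pairs for every flagged match."""
    hits = []
    for pattern, description in SLOP_PATTERNS:
        for match in pattern.finditer(source):
            line = source.count("\n", 0, match.start()) + 1
            hits.append((line, description))
    return sorted(hits)
```

In CI, a small wrapper would run `find_slop` over changed files and fail the build on any hit; the same pattern list can be pasted into prompts so the model avoids those mistakes up front.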
The New Bottleneck Isn’t Writing Code
Code generation is essentially solved. The bottleneck has moved to code review. The hard problems now are capturing intent, verifying alignment, and maintaining coherence across a growing codebase.
Teams that treat AI as a code printer will drown in slop and debt. Those who become the best at specifying what they want and verifying that they got it will get the promised 10x productivity boost.









