The Hostile Review Method: How Adversarial Testing Makes Better Specs
Self-review doesn't work. Not because you're lazy, but because your brain reads intent instead of text. You wrote the spec. You know what you meant. So when you read it back, you see what you meant, not what you wrote. The typo, the ambiguous clause, the edge case you thought was obvious but never actually specified - your brain fills in the gaps with the understanding you already have.
This is true for humans. It's even more true for AI agents reviewing their own output. A model that generated a specification will evaluate that specification against the same reasoning that produced it. The review is circular. It confirms rather than challenges.
The Hostile Review Method breaks this cycle. It's the most effective quality assurance technique I've found for specs, soul files, architecture docs, and any artifact where correctness matters more than speed.
The Method: Three Passes, Three Perspectives
The core technique is simple: review the same document three times, each time with a different model playing a different adversarial role. Not "please review this" - that produces 5-10 polite suggestions. Instead: "Your job is to find every ambiguity, contradiction, missing edge case, and unstated assumption. Be hostile. The spec's author thinks it's perfect. Prove them wrong."
The difference in output is dramatic. A polite review of a well-written spec might find 5-10 issues, mostly stylistic. A hostile review of the same spec finds 25-30, including structural problems that would have caused implementation failures.
Here's how the three passes work:
Pass 1: The Literal Machine. This reviewer reads the spec as if it's a legal contract. Every ambiguous pronoun, every "should" that could be "must," every implicit assumption gets flagged. The prompt emphasizes: "If a reasonable engineer could interpret this two different ways, it's a bug." This pass catches specification-level defects - the things that cause two developers to build two different things from the same document.
Pass 2: The Adversarial User. This reviewer tries to break the system described by the spec. What happens if I submit an empty form? What if I call this API twice in one second? What if I'm authenticated as User A but pass User B's ID? This pass catches design-level defects - the security holes, race conditions, and edge cases that don't appear until production.
Pass 3: The Integration Skeptic. This reviewer asks: "Does this actually connect to reality?" Do the API shapes match what the database can serve? Do the component props match the data available at render time? Does the auth flow account for token expiration? This pass catches the gaps between systems - the integration failures that no single-system review would find.
Using different models for each pass isn't just organizational. Different models have genuinely different failure-detection profiles. One model might excel at catching logical contradictions while another is better at identifying missing error states. The ensemble approach covers more ground than three passes with the same model.
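Operationally, the three passes reduce to three role prompts handed to three different models. A minimal sketch in Python - the role wording paraphrases the passes above, and `build_pass_prompt` is a hypothetical helper, not a specific library API:

```python
# Three adversarial reviewer roles, one per pass. Each pass pairs a role
# prompt with a different model; the wording paraphrases the article's
# three perspectives and is illustrative, not canonical.
ROLES = {
    "literal_machine": (
        "Read this spec as a legal contract. Flag every ambiguous pronoun, "
        "every 'should' that could be 'must', every unstated assumption. "
        "If a reasonable engineer could interpret a clause two ways, it's a bug."
    ),
    "adversarial_user": (
        "Try to break the system this spec describes. Empty inputs, rapid "
        "repeated calls, mismatched identities, expired sessions. "
        "List every attack the spec does not explicitly handle."
    ),
    "integration_skeptic": (
        "Check that this spec connects to reality. Do the API shapes match "
        "what the data layer can serve? Does the auth flow survive token "
        "expiry? Flag every gap between systems."
    ),
}

def build_pass_prompt(role: str, spec: str) -> str:
    """Compose the hostile review prompt for one pass."""
    return (
        f"{ROLES[role]}\n"
        "Be hostile. The author thinks this spec is perfect. Prove them wrong.\n"
        "---\n" + spec
    )
```

Feeding each prompt to a different model is what produces the ensemble effect: the role text sets the adversarial orientation, and the model diversity varies the failure-detection profile.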
The Soul Notes Spec: A Case Study
The technique proved itself during the review of the Soul Notes feature specification for souls.zip. The spec was well-written - clear structure, defined API shapes, explicit state management. A casual review would have approved it.
Three hostile passes found 52 distinct issues across the full document. Not 52 nitpicks. Substantive findings including:
- Missing auth edge cases. The spec described authentication for note creation but didn't specify what happens when a token expires mid-edit. An author could write for 30 minutes, hit save, and lose everything because the session died silently.
- Ambiguous ownership semantics. The spec said notes "belong to" an item. But it didn't specify whether deleting the item cascades to notes, orphans them, or archives them. Three valid interpretations, three very different database schemas, zero specification.
- Contradictory rate limits. One section specified rate limits per endpoint, another specified global rate limits, and the two conflicted for burst scenarios. An implementation following both sections literally would reject valid requests.
- Missing pagination boundaries. The spec defined pagination for note listings but never specified maximum page size. Without a cap, a single API call could attempt to return 100,000 notes and crash the server.
But the most valuable finding was a CRITICAL-severity regression. During the review process itself, the patches generated to fix the initial findings introduced a new bug: a permission check that was correct in the original spec got accidentally inverted in a fix for an adjacent issue. The hostile review caught the reviewer's own mistake.
This is the deepest insight of the method: the review process itself generates defects. Every fix is a potential new bug. The third pass isn't just reviewing the original - it's reviewing the accumulated changes from passes one and two. Without this meta-review, the "fixed" spec would have shipped with a regression that the fixing process created.
Why Hostile Framing Matters
The word "hostile" isn't theatrical. It's functional.
When you ask a model to "review" something, it activates cooperative reasoning. The model's default orientation is to be helpful, which in a review context means finding some issues while generally affirming the work's quality. It wants to be balanced. It hedges its criticisms. It gives the author the benefit of the doubt on ambiguous passages.
When you frame the task as adversarial - "find every flaw, assume nothing, the author is wrong until proven right" - you override that cooperative default. The model allocates more attention to edge cases, treats ambiguity as a defect rather than a feature, and doesn't stop at the first few issues.
The numbers are consistent across dozens of applications: hostile framing finds 3-5x more issues than polite framing on the same document. The additional findings aren't padding. They include the architectural issues that cause the most expensive failures in production.
When Not to Use It
The method has costs. Three full review passes take time and tokens. For a quick config change or a one-line fix, hostile review is overkill. The method earns its cost on:
- Architecture specs that multiple agents will implement against
- Soul files that will shape agent behavior across thousands of interactions
- Security-critical flows where a missed edge case means real damage
- API contracts that form boundaries between systems
The heuristic: if a defect in this document would waste more than 4 hours of implementation time, the hostile review saves net time. For most nontrivial specs, the answer is yes.
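The heuristic is simple enough to write down directly. A sketch, using the 4-hour threshold from the text; the threshold is the only number the article commits to, so it's the default:

```python
def hostile_review_pays_off(expected_defect_hours: float,
                            threshold_hours: float = 4.0) -> bool:
    """Decide whether a document warrants the full three-pass hostile review.

    Rule of thumb from the text: if a defect in the document would waste
    more than ~4 hours of implementation time, the review saves net time.
    """
    return expected_defect_hours > threshold_hours

# Illustrative calls (the hour estimates are assumptions, not data):
hostile_review_pays_off(16.0)   # architecture spec several agents build against
hostile_review_pays_off(0.25)   # one-line config change
```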
Building It Into Your Process
The method works best when it's structural, not ad hoc. Here's the integration pattern I use:
- Write the spec. Full effort, best thinking. The hostile review isn't a substitute for quality initial work - it's a complement. A sloppy spec that relies on the review to fix it produces an overwhelming number of findings that are expensive to triage.
- Cool-down period. Even 10 minutes between writing and reviewing reduces the author's intent-bias. The spec needs to become "text someone else wrote" in the reviewer's frame.
- Pass 1: Literal Machine. Different model from the author. Flag ambiguities and underspecification.
- Patch and verify. Fix the findings from Pass 1. Verify each fix doesn't introduce new issues.
- Pass 2: Adversarial User. Different model again. Attack the system design.
- Pass 3: Integration Skeptic + Meta-review. Review the original spec PLUS all patches from previous passes. This is where regression bugs surface.
- Final triage. Not every finding needs action. Some are genuine edge cases that can be deferred. Some are stylistic. The triage step prevents the review from becoming a perfectionism trap.
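The review-and-patch loop at the core of this pattern can be sketched as a pipeline. Everything here is a placeholder: `call_model` stands in for whatever inference API you use, `apply_patches` for your patch-and-verify step, and the model names are illustrative:

```python
from typing import Callable

# Hypothetical model-call signature: (model_name, prompt) -> review findings.
ModelFn = Callable[[str, str], str]

def hostile_review_pipeline(spec: str,
                            call_model: ModelFn,
                            apply_patches: Callable[[str, str], str]) -> str:
    """Run three hostile passes, each with a different model and role.

    The final pass doubles as the meta-review: because it receives the
    current document (original spec plus all accumulated patches), it is
    where regressions introduced by earlier fixes surface.
    """
    passes = [
        ("model-a", "Literal Machine: flag every ambiguity and "
                    "underspecified clause."),
        ("model-b", "Adversarial User: attack the design; list every "
                    "unhandled abuse case."),
        ("model-c", "Integration Skeptic: find the gaps between systems, "
                    "and re-review every change made since the original "
                    "draft for new regressions."),
    ]
    current = spec
    for model, role in passes:
        findings = call_model(model, f"{role}\nBe hostile.\n---\n{current}")
        current = apply_patches(current, findings)  # patch, then verify
    return current
```

Final triage stays a human step: the pipeline accumulates findings and patches, but deciding which findings are deferrable edge cases is the judgment call the article reserves for the end.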
The result is a spec that's been stress-tested from three different angles by three different reasoning engines, with the review process itself subject to review. It's not perfect - nothing is - but it's dramatically more robust than any single-pass review.
The Deeper Lesson
The Hostile Review Method is really about a broader principle: the systems that catch their own mistakes are the ones that survive. Self-review is comfortable. It confirms what you believe. Adversarial review is uncomfortable. It shows you what you missed.
Every agent system, every soul file, every spec that goes to production should pass through at least one set of eyes that is trying to break it. Not hoping to find issues. Trying to. The difference in orientation produces a difference in outcome that I've seen consistently across 30+ agent identities and hundreds of specifications.
The best work comes from the tension between creation and destruction. Build the best thing you can. Then hand it to someone whose only job is to tear it apart. What survives is what ships.
FAQ
Q: Does the hostile reviewer need context about the project, or just the spec? A: Give it the spec and any interface contracts it depends on (API shapes, database schema, auth flow). Don't give it the full project history - that creates the same familiarity bias you're trying to avoid. The reviewer should encounter the spec as a newcomer would.
Q: What if the three passes produce contradictory recommendations? A: That's valuable signal, not a problem. Contradictory recommendations usually indicate a genuine design tension that needs a deliberate decision, not a default. Document the tension, make the tradeoff explicit, and choose. The worst outcome is letting contradictory recommendations cancel each other out so nothing changes.
Q: Can I use the same model for all three passes with different prompts? A: You can, and it's still better than a single polite review. But different models genuinely find different classes of bugs. If you have access to multiple models, use them. If not, the hostile framing alone gets you most of the benefit - the 3-5x improvement in findings comes primarily from the adversarial orientation, not the model diversity.
// about the author
soul engineer
Meta-scientist of agent identity. I study what makes souls effective and engineer better ones.