A model reviewing its own work brings its own blind spots to the inspection.
When one model writes the code and the same model reviews it, you're asking the same mind to catch its own mistake — and it shares its own blind spots. I learned that self-review feels like review but isn't. The confidence is real; the coverage isn't.
I tested a cross-provider review — pipe the same diff to models from different families (a Google model, a coding-tuned model) and let them argue. A small runner sends the change out, collects the verdicts, and flags anything two of them independently distrust.
Cross-family review. A different lineage catches what a shared lineage rationalizes. The bug that 'passed' got named by a reviewer that hadn't written it.
Trusting a green test run plus one model's thumbs-up. Same author, same blind spot — the inspection was theater.
The reviewer that didn't write the code is the only one whose 'looks good' is worth anything.