🤖 AI Summary
This work addresses fine-grained deficiencies in peer reviews generated by large language models (LLMs), which are hard to detect because the reviews read as fluent and well-structured. To tackle this challenge, we propose TADDLE, a tool-augmented agent framework designed to detect defects in LLM-generated reviews at the level of individual defect types. TADDLE orchestrates four specialized analysis tools (Verify, Correct, Complete, and Transform), and an integrator trained with two-stage semi-supervised learning combines their outputs into both a binary classification and a multi-label identification of defects. Our contributions include the first expert-annotated, multi-label benchmark for this task, comprising 1,800 reviews of 50 ICLR 2025 papers annotated by 18 domain experts against six defect categories plus a non-deficient label, along with the public release of our benchmark and code. Experiments show that TADDLE performs strongly on both detection tasks.
📝 Abstract
LLM-generated peer reviews are increasingly common at major venues, yet their deficiencies are hard to detect because they are uniformly fluent and well-structured. Existing work either classifies authorship without judging quality, or scores quality with features designed for human-written reviews; no prior system detects deficiencies in LLM-generated reviews at the level of individual defect types. To bridge the gap, we introduce TADDLE, a Tool-Augmented Agent for Detecting Deficient LLM-Generated Peer Reviews, together with the first expert-annotated benchmark for this task. Our benchmark comprises 1,800 reviews on 50 ICLR 2025 papers, multi-label-annotated by 18 domain experts against a taxonomy of six defect categories (plus a non-deficient label). TADDLE decomposes detection into four specialized analysis tools -- Verify, Correct, Complete, and Transform -- orchestrated by an agent; an integrator synthesizes their outputs into binary and multi-label classifications via two-stage semi-supervised learning. Extensive experiments show that TADDLE performs strongly on both binary detection and the multi-label classification task. We release the benchmark and code at https://github.com/AquariusAQ/TADDLE.
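The control flow described above (four tools, an integrator, and two output heads) can be pictured with a minimal Python sketch. Only the tool names come from the abstract. The label identifiers, the per-label score format, the averaging rule, the threshold, and the rule that a review is deficient when at least one defect label fires are all illustrative assumptions; in particular, the averaging integrator is a stand-in for the learned integrator trained with two-stage semi-supervised learning, and this is not the released TADDLE code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Tool names are taken from the abstract; everything else here is hypothetical.
TOOL_NAMES = ["Verify", "Correct", "Complete", "Transform"]

# The abstract does not name the six defect categories, so generic IDs are used.
DEFECT_LABELS = [f"defect_{i}" for i in range(1, 7)]

# Assumed tool interface: (paper_text, review_text) -> score per defect label.
ToolFn = Callable[[str, str], Dict[str, float]]


@dataclass
class Prediction:
    defects: List[str]   # multi-label output; empty means non-deficient
    is_deficient: bool   # binary output


def run_tools(paper: str, review: str,
              tools: Dict[str, ToolFn]) -> Dict[str, Dict[str, float]]:
    """Call each of the four tools once and collect its per-label scores."""
    return {name: tools[name](paper, review) for name in TOOL_NAMES}


def integrate(tool_outputs: Dict[str, Dict[str, float]],
              threshold: float = 0.5) -> Prediction:
    """Stand-in integrator: average each label's score across tools.

    A label is predicted when its mean score reaches the threshold.
    Missing scores count as 0.0.
    """
    defects = []
    for label in DEFECT_LABELS:
        scores = [out.get(label, 0.0) for out in tool_outputs.values()]
        if sum(scores) / len(scores) >= threshold:
            defects.append(label)
    # Assumption: the binary decision is "at least one defect label predicted".
    return Prediction(defects=defects, is_deficient=bool(defects))


if __name__ == "__main__":
    # Stub tools that all flag the same (hypothetical) defect category.
    stub_tools = {name: (lambda p, r: {"defect_1": 0.9}) for name in TOOL_NAMES}
    print(integrate(run_tools("paper text", "review text", stub_tools)))
```

Deriving the binary decision from the multi-label vector keeps the two outputs consistent by construction; a system that trains the two heads separately would need an explicit rule for resolving disagreements between them.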