🤖 AI Summary
Existing AI-text detectors struggle to distinguish AI-polished text (human-written drafts lightly edited by LLMs) from fully AI-generated content, leading to high false-positive rates and misattribution. Method: We systematically expose fundamental limitations of current detectors in fine-grained identification of AI involvement and introduce APT-Eval, the first benchmark covering multiple AI-polishing intensity levels (11.7K samples). We evaluate 11 state-of-the-art detectors for robustness, granularity resolution, and model bias. Results: All detectors exhibit high false-positive rates on lightly polished texts, fail to differentiate "AI-generated" from "AI-polished," and are strongly biased toward misclassifying edits made by older or smaller LLMs. Our findings advocate redefining AI-text detection as an AI-involvement attribution task, shifting from binary classification to fine-grained, intensity-aware attribution, to support more reliable academic-integrity assessment and AI-usage analytics.
📝 Abstract
The growing use of large language models (LLMs) for text generation has led to widespread concern about detecting AI-generated content. However, an overlooked challenge is AI-polished text, where human-written content undergoes subtle refinement using AI tools. This raises a critical question: should minimally polished text be classified as AI-generated? Misclassification can lead to false plagiarism accusations and misleading claims about AI prevalence in online content. In this study, we systematically evaluate eleven state-of-the-art AI-text detectors using our AI-Polished-Text Evaluation (APT-Eval) dataset, which contains 11.7K samples refined at varying AI-involvement levels. Our findings reveal that detectors frequently misclassify even minimally polished text as AI-generated, struggle to differentiate between degrees of AI involvement, and exhibit biases against older and smaller models. These limitations highlight the urgent need for more nuanced detection methodologies.
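To make the evaluation setup concrete, below is a minimal sketch of how per-polish-level false-positive rates could be measured on an APT-Eval-style dataset. The record schema (`text`, `polish_level`) and the `detect_ai_probability` function are hypothetical placeholders, not the paper's actual detectors or data loader.

```python
from collections import defaultdict

def detect_ai_probability(text: str) -> float:
    """Hypothetical stand-in for an AI-text detector.

    A real evaluation would call one of the eleven detectors here
    and return the probability that `text` is AI-generated.
    """
    return 0.5  # placeholder score

def false_positive_rates(samples, threshold=0.5):
    """Share of human-authored, AI-polished samples flagged as
    AI-generated, grouped by polishing intensity."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for sample in samples:
        level = sample["polish_level"]   # e.g., "no-polish", "minimal", "major"
        total[level] += 1
        if detect_ai_probability(sample["text"]) >= threshold:
            flagged[level] += 1          # polished human text misclassified as AI
    return {level: flagged[level] / total[level] for level in total}

# Toy records mimicking the benchmark's structure (hypothetical schema).
samples = [
    {"text": "An unedited human draft.", "polish_level": "no-polish"},
    {"text": "A lightly polished human draft.", "polish_level": "minimal"},
    {"text": "A heavily rewritten human draft.", "polish_level": "major"},
]
print(false_positive_rates(samples))
```

Reporting false-positive rates per intensity level, rather than a single aggregate score, is what surfaces the failure mode described above: detectors that look accurate overall can still flag nearly all minimally polished human text as AI-generated.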