🤖 AI Summary
Existing AI text detectors are evaluated primarily with traditional metrics such as AUROC, overlooking critical real-world deployment requirements—including low false positive rates, cross-domain generalization, and adversarial robustness. Method: We propose SHIELD, a benchmark that unifies reliability (i.e., a low false positive rate) and cross-domain stability into a single evaluation framework and introduces an adversarial test suite with tunable hardness. Its core innovation is a model-agnostic, post-hoc humanization framework that jointly simulates human linguistic patterns and performs style transfer, producing controllably human-like AI-generated text for stress testing. Contribution/Results: Experiments reveal substantial performance degradation of mainstream zero-shot detectors on high-hardness adversarial samples. SHIELD effectively exposes their practical vulnerabilities, shifting detector evaluation from isolated accuracy metrics toward utility-driven, deployment-relevant assessment.
📝 Abstract
We present a novel evaluation paradigm for AI text detectors that prioritizes realistic and equitable assessment. Current approaches predominantly report conventional metrics such as AUROC, overlooking that even modest false positive rates are a critical impediment to the practical deployment of detection systems. Moreover, real-world deployment requires thresholds to be fixed in advance, making detector stability (i.e., maintaining consistent performance across diverse domains and adversarial scenarios) a critical factor. These aspects have been largely ignored by previous research and benchmarks. Our benchmark, SHIELD, addresses these limitations by integrating both reliability and stability into a unified evaluation metric designed for practical assessment. In addition, we develop a post-hoc, model-agnostic humanization framework that modifies AI-generated text to more closely resemble human authorship, governed by a controllable hardness parameter. This hardness-aware approach substantially challenges current SOTA zero-shot detectors in maintaining both reliability and stability. (Data and code: https://github.com/navid-aub/SHIELD-Benchmark)
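To illustrate why fixed-threshold deployment makes low-FPR reliability and cross-domain stability central, here is a minimal sketch (not SHIELD's actual metric). It calibrates a detection threshold at 1% FPR on one domain and shows how the operating point drifts on a shifted domain; all scores and domain names are synthetic, hypothetical stand-ins for real detector outputs.

```python
# Sketch: TPR at a fixed low FPR, with the threshold calibrated once
# (as in deployment) and then applied to a distribution-shifted domain.
# Scores are synthetic; "news"/"essays" are hypothetical domains.
import numpy as np

def threshold_at_fpr(human_scores, target_fpr):
    """Smallest threshold whose FPR on human-written text is <= target_fpr."""
    return float(np.quantile(human_scores, 1.0 - target_fpr))

def flag_rate(scores, thr):
    """Fraction of samples flagged as AI-generated at a fixed threshold."""
    return float(np.mean(np.asarray(scores) >= thr))

rng = np.random.default_rng(0)
# Calibration domain (e.g., news): detector separates the classes well.
human_news = rng.normal(0.0, 1.0, 10_000)
ai_news = rng.normal(3.0, 1.0, 10_000)
# Shifted deployment domain (e.g., student essays): distributions overlap more.
human_essay = rng.normal(0.8, 1.2, 10_000)
ai_essay = rng.normal(2.2, 1.2, 10_000)

thr = threshold_at_fpr(human_news, target_fpr=0.01)  # calibrated once
print("TPR@1%FPR (news):", flag_rate(ai_news, thr))
print("FPR on essays:   ", flag_rate(human_essay, thr))  # drifts above 1%
print("TPR on essays:   ", flag_rate(ai_essay, thr))
```

Under this kind of shift, the false positive rate on the new domain rises well above the calibrated 1%, which is exactly the failure mode a threshold-fixed, stability-aware evaluation is meant to surface.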