Beyond Easy Wins: A Text Hardness-Aware Benchmark for LLM-generated Text Detection

📅 2025-07-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AI text detectors are evaluated primarily with traditional metrics such as AUROC, overlooking critical real-world deployment requirements, including low false positive rates, cross-domain generalization, and adversarial robustness. Method: The authors propose SHIELD, a benchmark that unifies reliability (i.e., a low false positive rate) and cross-domain stability into a single evaluation framework and introduces an adversarial test suite with tunable hardness. Its core innovation is a model-agnostic, post-hoc humanification framework that jointly simulates linguistic patterns and performs style transfer to produce controllably human-like AI-generated text for stress testing. Contribution/Results: Experiments reveal significant performance degradation of mainstream zero-shot detectors on high-hardness adversarial samples. SHIELD exposes these practical vulnerabilities, shifting detector evaluation from isolated accuracy metrics toward utility-driven, deployment-relevant assessment.
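
To make the deployment-oriented framing concrete, the sketch below shows one common way to evaluate a detector at a threshold fixed in advance on human-written calibration text and then check how its true positive rate holds up across domains. It is an illustration under stated assumptions (synthetic scores, three made-up domains, a 1% FPR target), not SHIELD's actual metric.

```python
import numpy as np

def calibrate_threshold(human_scores, target_fpr=0.01):
    """Pick the score threshold that keeps the false positive rate on
    human-written calibration text at (approximately) the target level."""
    # A text is flagged as AI-generated when its score exceeds the threshold,
    # so the threshold is the (1 - target_fpr) quantile of the human scores.
    return np.quantile(human_scores, 1.0 - target_fpr)

def tpr_at_threshold(ai_scores, threshold):
    """True positive rate on AI-generated text at the predetermined threshold."""
    return float(np.mean(np.asarray(ai_scores) > threshold))

# Hypothetical detector scores per domain: higher = "more likely AI-generated".
rng = np.random.default_rng(0)
domains = {
    "news":    (rng.normal(0.20, 0.10, 2000), rng.normal(0.80, 0.10, 2000)),
    "essays":  (rng.normal(0.30, 0.15, 2000), rng.normal(0.70, 0.15, 2000)),
    "reviews": (rng.normal(0.35, 0.20, 2000), rng.normal(0.60, 0.20, 2000)),
}

# The threshold is fixed once on pooled human calibration text and reused everywhere,
# mirroring the predetermined-threshold deployment setting the paper emphasizes.
threshold = calibrate_threshold(np.concatenate([h for h, _ in domains.values()]))

for name, (human_scores, ai_scores) in domains.items():
    fpr = float(np.mean(human_scores > threshold))
    tpr = tpr_at_threshold(ai_scores, threshold)
    print(f"{name:8s}  FPR={fpr:.3f}  TPR@1%FPR={tpr:.3f}")
```

Because the threshold is never re-tuned per domain, the spread of the per-domain TPR values is a rough proxy for the stability the benchmark is concerned with.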

📝 Abstract
We present a novel evaluation paradigm for AI text detectors that prioritizes real-world and equitable assessment. Current approaches predominantly report conventional metrics like AUROC, overlooking that even modest false positive rates constitute a critical impediment to practical deployment of detection systems. Furthermore, real-world deployment necessitates predetermined threshold configuration, making detector stability (i.e., the maintenance of consistent performance across diverse domains and adversarial scenarios) a critical factor. These aspects have been largely ignored in previous research and benchmarks. Our benchmark, SHIELD, addresses these limitations by integrating both reliability and stability factors into a unified evaluation metric designed for practical assessment. In addition, we develop a post-hoc, model-agnostic humanification framework that modifies AI text to more closely resemble human authorship, incorporating a controllable hardness parameter. This hardness-aware approach effectively challenges current SOTA zero-shot detection methods in maintaining both reliability and stability. (Data and code: https://github.com/navid-aub/SHIELD-Benchmark)
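
The controllable-hardness humanification idea in the abstract can be pictured as a post-hoc rewriting loop wrapped around any generator. The sketch below is a hypothetical interface, not the paper's framework: rewrite_fn stands in for an arbitrary paraphrase or style-transfer step, and the hardness knob simply scales how many rewrite passes are applied and how aggressive each pass is.

```python
from typing import Callable

def humanify(text: str,
             rewrite_fn: Callable[[str, float], str],
             hardness: float) -> str:
    """Post-hoc, model-agnostic rewriting of AI text (illustrative only).

    rewrite_fn(text, strength) is any paraphrase / style-transfer step;
    hardness in [0, 1] controls how strongly the text is pushed toward a
    human-like style by scaling the per-pass strength and the pass count.
    """
    hardness = min(max(hardness, 0.0), 1.0)
    n_passes = 1 + int(round(3 * hardness))   # harder samples get more rewrites
    strength = 0.3 + 0.7 * hardness           # and a stronger edit per pass
    for _ in range(n_passes):
        text = rewrite_fn(text, strength)
    return text

# Toy rewrite step; a real one would be an LLM paraphraser or style-transfer model.
def toy_rewrite(text: str, strength: float) -> str:
    swaps = {"utilize": "use", "moreover": "also", "therefore": "so"}
    if strength <= 0.5:            # weak passes leave the text untouched here
        return text
    for src, dst in swaps.items():
        text = text.replace(src, dst)
    return text

sample = "moreover, we utilize several heuristics; therefore the output improves."
print(humanify(sample, toy_rewrite, hardness=0.9))
```

A benchmark built this way can bucket the rewritten samples by hardness and report detector performance per bucket, which is the kind of stress test the abstract describes.
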
Problem

Research questions and friction points this paper is trying to address.

Evaluates AI text detectors for real-world reliability and stability
Addresses overlooked false positive rates in detection systems
Introduces hardness-aware benchmark to challenge SOTA detection methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel evaluation paradigm for AI text detectors
SHIELD benchmark integrates reliability and stability into a unified metric (a toy aggregation sketch follows this list)
Model-agnostic humanification framework with hardness parameter
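
As a purely hypothetical illustration of folding reliability and stability into one number (SHIELD's actual metric is defined in the paper and may differ), a detector could be scored by its worst-case per-domain true positive rate at the shared pre-calibrated threshold, with the score zeroed out if any domain violates the allowed false positive rate.

```python
def deployment_score(tpr_by_domain: dict, fpr_by_domain: dict, max_fpr: float = 0.01) -> float:
    """Toy unified score (not the paper's formula): worst-case TPR across domains,
    set to zero whenever any domain exceeds the allowed false positive rate."""
    if any(fpr > max_fpr for fpr in fpr_by_domain.values()):
        return 0.0
    return min(tpr_by_domain.values())

print(deployment_score({"news": 0.92, "essays": 0.74, "reviews": 0.61},
                       {"news": 0.008, "essays": 0.010, "reviews": 0.009}))
# -> 0.61: the weakest domain, not the average, determines the score
```

Scoring against the worst domain rewards exactly the stability property the benchmark emphasizes: a detector that is excellent on news but collapses on reviews gets no credit for the average.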