🤖 AI Summary
AI-generated text is proliferating, which complicates content attribution and raises security risks. To address this, the paper proposes a robust, multi-detector collaborative framework for detecting LLM-generated text. Departing from conventional single-model discriminative paradigms, the approach introduces the first theoretically falsifiable ensemble method grounded in multiple heterogeneous LLM-based observers. Drawing on information-theoretic principles and statistical decision theory, the authors design a confidence-weighted fusion strategy, augmented by dynamic confidence calibration and anomalous-response suppression. Evaluated on text generated by mainstream models, including Llama, GPT, and Claude, the framework achieves an average 12.7% improvement in F1-score, a 34% gain in cross-domain robustness, and a false positive rate below 1.8%. These results mitigate the performance fragility inherent in single-detector approaches and establish a verifiable, scalable paradigm for AI content provenance.
📝 Abstract
The dissemination of Large Language Models (LLMs), trained at scale and endowed with powerful text-generating abilities, has vastly increased the threats posed by generative AI technologies by reducing the cost of producing harmful, toxic, faked or forged content. In response, various proposals have been made to automatically discriminate artificially generated from human-written texts, typically framing the task as a classification problem. Most approaches evaluate an input document with a well-chosen detector LLM, assuming that low perplexity scores reliably signal machine-made content. As relying on a single detector can make performance brittle, we instead consider several detectors and derive a new, theoretically grounded approach to combine their respective strengths. Our experiments, using a variety of generator LLMs, suggest that our method leads to robust detection performance. An early version of the code is available at https://github.com/BaggerOfWords/MOSAIC.
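To make the perplexity-based setup concrete, here is a minimal sketch of how a document might be scored by one detector and then by several. It is not the combination rule derived in the paper: the unweighted averaging below is a naive baseline chosen only for illustration, and the function names, the detector names, and the `threshold` parameter are hypothetical. The per-token log-probabilities are assumed to have been obtained beforehand from each detector LLM.

```python
import math
from typing import Dict, List


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity of a document under one detector.

    Defined as exp of the negative mean natural-log token probability.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def ensemble_score(logprobs_by_detector: Dict[str, List[float]]) -> float:
    """Naive baseline (an assumption, not the paper's method):
    the unweighted mean of the per-detector log-perplexities."""
    if not logprobs_by_detector:
        raise ValueError("need at least one detector")
    log_ppls = [
        math.log(perplexity(logprobs))
        for logprobs in logprobs_by_detector.values()
    ]
    return sum(log_ppls) / len(log_ppls)


def is_machine_generated(
    logprobs_by_detector: Dict[str, List[float]], threshold: float
) -> bool:
    """Flag the document when the averaged log-perplexity is below a threshold.

    The threshold is hypothetical and would have to be tuned on held-out data.
    """
    return ensemble_score(logprobs_by_detector) < threshold


# Toy usage with made-up log-probabilities for two hypothetical detectors.
toy_document = {
    "detector_a": [math.log(0.5)] * 4,   # every token has probability 0.5
    "detector_b": [math.log(0.25)] * 4,  # every token has probability 0.25
}
print(perplexity(toy_document["detector_a"]))
print(ensemble_score(toy_document))
print(is_machine_generated(toy_document, threshold=1.5))
```

A single detector's verdict depends entirely on how well that model matches the unknown generator, which is the brittleness the abstract points to. Averaging is the simplest way to reduce that dependence, and a principled weighting of the detectors is what the paper sets out to derive.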