🤖 AI Summary
This work addresses the multi-granularity and multi-scenario challenges in machine-generated text detection—specifically, document-level binary/multi-class classification (including generator attribution), sentence-level mixed-text segmentation, and robust detection under adversarial attacks. We propose a unified cross-granularity detection framework that, for the first time, integrates generator attribution, adversarial robustness, and fine-grained segmentation within a single paradigm. To support comprehensive evaluation, we introduce BMAS-English, a benchmark dataset enabling multi-task assessment. Our approach synergistically combines deep classification models, sequence labeling, and adversarial sample generation via multi-task learning. Experiments demonstrate significant improvements over state-of-the-art methods in both generator attribution and adversarial sample identification. The framework delivers a more holistic, robust, and scalable technical pathway for AIGC detection.
📝 Abstract
Large Language Models (LLMs) are often said to be on the verge of surpassing human creativity, a claim that needs careful consideration. Recent developments raise critical questions about the authenticity of human work and the preservation of human creativity and innovation. This paper investigates these issues through machine-generated text detection across several scenarios: document-level binary classification, document-level multiclass classification (generator attribution), sentence-level segmentation that separates human-written from machine-generated portions of human-AI collaborative text, and adversarial attacks aimed at reducing the detectability of machine-generated text. We introduce BMAS English, an English-language dataset covering four tasks: Binary classification of human versus machine text; Multiclass classification, which not only identifies machine-generated text but also attempts to determine its generator; Adversarial attack handling, since such attacks are commonly used to evade detection; and Sentence-level segmentation, which predicts the boundaries between human and machine-generated text. We believe this work addresses Machine-Generated Text Detection (MGTD) in a more meaningful way than previous work.