AI Summary
This study addresses the growing challenge of detecting increasingly realistic machine-generated text, which fuels misinformation and exposes the limited generalization of current detectors due to training data constraints. To this end, the authors propose MAGA, a framework that enhances detector robustness and generalization by generating more challenging texts through full-process alignment. The core innovation lies in a reinforcement learning mechanism based on detector feedback (RLDF), which systematically optimizes the generation process to effectively attack and thereby strengthen detectors. Experimental results demonstrate that fine-tuning RoBERTa detectors on the MAGA dataset improves average generalization AUC by 4.60%, while MAGA-generated texts reduce the average AUC of existing detectors by 8.13%, substantially validating the framework's effectiveness.
Abstract
The alignment of Large Language Models (LLMs) is constantly evolving, and Machine-Generated Text (MGT) is becoming increasingly difficult to distinguish from Human-Written Text (HWT). This has exacerbated abuse such as fake news and online fraud. The generalization ability of fine-tuned detectors depends heavily on dataset quality, and simply expanding the sources of MGT is insufficient; the generation process itself needs further augmentation. According to HC-Var's theory, enhancing the alignment of generated text not only facilitates attacks on existing detectors to test their robustness, but also helps improve the generalization ability of detectors fine-tuned on it. We therefore propose Machine-Augment-Generated Text via Alignment (MAGA). MAGA's pipeline achieves comprehensive alignment from prompt construction to the reasoning process; a key component is Reinforced Learning from Detectors Feedback (RLDF), which we systematically propose. In our experiments, a RoBERTa detector fine-tuned on the MAGA training set achieved an average improvement of 4.60% in generalization detection AUC, while the MAGA dataset caused an average decrease of 8.13% in the AUC of the selected detectors. We expect these results to offer useful guidance for future research on the generalization ability of detectors.
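To make the reported metric concrete, the following is a minimal sketch of how a detection AUC can be computed from detector scores, and of how harder-to-detect text lowers it. It is not the authors' evaluation code. The function names, the toy scores, and the reward shape in `detector_feedback_reward` are all illustrative assumptions; the abstract does not specify how RLDF turns detector output into a reward.

```python
from typing import Sequence


def roc_auc(human_scores: Sequence[float], machine_scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC.

    Each score is a detector's estimate that a text is machine-generated.
    The result is the fraction of (machine, human) pairs in which the
    machine text receives the higher score; ties count as half a win.
    """
    if not human_scores or not machine_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(human_scores) * len(machine_scores))


def detector_feedback_reward(p_machine: float) -> float:
    """Hypothetical reward for a generator trained against a detector.

    Assumption for illustration only: the reward grows as the detector
    becomes less confident that the text is machine-generated.
    """
    if not 0.0 <= p_machine <= 1.0:
        raise ValueError("p_machine must be a probability in [0, 1]")
    return 1.0 - p_machine


# Toy scores (invented for this sketch, not taken from the paper).
human = [0.1, 0.2, 0.3]
baseline_machine = [0.7, 0.8, 0.9]    # easy to separate from human text
aligned_machine = [0.25, 0.8, 0.15]   # some texts now score like human text

auc_before = roc_auc(human, baseline_machine)
auc_after = roc_auc(human, aligned_machine)
print(f"AUC before: {auc_before:.3f}, after: {auc_after:.3f}")
```

In this toy setting the detector separates the baseline texts perfectly, while the better-aligned texts overlap with the human score range and pull the AUC down, which is the kind of effect the 8.13% average decrease describes.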