HLPD: Aligning LLMs to Human Language Preference for Machine-Revised Text Detection

📅 2025-11-10

🤖 AI Summary

This work addresses the detection of LLM-generated and multi-step machine-revised text in black-box settings, where the generating model is unknown and conventional detectors struggle because revision obscures stylistic cues. The authors propose Human Language Preference Detection (HLPD), a detection framework whose scoring model is aligned to human writing through a reward-based process called Human Language Preference Optimization (HLPO), which makes the model more sensitive to natural writing styles. The evaluation framework uses a five-dimensional prompt generator and multiple advanced LLMs to simulate diverse adversarial revisions. On texts revised by GPT-series models, HLPD achieves a 15.11% relative improvement in AUROC over ImBD and 45.56% over Fast-DetectGPT. On texts generated by advanced LLMs, it attains the highest average AUROC, exceeding ImBD by 5.53% and Fast-DetectGPT by 34.14%. The core contribution is the explicit integration of human language preferences into the detector, which improves robustness against stealthy machine revisions, particularly adversarial multi-task editing.

📝 Abstract

To prevent misinformation and social issues arising from trustworthy-looking content generated by LLMs, it is crucial to develop efficient and reliable methods for identifying the source of texts. Previous approaches have demonstrated exceptional performance in detecting texts fully generated by LLMs. However, these methods struggle when confronting more advanced LLM output or text with adversarial multi-task machine revision, especially in the black-box setting, where the generating model is unknown. To address this challenge, grounded in the hypothesis that human writing possesses distinctive stylistic patterns, we propose Human Language Preference Detection (HLPD). HLPD employs a reward-based alignment process, Human Language Preference Optimization (HLPO), to shift the scoring model's token distribution toward human-like writing, making the model more sensitive to human writing and thereby enhancing the identification of machine-revised text. We test HLPD in an adversarial multi-task evaluation framework that leverages a five-dimensional prompt generator and multiple advanced LLMs to create diverse revision scenarios. When detecting texts revised by GPT-series models, HLPD achieves a 15.11% relative improvement in AUROC over ImBD and a 45.56% relative improvement over Fast-DetectGPT. When evaluated on texts generated by advanced LLMs, HLPD achieves the highest average AUROC, exceeding ImBD by 5.53% and Fast-DetectGPT by 34.14%. Code will be made available at https://github.com/dfq2021/HLPD.
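
The results above are reported as relative improvements in AUROC. As a minimal sketch of how such a figure can be computed, the snippet below implements AUROC as the probability that a machine-revised text outscores a human-written one, then derives a relative gain. The function names, the toy scores, and the convention that higher scores mean "more machine-like" are assumptions for illustration and are not taken from the paper or its code.

```python
def auroc(machine_scores, human_scores):
    """AUROC as a pairwise win rate: the fraction of (machine, human) pairs
    in which the machine-revised text receives the higher detector score.
    Ties count as half a win. Assumes higher score = more machine-like."""
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(machine_scores) * len(human_scores))


def relative_improvement(new_auroc, baseline_auroc):
    """Relative (not absolute) gain of one detector's AUROC over a baseline."""
    return (new_auroc - baseline_auroc) / baseline_auroc


# Hypothetical detector scores, made up for this example.
machine = [0.9, 0.8, 0.4]
human = [0.5, 0.3, 0.2]
score = auroc(machine, human)  # 8 of 9 pairs are ranked correctly
gain = relative_improvement(0.9, 0.6)  # a 0.6 -> 0.9 change is a 50% relative gain
```

The distinction matters when reading the numbers: a 15.11% relative improvement means the AUROC ratio is 1.1511, not that the AUROC rose by 15.11 percentage points.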

Problem

Research questions and friction points this paper is trying to address.

Detecting machine-revised texts in black-box settings

Improving identification of human versus machine writing styles

Addressing adversarial multi-task revisions by advanced LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reward-based alignment shifts the scoring model's token distribution toward human writing patterns

Human Language Preference Optimization enhances sensitivity to human writing

Adversarial multi-task framework tests detection with diverse revision scenarios
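
The reward-based alignment listed above can be illustrated with a generic pairwise preference objective. The sketch below uses a DPO-style loss in which the human-written text is the preferred sample and its machine-revised counterpart is the rejected one. This is an assumption chosen for illustration: the summary does not give the exact HLPO loss, and the function names, the `beta` value, and the toy log-probabilities are all hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_loss(logp_human, logp_machine,
                    ref_logp_human, ref_logp_machine, beta=0.1):
    """DPO-style loss (assumed form, not confirmed as the paper's HLPO loss).

    logp_*     : sequence log-probabilities under the model being aligned
    ref_logp_* : the same quantities under a frozen reference model
    The loss falls as the aligned model raises the likelihood of human text
    relative to machine-revised text, compared with the reference model.
    """
    margin = beta * ((logp_human - ref_logp_human)
                     - (logp_machine - ref_logp_machine))
    # -log(sigmoid(margin)) equals softplus(-margin)
    return softplus(-margin)


# Before alignment the model equals the reference, so the loss is log(2).
start = preference_loss(-11.0, -11.0, -11.0, -11.0)
# After shifting probability mass toward the human text, the loss is lower.
shifted = preference_loss(-10.0, -12.0, -11.0, -11.0)
```

Under this assumed objective, minimizing the loss pushes the scoring model's token distribution toward human-like writing, which is the effect the summary attributes to HLPO.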

🔎 Similar Papers

Learning to Rewrite: Generalized LLM-Generated Text Detection