LLM Encoder vs. Decoder: Robust Detection of Chinese AI-Generated Text with LoRA

📅 2025-08-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited robustness of Chinese AI-generated text detection caused by linguistic subtleties and distributional shifts across domains, this paper systematically evaluates encoder architectures (Chinese BERT-large, RoBERTa-wwm-ext-large), a decoder (Qwen2.5-7B), and FastText under cross-domain settings. The authors propose a lightweight, LoRA-based fine-tuning framework featuring instruction-guided input, prompt-driven masked language modeling for pre-adaptation, and an efficient classification head optimized end-to-end. This approach significantly enhances the generalization and domain adaptability of large language models for Chinese AIGC detection. Experimental results show that the LoRA-fine-tuned Qwen2.5-7B reaches 95.94% test accuracy, attains the best trade-off between precision and recall, and substantially outperforms the baseline models. The findings validate both the superiority of decoder-centric architectures and the efficacy of parameter-efficient adaptation strategies for Chinese AIGC detection.

📝 Abstract
The rapid growth of large language models (LLMs) has heightened the demand for accurate detection of AI-generated text, particularly in languages like Chinese, where subtle linguistic nuances pose significant challenges to current methods. In this study, we conduct a systematic comparison of encoder-based Transformers (Chinese BERT-large and RoBERTa-wwm-ext-large), decoder-only LLMs (Alibaba's Qwen2.5-7B and DeepSeek-R1-Distill-Qwen-7B, fine-tuned via Low-Rank Adaptation, LoRA), and a FastText baseline using the publicly available dataset from the NLPCC 2025 Chinese AI-Generated Text Detection Task. Encoder models were fine-tuned using a novel prompt-based masked language modeling approach, while Qwen2.5-7B was adapted for classification with an instruction-format input and a lightweight classification head trained via LoRA. Experiments reveal that although encoder models nearly memorize the training data, they suffer significant performance degradation under distribution shifts (RoBERTa: 76.3% test accuracy; BERT: 79.3%). FastText demonstrates surprising lexical robustness (83.5% accuracy) yet lacks deeper semantic understanding. In contrast, the LoRA-adapted Qwen2.5-7B achieves 95.94% test accuracy with balanced precision-recall metrics, indicating superior generalization and resilience to dataset-specific artifacts. These findings underscore the efficacy of decoder-based LLMs with parameter-efficient fine-tuning for robust Chinese AI-generated text detection. Future work will explore next-generation Qwen3 models, distilled variants, and ensemble strategies to further enhance cross-domain robustness.
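The parameter-efficiency argument behind LoRA can be made concrete with a minimal NumPy sketch. This is not the paper's code: the layer sizes, rank, and scaling factor below are illustrative assumptions. LoRA freezes a pretrained weight matrix W and trains only a low-rank update B·A, so the number of trainable parameters shrinks from d_out·d_in to r·(d_in + d_out):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 512x512 projection adapted at rank r = 8.
d_in, d_out, r, alpha = 512, 512, 8, 16

# Frozen pretrained weight (stands in for one attention projection).
W = rng.standard_normal((d_out, d_in))

# Trainable low-rank factors. A is small-Gaussian-initialized and B starts
# at zero, so before any training the adapted layer equals the base layer.
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))

def lora_forward(x):
    """y = x W^T + (alpha / r) * x A^T B^T: frozen path plus low-rank update."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((4, d_in))
# With B = 0 the update vanishes, so outputs match the frozen base layer.
assert np.allclose(lora_forward(x), x @ W.T)

# Trainable-parameter cost: full fine-tuning vs. the LoRA factors alone.
full_params = W.size
lora_params = A.size + B.size
print(f"full: {full_params}, LoRA: {lora_params} ({lora_params / full_params:.1%})")
```

At these sizes LoRA trains about 3% of the layer's parameters; across a 7B-parameter decoder the same idea is what makes single-GPU adaptation, as reported for Qwen2.5-7B, practical.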
Problem

Research questions and friction points this paper is trying to address.

Detecting AI-generated Chinese text with linguistic nuances
Comparing encoder and decoder models for robustness
Addressing performance degradation under distribution shifts
Innovation

Methods, ideas, or system contributions that make the work stand out.

LoRA-adapted decoder LLM for Chinese text detection
Prompt-based masked fine-tuning for encoder models
Instruction-format classification with lightweight head
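The two input formats named above can be sketched as simple templates. The exact prompt wording is not given in the summary, so both templates below are hypothetical illustrations: the decoder receives an instruction-wrapped input for classification, while the encoders see a prompt containing a mask position whose predicted verbalizer token ("human" vs. "machine") decides the label:

```python
# Hypothetical instruction; the paper's actual wording is not specified.
INSTRUCTION = (
    "Determine whether the following Chinese text was written by a human "
    "or generated by an AI model."
)

def build_classifier_input(text: str) -> str:
    """Instruction-format input for the LoRA-adapted decoder (Qwen2.5-7B)."""
    return f"{INSTRUCTION}\nText: {text}\nAnswer:"

def build_mlm_prompt(text: str, mask_token: str = "[MASK]") -> str:
    """Prompt-based MLM input for the encoders (BERT / RoBERTa-wwm-ext):
    the model fills the mask with a verbalizer token mapped to a label."""
    return f"The following text was written by {mask_token}: {text}"

print(build_classifier_input("今天的天气真好。"))
print(build_mlm_prompt("今天的天气真好。"))
```

Casting detection as mask-filling lets the encoder reuse its pretraining objective for pre-adaptation, while the instruction format lets the decoder's classification head condition on a task description rather than raw text alone.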