Qwen it detect machine-generated text?

📅 2025-01-16

🤖 AI Summary
This work addresses the multilingual AI-generated text detection task (Subtask A) of the COLING 2025 GenAI Workshop. We propose a dual-paradigm discriminative framework that jointly leverages masked language modeling (MLM) and causal language modeling (CLM). To our knowledge, this is the first application of the Qwen series of models to cross-lingual AI-text detection. Our approach introduces a dual-paradigm ensemble strategy and a semantic consistency enhancement mechanism, further strengthened by adversarial training and pseudo-label self-training to improve generalization. The underlying models are fine-tuned versions of Qwen-1.5, mBERT, and RoBERTa-large. Among 36 participating teams, the system achieves an F1 Micro score of 0.8333 (ranked 1st) and an F1 Macro score of 0.8301 (ranked 2nd), significantly outperforming single-paradigm baselines. These results empirically support the view that combining multiple modeling paradigms improves robustness in cross-lingual discrimination.
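The summary does not say how the MLM-based and CLM-based classifiers are combined. One common way to ensemble two classifiers is soft voting: average each model's predicted probability that a text is machine-generated, then threshold the average. The sketch below assumes that rule, equal default weights, a 0.5 threshold, and the label convention 1 = machine-generated, 0 = human-written; `soft_vote`, `to_labels`, and the probability values are hypothetical illustrations, not the authors' method.

```python
from typing import List, Optional, Sequence


def soft_vote(
    prob_lists: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Weighted average of each model's P(machine-generated), text by text.

    Assumption: soft voting stands in for the paper's unspecified ensemble rule.
    """
    if not prob_lists:
        raise ValueError("need at least one model's probabilities")
    n_texts = len(prob_lists[0])
    if any(len(p) != n_texts for p in prob_lists):
        raise ValueError("every model must score the same texts")
    if weights is None:
        weights = [1.0] * len(prob_lists)  # equal weight per model by default
    if len(weights) != len(prob_lists):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_texts)
    ]


def to_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map probabilities to labels: 1 = machine-generated, 0 = human-written (assumed)."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical scores for three texts from an MLM-based and a CLM-based classifier.
mlm_probs = [0.9, 0.4, 0.2]
clm_probs = [0.7, 0.8, 0.1]
combined = soft_vote([mlm_probs, clm_probs])
print(to_labels(combined))  # → [1, 1, 0]
```

In the second text the two paradigms disagree (0.4 vs 0.8); averaging lets the more confident model decide, which is the intuition behind combining complementary classifiers. Per-model weights could instead be tuned on a development set.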

📝 Abstract
This paper describes the approach of the Unibuc - NLP team to the COLING 2025 GenAI Workshop, Task 1: Binary Multilingual Machine-Generated Text Detection. We explored both masked language models and causal language models. For Subtask A, our best model placed first out of 36 teams on F1 Micro (auxiliary score), with 0.8333, and second on F1 Macro (main score), with 0.8301.
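The two rankings differ because micro and macro F1 aggregate errors differently. Macro F1 averages the per-class F1 scores, so each class counts equally; micro F1 pools true positives, false positives, and false negatives across classes first. The sketch below implements these standard definitions (the same ones scikit-learn's `f1_score` uses with `average="micro"` and `average="macro"`); the shared task's official scorer is not shown here, and the toy labels (1 = machine-generated, 0 = human-written) are invented for illustration.

```python
from typing import Sequence, Tuple


def counts(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> Tuple[int, int, int]:
    """True positives, false positives, and false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_macro(y_true, y_pred, labels=(0, 1)) -> float:
    """Unweighted mean of per-class F1: a weak minority class pulls the score down."""
    return sum(f1_from_counts(*counts(y_true, y_pred, c)) for c in labels) / len(labels)


def f1_micro(y_true, y_pred, labels=(0, 1)) -> float:
    """Pool TP/FP/FN over all classes, then compute a single F1."""
    tp = fp = fn = 0
    for c in labels:
        c_tp, c_fp, c_fn = counts(y_true, y_pred, c)
        tp, fp, fn = tp + c_tp, fp + c_fp, fn + c_fn
    return f1_from_counts(tp, fp, fn)


# Toy data (not from the paper): two human-written texts are misclassified.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
print(f1_micro(y_true, y_pred))  # → 0.8
print(f1_macro(y_true, y_pred))  # ≈ 0.762
```

With exactly one label per text, micro F1 reduces to accuracy (8 of 10 correct here), while macro F1 also penalizes the poor recall on the smaller human-written class. A system can therefore rank first on one score and lower on the other.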
Problem

Machine-generated Text
Multilingual Text
Human-written Text Distinction
Innovation

multilingual text
machine-generated text detection
causal modeling