🤖 AI Summary
This work addresses the multilingual AI-generated text detection task (Subtask A) of the COLING 2025 GenAI Workshop. We propose a dual-paradigm discriminative framework that jointly leverages masked language modeling (MLM) and causal language modeling (CLM). To our knowledge, this is the first application of the Qwen series models to cross-lingual AI-text detection. Our approach introduces a novel dual-paradigm ensemble strategy and a semantic consistency enhancement mechanism, further strengthened by adversarial training and pseudo-label self-training to improve generalization. The approach fine-tunes Qwen-1.5, mBERT, and RoBERTa-large. Among 36 participating teams, it achieves an F1 Micro score of 0.8333 (ranked 1st) and an F1 Macro score of 0.8301 (ranked 2nd), significantly outperforming single-paradigm baselines. These results empirically support the claim that multi-paradigm collaborative modeling enhances robustness in cross-lingual discrimination.
📝 Abstract
This paper describes the approach of the Unibuc - NLP team to the COLING 2025 GenAI Workshop, Task 1: Binary Multilingual Machine-Generated Text Detection. We explored both masked language models and causal language models. For Subtask A, our best model placed first out of 36 teams on F1 Micro (auxiliary score) with 0.8333, and second on F1 Macro (main score) with 0.8301.
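The two reported scores differ only in how per-class results are averaged. The following is a minimal sketch of the standard definitions of micro- and macro-averaged F1 for a binary task, not the workshop's official scorer. The function name `f1_micro_macro` and the label encoding (0 = human-written, 1 = machine-generated) are assumptions made for illustration.

```python
from collections import Counter


def f1_micro_macro(y_true, y_pred, labels=(0, 1)):
    """Return (micro_f1, macro_f1) for single-label classification.

    Assumed encoding (hypothetical): 0 = human-written, 1 = machine-generated.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(
        sum(tp[c] for c in labels),
        sum(fp[c] for c in labels),
        sum(fn[c] for c in labels),
    )
    return micro, macro


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 0, 1, 1, 0]
    micro, macro = f1_micro_macro(y_true, y_pred)
    print(f"micro={micro:.4f} macro={macro:.4f}")
```

For single-label classification, micro F1 equals plain accuracy, while macro F1 weights both classes equally regardless of their frequency. That is why the two scores can rank systems differently on imbalanced data.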