🤖 AI Summary
AI-generated text detection lacks robustness under out-of-distribution (OOD) conditions and is vulnerable to adversarial misuse. Method: We propose mdok, a knowledge-injection-based lightweight fine-tuning approach built on the KInIT framework, integrating distillation adaptation for small-scale LLMs (e.g., Phi-3, TinyLlama), a multi-task classification head, and adversarial training. mdok unifies binary and fine-grained multi-class detection, including human-AI collaborative modes. Contribution/Results: mdok ranks first in the multiclass track of Voight-Kampff Generative AI Detection 2025 and delivers strong binary-detection accuracy; it improves cross-lingual and cross-collaboration-mode F1 scores by 12.7%, significantly enhancing OOD generalization. Its core innovation is a low-overhead knowledge-injection mechanism that boosts the robustness of compact models, offering a practical paradigm for trustworthy AI content provenance.
📝 Abstract
Large language models (LLMs) can generate high-quality texts in multiple languages. Such texts are often not recognizable by humans as machine-generated, which creates potential for misuse of LLMs (e.g., plagiarism, spam, disinformation spreading). Automated detection can assist humans in flagging machine-generated texts; however, its robustness to out-of-distribution data remains challenging. This notebook describes our mdok approach to robust detection, based on fine-tuning smaller LLMs for text classification. It is applied to both subtasks of Voight-Kampff Generative AI Detection 2025, providing remarkable performance in binary detection as well as in multiclass classification (1st rank) of various cases of human-AI collaboration.
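The mdok approach fine-tunes smaller LLMs as text classifiers, which is beyond a self-contained snippet. As a hedged, stdlib-only toy, the underlying task framing (binary machine-generated-text detection as supervised text classification) can be sketched with logistic regression over character bigrams; the corpora, features, and hyperparameters below are illustrative assumptions, not the paper's method or data:

```python
import math
from collections import Counter

# Tiny illustrative corpora standing in for human vs. machine text
# (invented examples, NOT from the Voight-Kampff datasets).
HUMAN = [
    "honestly i dunno, the movie kinda dragged but the ending was neat",
    "grabbed coffee w/ sam, we argued about the game for like an hour",
    "my cat knocked the plant over again... third time this week",
]
MACHINE = [
    "the film presents a compelling narrative with well developed characters",
    "overall, the experience was enjoyable and the service was excellent",
    "in conclusion, regular exercise provides numerous health benefits",
]

def features(text):
    """Character-bigram counts as a sparse feature vector."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def dot(w, x):
    """Dot product between a weight dict and a sparse feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train(pos, neg, epochs=20, lr=0.1):
    """Logistic regression via plain SGD; label 1 = machine-generated."""
    w = {}
    data = [(features(t), 1.0) for t in pos] + [(features(t), 0.0) for t in neg]
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-dot(w, x)))  # sigmoid of the logit
            g = p - y  # gradient of log-loss w.r.t. the logit
            for k, v in x.items():
                w[k] = w.get(k, 0.0) - lr * g * v
    return w

def predict(w, text):
    """1 = machine-generated, 0 = human-written."""
    return 1 if dot(w, features(text)) > 0 else 0

w = train(MACHINE, HUMAN)
assert all(predict(w, t) == 1 for t in MACHINE)
assert all(predict(w, t) == 0 for t in HUMAN)
```

In mdok, this shallow feature-based classifier is replaced by a fine-tuned small LLM, which is what gives the reported out-of-distribution robustness; the sketch only shows the shared classification framing.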