Authorship Attribution in Multilingual Machine-Generated Texts

📅 2025-08-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the fine-grained authorship attribution problem for machine-generated text in multilingual settings—specifically, distinguishing human-authored content from the outputs of eight mainstream large language models (LLMs) across 18 languages. To overcome the limitations of existing methods, which operate predominantly in monolingual contexts, the authors propose a cross-lingual evaluation framework that systematically assesses the transferability of diverse monolingual attribution methods across languages and language families. Experimental results reveal that while certain monolingual methods exhibit modest cross-lingual generalization, their performance degrades sharply under substantial typological divergence. Moreover, LLM-generated stylistic patterns are markedly language-specific and architecture-dependent, critically undermining attribution robustness. The work identifies core challenges in multilingual attribution—including linguistic heterogeneity, model-induced stylistic biases, and cross-family generalization failure—and provides both theoretical insights and empirical evidence for developing deployable, robust multilingual AI-content detection systems.

📝 Abstract
As Large Language Models (LLMs) have reached human-like fluency and coherence, distinguishing machine-generated text (MGT) from human-written content has become increasingly difficult. While early efforts in MGT detection have focused on binary classification, the growing landscape and diversity of LLMs call for a more fine-grained yet challenging task: authorship attribution (AA), i.e., identifying the precise generator (LLM or human) behind a text. However, AA remains largely confined to monolingual settings, with English the most investigated, overlooking the multilingual nature and usage of modern LLMs. In this work, we introduce the problem of Multilingual Authorship Attribution, which involves attributing texts to human or multiple LLM generators across diverse languages. Focusing on 18 languages -- covering multiple families and writing scripts -- and 8 generators (7 LLMs and the human-authored class), we investigate the multilingual suitability of monolingual AA methods, their cross-lingual transferability, and the impact of generators on attribution performance. Our results reveal that while certain monolingual AA methods can be adapted to multilingual settings, significant limitations and challenges remain, particularly in transferring across diverse language families, underscoring the complexity of multilingual AA and the need for more robust approaches that better match real-world scenarios.
Problem

Research questions and friction points this paper is trying to address.

Distinguish machine-generated from human-written multilingual texts
Extend authorship attribution beyond monolingual English settings
Assess cross-lingual transferability of attribution methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual authorship attribution across diverse languages
Investigating monolingual AA methods' cross-lingual transferability
Analyzing generator impact on multilingual attribution performance
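The cross-lingual transfer setup investigated here can be sketched with a minimal nearest-profile baseline over character n-grams, a common language-agnostic stylometric feature. All generator names and texts below are illustrative placeholders, not the paper's actual models or data; the paper evaluates far richer attribution methods across 18 languages:

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-gram profile of a text (a simple stylometric feature
    that transfers across scripts better than word tokens)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse n-gram count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def attribute(text, profiles):
    """Assign a text to the generator whose aggregate profile it most resembles."""
    query = char_ngrams(text)
    return max(profiles, key=lambda g: cosine(query, profiles[g]))

# Toy "source-language" training texts per generator class
# (class names are hypothetical, not the paper's eight generators).
train = {
    "human": ["I walked to the old market and bought fresh bread."],
    "llm-a": ["Certainly! Here is a concise overview of the topic."],
}
profiles = {g: sum((char_ngrams(t) for t in texts), Counter())
            for g, texts in train.items()}

# Held-out text; in the paper, attribution is evaluated across languages,
# where accuracy drops sharply as typological divergence grows.
print(attribute("Certainly! Here is a brief overview of the idea.", profiles))
```

In a cross-lingual evaluation, the profiles would be built from one language and the held-out texts drawn from another, making the accuracy gap between in-language and transfer settings the quantity of interest.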
Lucio La Cava
DIMES Department, University of Calabria, Italy
Dominik Macko
Kempelen Institute of Intelligent Technologies
machine-generated text detection, large language models, Internet of Things, network security
Róbert Móro
Kempelen Institute of Intelligent Technologies, Slovakia
Ivan Srba
Kempelen Institute of Intelligent Technologies
AI, Machine Learning, Natural Language Processing, Social Computing, Disinformation
Andrea Tagarelli
DIMES Department, University of Calabria, Italy