Trends and Challenges in Authorship Analysis: A Review of ML, DL, and LLM Approaches

📅 2025-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This survey systematically reviews authorship analysis (AA) research from 2015 to 2024, focusing on two core tasks: author attribution and author verification. The authors synthesize methodological advances across feature engineering (e.g., n-grams, stylometric features), classical machine learning (e.g., SVM, Random Forest), deep learning (e.g., RNNs, Transformers), and large language models (via fine-tuning and prompt engineering), and organize the findings along four dimensions: *method–feature–dataset–challenge*. The contribution is threefold. First, a comprehensive taxonomy of multi-paradigm AA approaches that traces how they evolved and where each one applies. Second, an explicit account of critical research gaps, including low-resource language processing, multilingual adaptability, cross-domain generalization, and detection of AI-generated text. Third, guidelines for developing robust, multilingual, and interpretable AA systems.
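The summary names n-gram features as a classical starting point for both sub-tasks. The snippet below is a minimal sketch under stated assumptions, not a method taken from the paper. It assumes character trigrams, per-author frequency profiles pooled over known documents, cosine similarity, and a hypothetical verification threshold of 0.5. It shows how one feature set can drive attribution, which picks the closest candidate, and verification, which accepts or rejects a single claimed author.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram frequency vectors."""
    # Counter returns 0 for missing keys, so unshared n-grams add nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def author_profile(docs: list[str], n: int = 3) -> Counter:
    """Pool the n-gram counts of all known documents by one author."""
    profile = Counter()
    for doc in docs:
        profile.update(char_ngrams(doc, n))
    return profile


def attribute(unknown: str, known: dict[str, list[str]], n: int = 3) -> tuple[str, float]:
    """Attribution: return the candidate whose profile is most similar."""
    query = char_ngrams(unknown, n)
    scores = {author: cosine(query, author_profile(docs, n)) for author, docs in known.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def verify(unknown: str, author_docs: list[str], threshold: float = 0.5, n: int = 3) -> bool:
    """Verification: accept the claimed author if similarity reaches the threshold.

    The default threshold of 0.5 is a hypothetical placeholder, not a tuned value.
    """
    score = cosine(char_ngrams(unknown, n), author_profile(author_docs, n))
    return score >= threshold


# Toy, invented example texts for two hypothetical candidate authors.
known = {
    "A": ["the cat sat on the mat", "the cat ate the rat"],
    "B": ["quick brown foxes jump", "lazy dogs sleep all day"],
}
print(attribute("the cat sat on the rat", known))
```

On real corpora, raw counts like these would typically give way to TF-IDF weighting and a trained classifier such as an SVM, and any verification threshold would be tuned on held-out data.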

📝 Abstract

Authorship analysis plays an important role in diverse domains, including forensic linguistics, academia, cybersecurity, and digital content authentication. This paper presents a systematic literature review of two key sub-tasks of authorship analysis: Author Attribution and Author Verification. Drawing on studies conducted from 2015 to 2024, the review explores state-of-the-art (SOTA) methodologies, ranging from traditional machine learning (ML) approaches to deep learning (DL) models and large language models (LLMs), and highlights their evolution, strengths, and limitations. Key contributions include a comprehensive analysis of methods and techniques, their corresponding feature extraction techniques, the datasets used, and emerging challenges in authorship analysis. The study highlights critical research gaps, particularly in low-resource language processing, multilingual adaptation, cross-domain generalization, and AI-generated text detection. This review aims to help researchers by giving an overview of the latest trends and challenges in authorship analysis, and it points out possible areas for future study. The goal is to support the development of better, more reliable, and more accurate authorship analysis systems across diverse textual domains.

Problem

Research questions and friction points this paper is trying to address.

Reviewing ML, DL, and LLM methods for authorship attribution and verification

Analyzing challenges in low-resource languages and AI-generated text detection

Identifying research gaps for improving accuracy in authorship analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic review of ML, DL, and LLM methods

Analysis of feature extraction and datasets

Focus on low-resource language challenges

🔎 Similar Papers

Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges

2024-08-16 · arXiv.org · Citations: 6

Whose LLM is it Anyway? Linguistic Comparison and LLM Attribution for GPT-3.5, GPT-4 and Bard

2024-02-22 · arXiv.org · Citations: 10
