🤖 AI Summary
This survey systematically reviews authorship analysis (AA) research from 2015 to 2024, focusing on the core tasks of author attribution and author verification. We synthesize methodological advances across feature engineering (e.g., n-grams, stylometric features), classical machine learning (e.g., SVM, Random Forest), deep learning (e.g., RNNs, Transformers), and large language models (via fine-tuning and prompt engineering), organizing insights into a four-dimensional framework: *method–feature–dataset–challenge*. Our contribution is threefold. First, we provide the first comprehensive taxonomy of multi-paradigm AA approaches, clarifying their evolutionary trajectories and applicability boundaries. Second, we explicitly identify critical research gaps, including low-resource language processing, multilingual adaptability, cross-domain generalization, and detection of AI-generated text. Third, we offer a principled theoretical foundation and actionable guidelines for developing robust, multilingual, and interpretable AA systems.
📝 Abstract
Authorship analysis plays an important role in diverse domains, including forensic linguistics, academia, cybersecurity, and digital content authentication. This paper presents a systematic literature review of two key sub-tasks of authorship analysis: Author Attribution and Author Verification. Based on studies conducted from 2015 to 2024, the review explores state-of-the-art (SOTA) methodologies, ranging from traditional machine learning (ML) approaches to deep learning (DL) models and large language models (LLMs), and highlights their evolution, strengths, and limitations. Key contributions include a comprehensive analysis of methods, their corresponding feature extraction techniques, the datasets used, and emerging challenges in authorship analysis. The study highlights critical research gaps, particularly in low-resource language processing, multilingual adaptation, cross-domain generalization, and AI-generated text detection. This review aims to help researchers by giving an overview of the latest trends and challenges in authorship analysis and by pointing out possible areas for future study. The goal is to support the development of more reliable and accurate authorship analysis systems across diverse textual domains.