Stylometry Analysis of Human and Machine Text for Academic Integrity

📅 2026-01-03
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work proposes a unified natural language processing framework for key challenges in academic integrity, including plagiarism, content fabrication, and authorship verification. The framework covers four stylometric tasks: classifying text as human- or machine-generated, distinguishing single- from multi-author documents, detecting authorship changes within multi-author texts, and identifying the contributing authors of collaboratively written documents. The study introduces and publicly releases two academic text datasets generated with Gemini under two instruction settings, standard and strict, and evaluates how the prompting strategy affects detection performance. Performance drops on texts produced under the strict instructions, indicating that carefully crafted prompts make machine-generated text harder to identify. The code and datasets are openly available and are intended to serve as a baseline for future research on academic integrity.
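The page does not say which models the authors use for these tasks. As a rough illustration of what stylometric classification can look like, here is a minimal Python sketch, assuming a hypothetical 20-word function-word list as the only feature set and a nearest-centroid classifier as a stand-in technique (not the paper's method). The names `profile` and `NearestCentroid` and the labels `"human"`/`"machine"` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately short function-word inventory (illustration only).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "for",
                  "with", "as", "on", "by", "this", "be", "are", "which", "not", "but"]


def profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    n = max(len(words), 1)  # avoid division by zero on empty input
    return [counts[w] / n for w in FUNCTION_WORDS]


class NearestCentroid:
    """Assigns a text the label whose mean profile is closest (Euclidean distance)."""

    def fit(self, texts, labels):
        sums = {}
        counts = Counter()
        for text, label in zip(texts, labels):
            vec = profile(text)
            acc = sums.setdefault(label, [0.0] * len(vec))
            for i, x in enumerate(vec):
                acc[i] += x
            counts[label] += 1
        # Mean profile per label.
        self.centroids = {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}
        return self

    def predict(self, text):
        vec = profile(text)
        return min(self.centroids, key=lambda lab: math.dist(vec, self.centroids[lab]))


clf = NearestCentroid().fit(["the cat and the dog", "it is of no use to it"],
                            ["human", "machine"])
print(clf.predict("the bird and the fish"))
```

With author names as labels instead of `"human"`/`"machine"`, the same scheme gives a crude version of the author recognition task; a practical system would use far richer features and a trained model.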


📝 Abstract
This work addresses critical challenges to academic integrity, including plagiarism, fabrication, and verification of authorship of educational content, by proposing a Natural Language Processing (NLP)-based framework for authenticating students' content through author attribution and style change detection. Despite some initial efforts, several aspects of the topic are yet to be explored. In contrast to existing solutions, the paper provides a comprehensive analysis of the topic by targeting four relevant tasks: (i) classification of human and machine text, (ii) differentiating between single- and multi-authored documents, (iii) author change detection within multi-authored documents, and (iv) author recognition in collaboratively produced documents. The solutions proposed for the tasks are evaluated on two datasets generated with Gemini using two different prompts: a normal and a strict set of instructions. During experiments, some reduction in the performance of the proposed solutions is observed on the dataset generated through the strict prompt, demonstrating the complexities involved in detecting machine-generated text produced with cleverly crafted prompts. The generated datasets, code, and other relevant materials are made publicly available on GitHub and are expected to provide a baseline for future research in the domain.
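For the style change and single- versus multi-author tasks, one common stylometric idea is to compare the writing profiles of consecutive paragraphs. The sketch below is hypothetical and is not taken from the paper: it assumes function-word frequency profiles, cosine distance, and an arbitrary `threshold` of 0.3, all of which would need tuning on real data. `detect_style_changes` and `is_multi_author` are illustrative names.

```python
import math
import re
from collections import Counter

# Hypothetical function-word inventory; real systems use many more features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "for",
                  "with", "as", "on", "by", "this", "be", "are", "which", "not", "but"]


def profile(paragraph: str) -> list[float]:
    """Relative frequency of each function word in the paragraph."""
    words = re.findall(r"[a-z']+", paragraph.lower())
    counts = Counter(words)
    n = max(len(words), 1)
    return [counts[w] / n for w in FUNCTION_WORDS]


def cosine_distance(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # A paragraph with no function words has no usable profile: treat it as maximally different.
    return 1.0 - dot / norm if norm else 1.0


def detect_style_changes(paragraphs, threshold=0.3):
    """One boolean per boundary between consecutive paragraphs (author change detection)."""
    vecs = [profile(p) for p in paragraphs]
    return [cosine_distance(a, b) > threshold for a, b in zip(vecs, vecs[1:])]


def is_multi_author(paragraphs, threshold=0.3):
    """Single vs. multi-author: flag the document if any boundary shows a style change."""
    return any(detect_style_changes(paragraphs, threshold))


doc = ["the cat and the dog", "the bird and the fish", "it is of no use to it"]
print(detect_style_changes(doc), is_multi_author(doc))
```

Deriving the document-level decision from the boundary-level one keeps the two tasks consistent; the paper may well treat them as separate classifiers.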
Problem

Research questions and friction points this paper is trying to address.

academic integrity
authorship verification
stylometry
machine-generated text
plagiarism
Innovation

Methods, ideas, or system contributions that make the work stand out.

stylometry
authorship attribution
machine-generated text detection
multi-author analysis
NLP for academic integrity
🔎 Similar Papers
2024-06-21 · Journal of Artificial Intelligence Research · Citations: 6
H. Albaqami
Department of Computer Science and Artificial Intelligence, College of Computer Science and Engineering, University of Jeddah, Jeddah 21493, Saudi Arabia
Muhammad Asif Ayub
University of Engineering and Technology, Peshawar, Pakistan
Nasir Ahmad
Donders Institute for Brain, Cognition, and Behaviour
Machine Learning, Computational Neuroscience, Theoretical Neuroscience, Synaptic Plasticity, Neural Networks
Yaseen Ahmad
University of Engineering and Technology, Peshawar, Pakistan
Mohammed M. Alqahtani
Department of Computer Science and Artificial Intelligence, College of Computer Science and Engineering, University of Jeddah, Jeddah 21493, Saudi Arabia
Abdullah M. Algamdi
Department of Computer Science and Artificial Intelligence, College of Computer Science and Engineering, University of Jeddah, Jeddah 21493, Saudi Arabia
Almoaid A. Owaidah
Department of Management Information Systems, Faculty of Economics and Administration, King Abdulaziz University, Jeddah 21589, Saudi Arabia
Kashif Ahmad
Munster Technological University, Cork, Ireland
Artificial Intelligence, Multimedia Analytics, Social Media Informatics, NLP