Bridging Behavioral Biometrics and Source Code Stylometry: A Survey of Programmer Attribution

📅 2026-03-11
🤖 AI Summary
This study addresses source code authorship identification, a problem studied across software engineering, security, and digital forensics. It presents a systematic mapping study of 47 publications from 2012 to 2025 and proposes an integrated framework that relates behavioral biometrics and code stylometry, covering both stylistic and behavioral features of programmers. Combining systematic mapping with content-level analysis of thematic clusters, the review finds that existing research concentrates on closed-world authorship attribution and relies heavily on a small number of benchmark datasets, with significant gaps in authorship verification, open-world scenarios, and reproducibility. The work provides a structured synthesis of the field and outlines methodological gaps to guide future research.

📝 Abstract
Programmer attribution seeks to identify or verify the author of a source code artifact using stylistic, structural, or behavioural characteristics. This problem has been studied across software engineering, security, and digital forensics, resulting in a growing and methodologically diverse set of publications. This paper presents a systematic mapping study of programmer attribution research focused on source code analysis. From an initial set of 135 candidate publications, 47 studies published between 2012 and 2025 were selected through a structured screening process. The included works are analysed along several dimensions, including authorship tasks, feature categories, learning and modelling approaches, dataset sources, and evaluation practices. Based on this analysis, we derive a taxonomy that relates stylistic and behavioural feature types to commonly used machine learning techniques and provide a descriptive overview of publication trends, benchmarks, and programming languages. A content-level analysis highlights the main thematic clusters in the field. The results indicate a strong focus on closed-world authorship attribution using stylometric features and a heavy reliance on a small number of benchmark datasets, while behavioural signals, authorship verification, and reproducibility remain less explored. The study consolidates existing research into a unified framework and outlines methodological gaps that can guide future work. This manuscript is currently under review. The present version is a preprint.
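To make the task distinction in the abstract concrete, the sketch below contrasts closed-world attribution (pick one of a fixed set of known authors) with verification (accept or reject a single claimed author). It is an illustration only and is not taken from the surveyed studies: the four toy features, the helper names (`style_features`, `centroid`, `attribute`, `verify`), the nearest-centroid rule, the sample snippets, and the `0.25` threshold are all assumptions chosen for brevity.

```python
import math
import re


def style_features(source: str) -> list[float]:
    """Toy stylometric vector of layout and naming habits (hypothetical features)."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    n = max(len(lines), 1)
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    m = max(len(identifiers), 1)
    return [
        sum(len(ln) for ln in lines) / n / 80.0,               # mean line length, scaled
        sum(ln.startswith("\t") for ln in lines) / n,          # share of tab-indented lines
        sum(ln.lstrip().startswith("#") for ln in lines) / n,  # share of '#' comment lines
        sum("_" in name for name in identifiers) / m,          # share of names with underscores
    ]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Average the feature vectors of one author's training samples."""
    return [sum(col) / len(col) for col in zip(*vectors)]


def attribute(sample: str, profiles: dict[str, list[float]]) -> str:
    """Closed-world attribution: always returns one of the known authors."""
    vec = style_features(sample)
    return min(profiles, key=lambda author: math.dist(vec, profiles[author]))


def verify(sample: str, profile: list[float], threshold: float = 0.25) -> bool:
    """Verification: accept the claimed author only if the style is close enough."""
    return math.dist(style_features(sample), profile) <= threshold


# Hypothetical training data: one short snippet per author.
training = {
    "alice": ["def add_two(first_value, second_value):\n"
              "    # sum inputs\n"
              "    return first_value + second_value\n"],
    "bob": ["def addTwo(firstValue, secondValue):\n"
            "\treturn firstValue + secondValue\n"],
}
profiles = {
    author: centroid([style_features(s) for s in samples])
    for author, samples in training.items()
}

alice_like = "def scale_it(input_value):\n    # double it\n    return input_value * 2\n"
unknown = "X=1;Y=2;print(X+Y)\n"  # written by neither known author

print(attribute(alice_like, profiles), verify(alice_like, profiles["alice"]))
# The closed-world model still names a known author for the unknown sample,
# while verification can reject every claimed author.
print(attribute(unknown, profiles), [verify(unknown, p) for p in profiles.values()])
```

The second call shows why closed-world results may not transfer to open-world settings: `attribute` has no way to answer "none of the above", whereas a thresholded `verify` does, at the cost of having to choose a threshold.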
Problem

Research questions and friction points this paper is trying to address.

programmer attribution
source code stylometry
authorship identification
behavioral biometrics
software forensics
Innovation

Methods, ideas, or system contributions that make the work stand out.

programmer attribution
source code stylometry
behavioral biometrics
systematic mapping study
authorship verification
Marek Horváth
Department of Computers and Informatics, Technical University of Košice, Slovakia
Emília Pietriková
Department of Computers and Informatics, Technical University of Košice, Slovakia
Diomidis Spinellis
Professor, AUEB and TU Delft
Software Engineering, IT Security