The Effect of Code Obfuscation on Human Program Comprehension

📅 2026-03-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how code obfuscation affects human program comprehension. Using output-prediction experiments in Python and JavaScript, it evaluates how several levels of obfuscation (identifier renaming, adversarially misleading names, control-flow modifications, and combinations of these) affect prediction accuracy and response time, and how these effects interact with self-reported programming experience. Obfuscation generally lengthens response time and tends to reduce accuracy, yet in Python certain renaming transformations perform comparably to, or occasionally better than, the unobfuscated baseline. Response-time analyses suggest that obfuscation shifts participants from rapid, heuristic reasoning toward slower, more deliberate reasoning. Accuracy appears highest within a moderate range of response times, while very long response times often correspond to confusion. Programming experience predicts performance mainly within the same language, with limited transfer across languages. The relationship between obfuscation strength and difficulty is non-monotonic and language-specific: JavaScript shows difficulty increasing with stronger obfuscation, whereas Python shows a more complex trend.

📝 Abstract
We investigate how code obfuscation influences human understanding of programs through an output-prediction task. To study this effect, we construct multiple levels of obfuscation, ranging from unobfuscated code to transformations involving identifier renaming, adversarially misleading identifiers, control-flow modifications, and combinations of these techniques. These transformations are applied to function-level programs written in Python and JavaScript. Participants were asked to predict program outputs while we recorded correctness, response time, and self-reported programming experience. Our results show that obfuscation generally increases the time required to reason about code and tends to reduce prediction accuracy. However, the relationship between obfuscation strength and performance is not strictly monotonic and varies across programming languages. JavaScript exhibits the expected pattern of increasing difficulty with stronger obfuscation, whereas Python displays a more complex trend in which certain renaming transformations can perform comparably to, or occasionally better than, the unobfuscated baseline. Response-time analyses further suggest that obfuscation shifts participants away from rapid, heuristic reasoning toward slower and more deliberate reasoning processes. Performance appears highest within a moderate range of response times, indicating that careful deliberation can improve accuracy, while extremely long response times often correspond to confusion. Finally, programming experience predicts performance primarily within a given language, with limited transfer across languages, suggesting that obfuscation challenges language-specific familiarity more than general programming ability.
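To make the obfuscation levels named in the abstract concrete, the sketch below applies three of them to one small Python function. It is a hypothetical illustration only: the function, its names, and the specific transformations are assumptions chosen for clarity and are not taken from the study's materials.

```python
# Hypothetical illustration of obfuscation levels; none of this code comes
# from the study's stimuli. All four functions compute the same value.

def sum_of_evens(numbers):
    # Baseline: descriptive names and a plain loop.
    total = 0
    for value in numbers:
        if value % 2 == 0:
            total += value
    return total


def f1(a):
    # Identifier renaming: names carry no meaning.
    b = 0
    for c in a:
        if c % 2 == 0:
            b += c
    return b


def count_odds(strings):
    # Adversarial naming: names suggest a different computation.
    longest = 0
    for char in strings:
        if char % 2 == 0:
            longest += char
    return longest


def sum_of_evens_flattened(numbers):
    # Control-flow modification: the loop is rewritten as a state machine.
    total = 0
    index = 0
    state = 0
    while state != 3:
        if state == 0:      # loop condition
            state = 1 if index < len(numbers) else 3
        elif state == 1:    # loop body: add the element if it is even
            if numbers[index] % 2 == 0:
                total += numbers[index]
            state = 2
        else:               # state 2: advance to the next element
            index += 1
            state = 0
    return total


sample = [3, 4, 7, 10]
variants = (sum_of_evens, f1, count_odds, sum_of_evens_flattened)
results = [variant(sample) for variant in variants]
print(results)  # prints [14, 14, 14, 14]
```

In an output-prediction task of the kind the abstract describes, a participant would see one such variant and state the value it produces for a given input; the variants differ only in how much the surface form helps or misleads.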
Problem

Research questions and friction points this paper is trying to address.

code obfuscation
program comprehension
human reasoning
output prediction
programming languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

code obfuscation
program comprehension
human reasoning
programming languages
empirical study
Anh H. N. Nguyen
University of Texas at Dallas, USA
Jack Le
University of Texas at Dallas, USA
Ilse Lahnstein Coronado
University of Texas at Dallas, USA
Tien N. Nguyen
Professor, School of Engineering and Computer Science - The University of Texas at Dallas
AI4SE, Automated Software Engineering, Artificial Intelligence, Mining Software Repositories