🤖 AI Summary
In program comprehension, eye-tracking analysis has long suffered from subjectivity and poor scalability due to reliance on manually defined Areas of Interest (AOIs) and fixed metrics. To address this, we propose *eye2vec*, the first method that models eye-movement trajectories during code reading as transition sequences among syntactic elements—e.g., tokens, statements, or AST nodes—and learns distributed representations to automatically capture their semantic structure. Integrating program analysis with deep learning, *eye2vec* eliminates the need for predefined AOI hierarchies (e.g., lexical, line-, or block-level) or domain-specific heuristics, directly generating context-aware vector embeddings from raw fixation data. Empirical evaluation demonstrates its effectiveness in characterizing cognitive transitions and semantic associations across multi-granular code units. It supports diverse downstream tasks—including code comprehension modeling, bug localization, and developer expertise inference—thereby significantly enhancing the automation, semantic expressiveness, and generalizability of eye-tracking analysis in software engineering.
📝 Abstract
This paper presents eye2vec, an infrastructure for analyzing software developers’ eye movements while they read source code. In typical eye-tracking studies of program comprehension, researchers must preselect analysis targets, such as control flow or syntactic elements, and then develop analysis methods that extract appropriate metrics from fixations on the source code. Researchers can define areas of interest (AOIs) at various levels, such as words, lines, or code blocks, and different choices lead to different results. Moreover, the interpretation of a fixation on a word or line can vary with the purpose of the analysis. Eye-tracking analysis is therefore a difficult task that depends on time-consuming manual work by researchers. eye2vec represents each pair of consecutive fixations as a transition between syntactic elements using distributed representations. These distributed representations facilitate the adoption of diverse data analysis methods with rich semantic interpretations.
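To make the idea of treating consecutive fixations as transitions between syntactic elements concrete, the following is a minimal Python sketch and not the eye2vec implementation. It assumes each fixation has already been mapped to the label of the syntactic element it landed on; the labels, the function names (`transitions`, `element_vectors`, `cosine`), and the count-based vectors are all hypothetical stand-ins. A learned distributed representation, as described in the abstract, would take the place of the simple transition counts used here.

```python
from collections import Counter
from math import sqrt


def transitions(fixated_elements):
    """Pair each fixation with the one that follows it: [(src, dst), ...]."""
    return list(zip(fixated_elements, fixated_elements[1:]))


def element_vectors(pairs):
    """Build one count vector per element.

    Entry j of an element's vector is how often a fixation on that element
    was followed by a fixation on vocab[j]. This is an illustrative
    substitute for a learned embedding.
    """
    vocab = sorted({element for pair in pairs for element in pair})
    counts = Counter(pairs)
    vectors = {src: [counts[(src, dst)] for dst in vocab] for src in vocab}
    return vectors, vocab


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical fixation sequence, already mapped to syntactic element labels.
fixations = [
    "MethodDeclaration",
    "Parameter",
    "IfStatement",
    "Condition",
    "IfStatement",
    "ReturnStatement",
]

pairs = transitions(fixations)
vectors, vocab = element_vectors(pairs)

# Elements whose fixations tend to be followed by the same targets get similar vectors.
similarity = cosine(vectors["Parameter"], vectors["Condition"])
```

In this toy sequence, `Parameter` and `Condition` are each followed only by `IfStatement`, so their vectors coincide. Once transitions are vectors, generic techniques such as similarity search or clustering can be applied without defining AOI-specific metrics by hand.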