🤖 AI Summary
This study investigates cognitive alignment between human programmers and large language models (LLMs) in comprehending confusing code. Method: We record electroencephalographic fixation-related potentials (FRPs) from programmers while they read code and compute token-level perplexity from an LLM, enabling, for the first time, a cross-modal quantitative comparison between neurophysiological signals and model uncertainty. Results: LLM perplexity peaks exhibit strong spatiotemporal correspondence with human neural confusion responses (r > 0.85), indicating significant perceptual alignment in code comprehension. Leveraging this alignment, we propose the first data-driven method for automatically identifying confusion regions in code, achieving 89.2% accuracy in localizing human-identified confusion points in real-world code snippets. This work provides cognitive-scientific foundations for the trustworthy integration of LLM-based programming assistants and establishes a basis for designing perplexity-aware code analysis tools.
Abstract
Humans and programming assistants based on large language models (LLMs) already collaborate in everyday programming tasks. Clearly, a misalignment between how LLMs and programmers comprehend code can lead to misunderstandings, inefficiencies, low code quality, and bugs.
A key question in this space is whether humans and LLMs are confused by the same kind of code. The answer would not only guide our choices of integrating LLMs into software engineering workflows, but also point to possible improvements of LLMs.
To this end, we conducted an empirical study comparing an LLM to human programmers comprehending clean and confusing code. We operationalized comprehension for the LLM using token-level perplexity, and for human programmers using neurophysiological responses (in particular, EEG-based fixation-related potentials).
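The perplexity operationalization can be sketched as follows. This is a minimal illustration working from per-token log-probabilities; the study's actual model, tokenization, and spike criterion are not specified here, so the helper names and the z-score rule are assumptions:

```python
import math

def token_perplexities(logprobs):
    """Per-token perplexity: exp of the negative log-probability the
    model assigned to each token given its preceding context."""
    return [math.exp(-lp) for lp in logprobs]

def sequence_perplexity(logprobs):
    """Sequence-level perplexity: exp of the mean negative log-likelihood."""
    return math.exp(-sum(logprobs) / len(logprobs))

def perplexity_spikes(logprobs, z=2.0):
    """Flag token positions whose perplexity exceeds mean + z * std;
    a simple, assumed stand-in for 'perplexity spikes'."""
    ppl = token_perplexities(logprobs)
    mu = sum(ppl) / len(ppl)
    sd = (sum((p - mu) ** 2 for p in ppl) / len(ppl)) ** 0.5
    return [i for i, p in enumerate(ppl) if p > mu + z * sd]
```

For example, a sequence of nine well-predicted tokens (log-probability −0.1 each) and one surprising token (−5.0) yields a single spike at the surprising position.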
We found that LLM perplexity spikes correlate, in both location and amplitude, with human neurophysiological responses that indicate confusion. This result suggests that LLMs and humans are confused by similar code. Based on these findings, we devised a data-driven, LLM-based approach to identify code regions that elicit confusion in human programmers.
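The correlation analysis and the region-identification step can be sketched as follows, assuming the token-level perplexity series and the FRP amplitude series have already been aligned to the same token positions (the alignment procedure and any thresholds here are illustrative, not the study's):

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equally long, aligned series,
    e.g. per-token perplexity vs. per-token FRP amplitude."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def confusion_regions(flagged_indices):
    """Merge consecutive flagged token indices into (start, end) spans,
    yielding candidate confusion regions in the source code."""
    regions = []
    for i in sorted(flagged_indices):
        if regions and i == regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], i)  # extend the current span
        else:
            regions.append((i, i))  # start a new span
    return regions
```

Flagged positions 3, 4, 5, and 9, for instance, collapse into the regions (3, 5) and (9, 9); each region can then be mapped back to the corresponding code span.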