🤖 AI Summary
Novice programmers struggle to comprehend and evaluate code generated by large language models (LLMs), yet the cognitive bottlenecks underlying this difficulty remain poorly quantified. Method: Grounded in cognitive task analysis, this study conducts a controlled Python programming experiment, presented as the first empirical, cognition-informed investigation of the problem. Thirty-two CS1 students completed 160 task instances involving LLM-generated code, each paired with a natural-language functional description; behavioral logs and questionnaire responses were collected. Contribution/Results: Three primary barriers emerged: (1) limited semantic comprehension among non-native English speakers, (2) insufficient Python syntactic knowledge, and (3) automation bias leading to uncritical trust in generated code. Overall accuracy in judging code correctness was only 32.5%, with consistent deficits across demographic subgroups. Moving beyond prompt-engineering-centric approaches, this work establishes a reproducible cognitive benchmark and an empirically validated map of obstacles for LLM-augmented programming education.
📝 Abstract
Large language models (LLMs) are increasingly being adopted for programming work. Prior work shows that while LLMs accelerate task completion for professional programmers, beginning programmers struggle to prompt models effectively. However, prompting is only half of the code-generation process -- once code is generated, it must be read, evaluated, and integrated (or rejected). How accessible are these tasks for beginning programmers? This paper measures how well beginners comprehend LLM-generated code and explores the challenges students face in judging code correctness. We compare how well students understand natural-language descriptions of functions versus LLM-generated implementations, studying 32 CS1 students on 160 task instances. Our results show a low per-task success rate of 32.5%, with struggles spread consistently across demographic groups. Key challenges include barriers for non-native English speakers, unfamiliarity with Python syntax, and automation bias. Our findings highlight the barrier that code comprehension presents to beginning programmers seeking to write code with LLMs.