🤖 AI Summary
The underlying mechanisms of code generation errors in large language models (LLMs) remain poorly understood. Method: Leveraging the HumanEval benchmark, this work systematically analyzes errors produced by six state-of-the-art LLMs and introduces, for the first time, a multidimensional, fine-grained error taxonomy integrating both semantic and syntactic dimensions. Using open coding and thematic analysis, augmented by statistical testing and qualitative root-cause attribution, the study identifies over ten recurrent error patterns, including logical flaws, boundary condition failures, and API misuse. Contribution/Results: The analysis reveals that LLM errors are often nontrivial, span multiple lines, and are dispersed across different code locations, and it uncovers latent, deep-seated errors even in tasks with high pass rates. It further demonstrates a nonlinear positive correlation between error frequency and task complexity. This taxonomy provides an interpretable, extensible theoretical foundation and empirical grounding for error localization, diagnosis, and repair in LLM-generated code.
📝 Abstract
Large Language Models (LLMs) have demonstrated unprecedented capabilities in code generation. However, the code generation errors that LLMs produce remain poorly understood. To bridge this gap, we conducted an in-depth analysis of code generation errors across six representative LLMs on the HumanEval dataset. Specifically, we first employed open coding and thematic analysis to distill a comprehensive taxonomy of code generation errors. We analyzed two dimensions of error characteristics -- semantic characteristics and syntactic characteristics. Our analysis revealed that LLMs often made non-trivial, multi-line code generation errors in various locations and with various root causes. We further analyzed the correlation between these errors and both task complexity and test pass rate. Our findings highlighted several challenges in locating and fixing code generation errors made by LLMs. Finally, we discussed several future directions to address these challenges.