Do Large Code Models Understand Programming Concepts? Counterfactual Analysis for Code Predicates

📅 2024-02-08
🏛️ International Conference on Machine Learning
📈 Citations: 6
Influential: 1
🤖 AI Summary
Existing evaluation methods inadequately capture whether large code models genuinely comprehend core programming-logic concepts, such as data flow and control flow, at the semantic level. Method: To address this gap, the authors propose CACP (Counterfactual Analysis for Programming Concept Predicates), a counterfactual black-box evaluation framework targeting programming concept predicates. CACP requires no access to model internals; instead, it quantifies conceptual understanding via program predicate modeling and abstract representations of control and data flow. Contribution/Results: CACP is used to evaluate ten mainstream large code models across four programming concept categories. The experiments reveal a consistent deficiency in all models' grasp of basic logical concepts, with performance well below what is expected of human programmers. The work contributes a benchmark and methodology for interpretability analysis and capability assessment of large language models on code-related tasks.

📝 Abstract
The success of Large Language Models at text generation has also made them better at code generation and coding tasks. While a lot of work has demonstrated their remarkable performance on tasks such as code completion and editing, it remains unclear why. We help bridge this gap by exploring to what degree auto-regressive models understand the logical constructs of the underlying programs. We propose Counterfactual Analysis for Programming Concept Predicates (CACP) as a counterfactual testing framework to evaluate whether Large Code Models understand programming concepts. With only black-box access to the model, we use CACP to evaluate ten popular Large Code Models for four different programming concepts. Our findings suggest that current models lack understanding of concepts such as data flow and control flow.
Problem

Research questions and friction points this paper is trying to address.

Do Large Code Models genuinely understand programming concepts, or do they merely reproduce surface patterns?
How can conceptual understanding be measured with only black-box access to a model?
How well do models grasp logical constructs such as data flow and control flow?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Counterfactual Analysis for Programming Concept Predicates (CACP), a counterfactual testing framework
Evaluation of ten popular Large Code Models across four programming concepts
Requires only black-box access to the model
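The core idea behind counterfactual testing of this kind can be sketched in a few lines: apply a semantics-preserving edit (here, identifier renaming) to a program, ask the model the same concept predicate about both versions, and score whether its answers are invariant. This is a minimal illustrative sketch, not the paper's actual implementation; the helper names, the example predicate, and the placeholder model answers are all assumptions.

```python
def counterfactual_variant(program: str, rename: dict) -> str:
    """Build a counterfactual program by renaming identifiers.

    A pure rename preserves semantics, so data-flow predicates about the
    program should be unchanged; a model that understands data flow should
    give the same answer for both versions.
    """
    out = program
    for old, new in rename.items():
        out = out.replace(old, new)
    return out


def consistent(answer_original: str, answer_counterfactual: str) -> bool:
    """One counterfactual test passes if the model's predicate answer is
    invariant under the semantics-preserving edit."""
    return answer_original == answer_counterfactual


original = "x = read()\ny = x + 1\nprint(y)"
variant = counterfactual_variant(original, {"x": "tmp"})

# With black-box access, one would pose the same data-flow predicate about
# both programs, e.g. "Does the printed value depend on read()?", and score
# consistency. The answers below are placeholders for model responses.
answer_original = "yes"
answer_variant = "yes"
print(consistent(answer_original, answer_variant))
```

Note that naive string replacement is only adequate for toy snippets; a real harness would rename identifiers via an AST (e.g. Python's `ast` module) to avoid clobbering substrings inside other names or string literals.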