On Explaining (Large) Language Models For Code Using Global Code-Based Explanations

📅 2025-03-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the lack of interpretability in Language Models for Code (LLM4Code), this paper proposes Code$Q$, a causality-based attribution method with rigorous mathematical underpinning. Code$Q$ identifies subsets of input tokens and applies perturbation analysis to quantify the causal influence of individual linguistic elements (e.g., the natural-language particle "at") on generated code, yielding fine-grained, empirically verifiable explanations. The paper pairs an exploratory analysis of the method's applicability with a user study of its usability, comparing model rationales against human rationales to promote a calibrated level of trust and distrust in the model. On code completion and test generation tasks, participants reported that Code$Q$ exposes causal input-output relationships through readable and informative explanations.
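The perturbation idea described above can be sketched in a few lines: mask one input token at a time and measure how much the model's output score changes. This is a minimal illustrative sketch, not the authors' implementation; the `token_importance` helper, the `toy_score` stand-in model, and the `<mask>` token are all hypothetical.

```python
# Hypothetical sketch of perturbation-based token attribution,
# loosely in the spirit of Code$Q$ (not the paper's actual method).

from typing import Callable, Dict, List


def token_importance(
    tokens: List[str],
    score: Callable[[List[str]], float],
    mask: str = "<mask>",
) -> Dict[str, float]:
    """Score each token by how much masking it shifts the model's
    output score (a simple causal-perturbation proxy)."""
    base = score(tokens)
    importance: Dict[str, float] = {}
    for i, tok in enumerate(tokens):
        perturbed = tokens[:i] + [mask] + tokens[i + 1:]
        importance[tok] = abs(base - score(perturbed))
    return importance


# Toy stand-in for a code model: its output "quality" depends
# entirely on the presence of the particle "at" in the prompt.
def toy_score(tokens: List[str]) -> float:
    return 1.0 if "at" in tokens else 0.0


scores = token_importance("insert value at index".split(), toy_score)
# Masking "at" changes the score, so it receives the highest importance.
```

A real attribution pipeline would replace `toy_score` with a likelihood or similarity score from the model under study and aggregate importances over token subsets rather than single tokens.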

📝 Abstract
In recent years, Language Models for Code (LLM4Code) have significantly changed the landscape of software engineering (SE) on downstream tasks, such as code generation, by making software development more efficient. Therefore, a growing interest has emerged in further evaluating these Language Models to homogenize the quality assessment of generated code. As the current evaluation process can over-rely on accuracy-based metrics, practitioners often seek methods to interpret LLM4Code outputs beyond canonical benchmarks. While the majority of research reports on code generation effectiveness in terms of expected ground truth, scant attention has been paid to LLMs' explanations. In essence, the decision-making process behind generated code is hard to interpret. To bridge this evaluation gap, we introduce code rationales (Code$Q$), a technique with rigorous mathematical underpinning, to identify subsets of tokens that can explain individual code predictions. We conducted a thorough Exploratory Analysis to demonstrate the method's applicability and a User Study to understand the usability of code-based explanations. Our evaluation demonstrates that Code$Q$ is a powerful interpretability method to explain how (less) meaningful input concepts (i.e., the natural language particle `at') highly impact output generation. Moreover, participants of this study highlighted Code$Q$'s ability to show a causal relationship between the input and output of the model with readable and informative explanations on code completion and test generation tasks. Additionally, Code$Q$ also helps to uncover model rationales, facilitating comparison with human rationales to promote a fair level of trust and distrust in the model.
Problem

Research questions and friction points this paper is trying to address.

Explain the decision-making process of code-generating language models
Evaluate language models beyond accuracy-based metrics
Provide interpretable explanations for model-generated code outputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Code$Q$ for token-based code explanations
Uses mathematical rigor to explain model predictions
Enables causal input-output analysis in code generation
🔎 Similar Papers
No similar papers found.