Coding in a Bubble? Evaluating LLMs in Resolving Context Adaptation Bugs During Code Adaptation

📅 2026-01-10
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that large language models (LLMs) struggle to fix context-adaptation bugs (CtxBugs)—defects arising from changes in contextual environments—because resolving them requires cross-context semantic reasoning rather than local edits. To study this issue systematically, the authors propose CtxBugGen, a framework comprising task selection, targeted perturbation, LLM-based variant generation, and bug identification. It provides the first formal definition and synthetic generation of CtxBugs and establishes the first benchmark dedicated to evaluating such defects. Experimental results reveal that even the best-performing state-of-the-art model, Kimi-K2, achieves only 55.93% Pass@1 accuracy and successfully repairs merely 52.47% of CtxBugs. Moreover, models often replicate rather than correct these bugs, with performance degrading by up to 30%, highlighting a critical limitation of LLMs in cross-context code adaptation.

📝 Abstract
Code adaptation is a fundamental but challenging task in software development, requiring developers to modify existing code for new contexts. A key challenge is to resolve Context Adaptation Bugs (CtxBugs), which occur when code correct in its original context violates constraints in the target environment. Unlike isolated bugs, CtxBugs cannot be resolved through local fixes and require cross-context reasoning to identify semantic mismatches. Overlooking them may lead to critical failures in adaptation. Although Large Language Models (LLMs) show great potential in automating code-related tasks, their ability to resolve CtxBugs remains a significant and unexplored obstacle to their practical use in code adaptation. To bridge this gap, we propose CtxBugGen, a novel framework for generating CtxBugs to evaluate LLMs. Its core idea is to leverage LLMs' tendency to generate plausible but context-free code when contextual constraints are absent. The framework generates CtxBugs through a four-step process to ensure their relevance and validity: (1) Adaptation Task Selection, (2) Task-specific Perturbation, (3) LLM-based Variant Generation, and (4) CtxBugs Identification. Based on the benchmark constructed by CtxBugGen, we conduct an empirical study with four state-of-the-art LLMs. Our results reveal their unsatisfactory performance in CtxBug resolution. The best-performing LLM, Kimi-K2, achieves 55.93% on Pass@1 and resolves just 52.47% of CtxBugs. The presence of CtxBugs degrades LLMs' adaptation performance by up to 30%. Failure analysis indicates that LLMs often overlook CtxBugs and replicate them in their outputs. Our study highlights a critical weakness in LLMs' cross-context reasoning and emphasizes the need for new methods to enhance their context awareness for reliable code adaptation.
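The abstract's four-step generation process can be sketched as a small pipeline. This is a minimal, hypothetical illustration of the idea only: all function names, data structures, and the stub "model" and "checker" below are assumptions for exposition, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class AdaptationTask:
    source_code: str      # code that is correct in its original context
    target_context: str   # constraints imposed by the new environment

def select_tasks(corpus):
    # Step 1 (Adaptation Task Selection): keep tasks whose target
    # environment actually imposes contextual constraints.
    return [t for t in corpus if t.target_context]

def perturb(task):
    # Step 2 (Task-specific Perturbation): remove the contextual
    # constraints, so the LLM tends to emit plausible but context-free code.
    return AdaptationTask(task.source_code, target_context="")

def generate_variants(task, model, n=3):
    # Step 3 (LLM-based Variant Generation): sample n adaptations
    # of the perturbed task from the model.
    return [model(task.source_code) for _ in range(n)]

def identify_ctxbugs(variants, original_task, checker):
    # Step 4 (CtxBugs Identification): a variant is a CtxBug candidate
    # if it violates the original target context's constraints.
    return [v for v in variants
            if not checker(v, original_task.target_context)]

# Toy usage with an identity "model" and a keyword-based "checker".
corpus = [AdaptationTask("conn.query(sql)",
                         "target API requires conn.execute")]
bugs = []
for t in select_tasks(corpus):
    variants = generate_variants(perturb(t), model=lambda src: src, n=2)
    bugs += identify_ctxbugs(variants, t,
                             checker=lambda v, ctx: "execute" in v)
print(len(bugs))  # both context-free variants are flagged: prints 2
```

In the paper's actual framework the generation and checking steps are LLM- and test-based rather than the keyword stub shown here; the sketch only fixes the data flow between the four stages.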
Problem

Research questions and friction points this paper is trying to address.

Context Adaptation Bugs
Code Adaptation
Large Language Models
Cross-context Reasoning
Context Awareness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context Adaptation Bugs
CtxBugGen
Cross-context Reasoning
Code Adaptation
Large Language Models
Tanghaoran Zhang
National University of Defense Technology
software engineering
Xinjun Mao
National University of Defense Technology, China
Shangwen Wang
National University of Defense Technology
software engineering
Yuxin Zhao
National University of Defense Technology, China
Yao Lu
National University of Singapore
AI systems
Zezhou Tang
National University of Defense Technology, China
Wenyu Xu
National University of Defense Technology, China
Longfei Sun
National University of Defense Technology, China
Changrong Xie
National University of Defense Technology, China
Kang Yang
National University of Defense Technology
AI4SE: Program Comprehension, Code Search/Generation; NLP: Text Summarization, GEC
Yue Yu
Professor at Pengcheng Laboratory
Software EngineeringDistributed ComputingArtificial Intelligence System