CodeImprove: Program Adaptation for Deep Code Models

📅 2025-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep learning code models suffer significant performance degradation under input distribution shift (i.e., out-of-domain code). To address this, we propose a training-free, program-level input adaptation method. First, we design a code validity score—grounded in static and dynamic analysis—to accurately detect out-of-domain inputs (AUC = 0.924). Second, we introduce an AST-aware, semantics-preserving program transformation framework, integrating genetic algorithm optimization with execution-feedback-driven adaptation. This work is the first to jointly incorporate input validation and adaptive transformation for discrete code data. Evaluated on three state-of-the-art code models across two software engineering tasks—code summarization and bug detection—our approach achieves an average accuracy improvement of 8.78% (51.28% relative gain), substantially enhancing model robustness and practical utility on complex, heterogeneous code inputs.

📝 Abstract
Leveraging deep learning (DL)-based code analysis tools to solve software engineering tasks is becoming increasingly popular. Code models often suffer performance degradation for various reasons (e.g., code data shifts). Retraining is often required to address these issues, but frequent model updates are costly in labeling and deployment. In this paper, we explore an alternative solution: adapting the program inputs to the code models. This is achieved in two steps: 1) input validation, which identifies whether an input is an out-of-scope program beyond a model's handling capability, and 2) input adaptation, which adapts out-of-scope inputs to become in-scope inputs. Validating program inputs is challenging, as current techniques focus on continuous inputs such as image data and fail on discrete inputs like code data, which have unique characteristics and are processed differently by deep learning models. Adapting out-of-scope programs is also challenging due to their vast search spaces. Therefore, in this paper, we propose CodeImprove, which distinguishes out-of-scope inputs from normal inputs and converts such out-of-scope inputs back to in-scope inputs through program transformation. In particular, we propose a validity score metric to identify out-of-scope inputs and leverage genetic algorithms to apply semantics-preserving program transformations that convert out-of-scope inputs to in-scope inputs. Our experimental results show CodeImprove improves accuracy by up to 8.78% (a relative improvement of 51.28%) across three code models on two SE tasks. Additionally, our input validation is promising in detecting out-of-scope inputs (AUC score of 0.924).
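The two-step pipeline the abstract describes (score the validity of a program, then search for a semantics-preserving variant with a higher score via a genetic algorithm) can be sketched as follows. The variable-renaming operator and the toy length-based score below are illustrative assumptions, not the paper's actual transformation operators or validity metric:

```python
import ast
import random

def rename_locals(source: str, seed: int) -> str:
    """One semantics-preserving transformation: consistently rename
    variables that are assigned within the program."""
    tree = ast.parse(source)
    rng = random.Random(seed)
    # Only rename names that appear in a Store context, so builtins and
    # function parameters are left untouched.
    stores = sorted({n.id for n in ast.walk(tree)
                     if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)})
    mapping = {name: f"v{i}_{rng.randrange(1000)}" for i, name in enumerate(stores)}

    class Renamer(ast.NodeTransformer):
        def visit_Name(self, node):
            node.id = mapping.get(node.id, node.id)
            return node

    return ast.unparse(Renamer().visit(tree))

def validity_score(source: str) -> float:
    """Placeholder for the model-derived validity score; a toy heuristic
    here so the search loop is runnable."""
    return -abs(len(source) - 80)

def adapt(source: str, generations: int = 10, pop_size: int = 8) -> str:
    """GA-style search: keep the highest-scoring variants each generation
    and mutate the survivors with further transformations."""
    population = [rename_locals(source, seed) for seed in range(pop_size)]
    for _ in range(generations):
        population.sort(key=validity_score, reverse=True)
        survivors = population[: pop_size // 2]
        population = survivors + [
            rename_locals(s, random.randrange(10**6)) for s in survivors
        ]
    return max(population, key=validity_score)
```

Because every candidate is produced only by semantics-preserving rewrites, the adapted program behaves identically to the original while (under this toy score) ranking as more "in-scope" for the model.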
Problem

Research questions and friction points this paper is trying to address.

Complex Code Handling
Deep Learning Tools
Software Development Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

CodeImprove
Deep Learning Adaptation
Code Analysis Enhancement
Ravishka Rathnasuriya
The University of Texas at Dallas, USA
Software Engineering · AI4SE · SE4AI · Program Analysis · Adversarial Machine Learning
Zijie Zhao
University of Pennsylvania, USA
Wei Yang
University of Texas at Dallas, USA