Why Personalizing Deep Learning-Based Code Completion Tools Matters

📅 2025-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
General-purpose deep learning code completion models adapt poorly to specific domains, limiting their effectiveness in real-world development environments. Method: The authors systematically investigate parameter-efficient fine-tuning (PEFT) of T5 and Code Llama models (60M to 7B parameters) at both the organizational and the developer level, using code datasets drawn from developers across two organizations, and compare organization-level, developer-level, and hybrid fine-tuning strategies. Contribution/Results: The empirical study is the first to demonstrate that organization-level fine-tuning yields substantial gains, improving completion accuracy by +12.3% on average and matching the performance of general-purpose models roughly ten times larger in parameter count. The approach generalizes well, reduces GPU resource consumption by roughly 40%, and retains high practical utility. The work establishes a reproducible methodology and empirical benchmark for lightweight, production-ready customization of code completion tools.

📝 Abstract
Deep learning (DL)-based code completion tools have transformed software development by enabling advanced code generation. These tools leverage models trained on vast amounts of code from numerous repositories, capturing general coding patterns. However, the impact of fine-tuning these models for specific organizations or developers to boost their performance on such subjects remains unexplored. In this work, we fill this gap by presenting solid empirical evidence answering this question. More specifically, we consider 136 developers from two organizations (Apache and Spring), two model architectures (T5 and Code Llama), and three model sizes (60M, 750M, and 7B trainable parameters). T5 models (60M, 750M) were pre-trained and fine-tuned on over 2,000 open-source projects, excluding the subject organizations' data, and compared against versions fine-tuned on organization- and developer-specific datasets. For the Code Llama model (7B), we compared the performance of the already pre-trained model publicly available online with the same model fine-tuned via parameter-efficient fine-tuning on organization- and developer-specific datasets. Our results show that there is a boost in prediction capabilities provided by both an organization-specific and a developer-specific additional fine-tuning, with the former being particularly performant. Such a finding generalizes across (i) the two subject organizations (i.e., Apache and Spring) and (ii) models of completely different magnitude (from 60M to 7B trainable parameters). Finally, we show that DL models fine-tuned on an organization-specific dataset achieve the same completion performance as pre-trained code models that are used out of the box and are ~10x larger, with consequent savings in terms of deployment and inference cost (e.g., smaller GPUs needed).
Problem

Research questions and friction points this paper is trying to address.

Explores impact of fine-tuning DL-based code completion tools for specific organizations.
Compares performance of models fine-tuned on organization- and developer-specific datasets.
Demonstrates cost savings with smaller models achieving similar performance to larger ones.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning DL models for specific organizations boosts performance.
Parameter-efficient fine-tuning enhances Code Llama model capabilities.
Organization-specific models match larger pre-trained models' performance.
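The parameter-efficient fine-tuning the paper applies to Code Llama can be pictured with a LoRA-style low-rank update, one common PEFT technique: the frozen pre-trained weight W is augmented with a trainable product B @ A of rank r, so only a small fraction of parameters is updated during organization-specific fine-tuning. The sketch below is illustrative only (plain numpy, hypothetical dimensions), not the paper's actual configuration or library code.

```python
import numpy as np

# Illustrative LoRA-style low-rank adapter (assumed setup, not the paper's):
# the frozen base weight W gets a trainable update B @ A of small rank r,
# so fine-tuning touches only r * (d_in + d_out) parameters instead of d_in * d_out.

rng = np.random.default_rng(0)

d_in, d_out, rank = 512, 512, 4                 # rank << d_in keeps the update cheap
W = rng.standard_normal((d_out, d_in))          # frozen pre-trained weight
A = rng.standard_normal((rank, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, rank))                     # trainable up-projection, zero-initialized

def lora_forward(x, scale=1.0):
    """y = (W + scale * B @ A) @ x: base path plus low-rank adapter."""
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted model starts out identical to the base model.
assert np.allclose(lora_forward(x), W @ x)

base_params = W.size                 # 262,144 frozen parameters
adapter_params = A.size + B.size     # 4,096 trainable parameters (~1.6%)
print(f"trainable fraction: {adapter_params / base_params:.1%}")
```

Only A and B would be updated on the organization- or developer-specific dataset, which is why PEFT makes customizing a 7B model tractable on modest GPUs.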