🤖 AI Summary
To address the challenge of efficiently selecting pre-trained code models (PCMs) for downstream tasks such as code generation, code summarization, and vulnerability detection, this paper proposes fine-tuning-free model selection methods that leave model parameters untouched. The core idea is to train lightweight proxy models that gauge a PCM's performance, and to measure the deviation between a model's latent feature distribution and the task's label distribution (e.g., via KL divergence or MMD), using their closeness as an indicator of transferability. Evaluated on 100 open-source PCMs spanning 42.5M to 3B parameters, the learning-based methods select a suitable model in about 100 seconds, roughly 97,200× faster than brute-force fine-tuning (2,700 hours), with less than a 6% drop in downstream performance. This substantially improves the efficiency of reusing code intelligence models in practice.
📝 Abstract
Pre-training a language model and then fine-tuning it has been shown to be an efficient and effective technique for a wide range of code intelligence tasks, such as code generation, code summarization, and vulnerability detection. However, pre-training language models on a large-scale code corpus is computationally expensive. Fortunately, many off-the-shelf Pre-trained Code Models (PCMs), such as CodeBERT, CodeT5, CodeGen, and Code Llama, have been released publicly. These models acquire general code understanding and generation capability during pre-training, which enhances their performance on downstream code intelligence tasks. With an increasing number of these public pre-trained models, selecting the most suitable one to reuse for a specific task is essential. In this paper, we systematically investigate the reusability of PCMs. We first explore three intuitive model selection methods that select models by size, by training data, or by brute-force fine-tuning. Experimental results show that these straightforward techniques either perform poorly or incur high costs. Motivated by these findings, we explore learning-based model selection strategies that utilize pre-trained models without altering their parameters. Specifically, we train proxy models to gauge the performance of pre-trained models, and measure the distribution deviation between a model's latent features and the task's labels, using their closeness as an indicator of model transferability. We conduct experiments on 100 widely used open-source PCMs for code intelligence tasks, with sizes ranging from 42.5 million to 3 billion parameters. The results demonstrate that learning-based selection methods reduce selection time to 100 seconds, compared to 2,700 hours with brute-force fine-tuning, with less than 6% performance degradation across related tasks.
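To make the distribution-deviation idea concrete, the sketch below scores a frozen model's transferability for a classification task (e.g., vulnerability detection) by computing the Maximum Mean Discrepancy (MMD) between class-conditional latent feature distributions: if the frozen encoder already separates the task's classes in latent space, the model is a promising candidate without any fine-tuning. This is a minimal illustration, not the paper's exact estimator; the function names, the RBF kernel choice, and the use of pairwise class MMD as the score are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel matrix between rows of X and rows of Y.
    sq_dists = ((X ** 2).sum(1)[:, None]
                + (Y ** 2).sum(1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq_dists)

def mmd2(X, Y, gamma=1.0):
    # Biased estimate of squared MMD between two samples of features.
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

def transferability_score(features, labels, gamma=1.0):
    # Average pairwise MMD between class-conditional feature
    # distributions. Higher means the frozen model's latent space
    # already distinguishes the task's labels, suggesting better
    # transferability. `features` is an (n_samples, dim) array of
    # latent representations extracted from a frozen PCM (hypothetical
    # upstream step, not shown); `labels` holds the task's labels.
    classes = np.unique(labels)
    scores = [mmd2(features[labels == a], features[labels == b], gamma)
              for i, a in enumerate(classes) for b in classes[i + 1:]]
    return float(np.mean(scores))
```

In use, one would extract features for a small labeled sample with each candidate PCM, compute this score per model, and rank the candidates, replacing thousands of fine-tuning hours with a few forward passes.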