🤖 AI Summary
This study systematically investigates memorization and data leakage risks in code large language models (LLMs) during pretraining and fine-tuning, focusing on how sensitive information, including API keys, software licenses, and data provenance, is retained or forgotten. We introduce a customized data extraction benchmark integrating diverse extraction attacks, rigorous human validation, and cross-scale model comparisons. Our key findings are: (1) fine-tuning substantially reduces the extractability of pretraining data (e.g., StarCoder2-15B drops from 54.9% to 23.5%), yet leaves smaller models more vulnerable to extraction of their fine-tuning data; (2) licenses and data-source identifiers are both the categories most readily memorized initially and the most susceptible to being overwritten during fine-tuning. These results establish the first quantifiable evaluation framework for assessing data security in code LLMs, providing empirically grounded insights to inform responsible data governance, model development, and regulatory compliance.
📝 Abstract
Code language models, while widely popular, are often trained on unsanitized source code gathered from across the Internet. Previous work revealed that pre-trained models can memorize the content of their training data and regurgitate it through data extraction attacks. Due to the large size of current models, only a few entities have the resources to pre-train them. However, fine-tuning requires fewer resources and is increasingly used by both small and large entities for its effectiveness on specialized data. Such small, curated fine-tuning datasets might contain sensitive information or proprietary assets. In this study, we attack both pre-trained and fine-tuned code language models to investigate the extent of data extractability. We first develop a custom benchmark to assess the vulnerability of both pre-training and fine-tuning samples to extraction attacks. Our findings reveal that 54.9% of extractable pre-training data could be retrieved from StarCoder2-15B, whereas this number decreased to 23.5% after fine-tuning, indicating that fine-tuning reduces the extractability of pre-training data. However, compared to larger models, fine-tuning smaller models increases their vulnerability to data extraction attacks on fine-tuning data; given the potential sensitivity of fine-tuning data, this can lead to more severe consequences. Lastly, we manually analyzed 2,000 extractable samples before and after fine-tuning. We found that data carriers and licensing information are the data categories most likely to be memorized by pre-trained and fine-tuned models, while the latter is the most likely to be forgotten after fine-tuning.
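The prefix-continuation extraction attack described in the abstract can be sketched in a few lines: prompt the model with the first part of a training sample and count the sample as extractable if the model's continuation reproduces the true suffix verbatim. The sketch below is a minimal illustration of that idea, not the paper's actual harness; the function names, the token-level prefix split, and the toy stand-in "model" are all assumptions made for the example.

```python
def split_sample(tokens, prefix_len):
    """Split a training sample into a prompt prefix and the true suffix."""
    return tokens[:prefix_len], tokens[prefix_len:]

def is_extractable(generate, tokens, prefix_len=50):
    """A sample counts as extractable if, prompted with its prefix,
    the model's continuation reproduces the true suffix verbatim."""
    prefix, suffix = split_sample(tokens, prefix_len)
    continuation = generate(prefix, max_new_tokens=len(suffix))
    return continuation[:len(suffix)] == suffix

def extraction_rate(generate, samples, prefix_len=50):
    """Fraction of samples extractable from the model."""
    hits = sum(is_extractable(generate, s, prefix_len) for s in samples)
    return hits / len(samples)

# Toy stand-in for a model that has memorized exactly one sample.
# A real evaluation would call a code LLM's greedy decoding here.
MEMORIZED = list(range(100))

def toy_generate(prefix, max_new_tokens):
    if prefix == MEMORIZED[:len(prefix)]:
        start = len(prefix)
        return MEMORIZED[start:start + max_new_tokens]
    return [0] * max_new_tokens  # unmemorized prompts yield generic output

print(extraction_rate(toy_generate, [MEMORIZED, list(range(1, 101))]))  # 0.5
```

With a real model, `generate` would wrap greedy decoding over token IDs, and the extraction rate over a held-out slice of pre-training or fine-tuning data yields numbers comparable to the 54.9% and 23.5% figures reported above.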