🤖 AI Summary
This study systematically investigates memorization and data leakage risks in code large language models (LLMs) during pretraining and fine-tuning, focusing on how sensitive information, including API keys, software licenses, and data provenance, is retained or forgotten. We introduce a customized data extraction benchmark integrating diverse extraction attacks, rigorous human validation, and cross-scale model comparisons. Our key findings are: (1) fine-tuning substantially reduces the extractability of pretraining data (e.g., StarCoder2-15B drops from 54.9% to 23.5%), yet leaves smaller models more vulnerable to extraction of their fine-tuning data; (2) licenses and data-source identifiers are both the categories most readily memorized initially and the most susceptible to being overwritten during fine-tuning. These results establish the first quantifiable evaluation framework for assessing data security in code LLMs, providing empirically grounded insights to inform responsible data governance, model development, and regulatory compliance.
📝 Abstract
Code language models, while widely popular, are often trained on unsanitized source code gathered from across the Internet. Previous work revealed that pre-trained models can memorize the content of their training data and regurgitate it through data extraction attacks. Due to the large size of current models, only a few entities have the resources to pre-train them. However, fine-tuning requires fewer resources and is increasingly used by both small and large entities for its effectiveness on specialized data. Such small, curated fine-tuning datasets might contain sensitive information or proprietary assets. In this study, we attack both pre-trained and fine-tuned code language models to investigate the extent of data extractability. We first develop a custom benchmark to assess the vulnerability of both pre-training and fine-tuning samples to extraction attacks. Our findings reveal that 54.9% of extractable pre-training data could be retrieved from StarCoder2-15B, whereas this number decreased to 23.5% after fine-tuning, indicating that fine-tuning reduces the extractability of pre-training data. However, compared to larger models, fine-tuning smaller models increases their vulnerability to data extraction attacks on fine-tuning data; given the potential sensitivity of fine-tuning data, this can lead to more severe consequences. Lastly, we manually analyzed 2,000 extractable samples before and after fine-tuning. We found that data carriers and licensing information are the data categories most likely to be memorized by pre-trained and fine-tuned models, while the latter is the most likely to be forgotten after fine-tuning.
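The prefix-continuation extraction attack described in the abstract can be sketched in a few lines: prompt the model with the first part of a training sample and count the sample as extractable if the model's continuation reproduces the true suffix verbatim. The sketch below is a minimal illustration of that idea, not the paper's actual harness; the function names, the token-level prefix split, and the toy stand-in "model" are all assumptions made for the example.

```python
def split_sample(tokens, prefix_len):
    """Split a training sample into a prompt prefix and the true suffix."""
    return tokens[:prefix_len], tokens[prefix_len:]

def is_extractable(generate, tokens, prefix_len=50):
    """A sample counts as extractable if, prompted with its prefix,
    the model's continuation reproduces the true suffix verbatim."""
    prefix, suffix = split_sample(tokens, prefix_len)
    continuation = generate(prefix, max_new_tokens=len(suffix))
    return continuation[:len(suffix)] == suffix

def extraction_rate(generate, samples, prefix_len=50):
    """Fraction of samples extractable from the model."""
    hits = sum(is_extractable(generate, s, prefix_len) for s in samples)
    return hits / len(samples)

# Toy stand-in for a model that has memorized exactly one sample.
# A real evaluation would call a code LLM's greedy decoding here.
MEMORIZED = list(range(100))

def toy_generate(prefix, max_new_tokens):
    if prefix == MEMORIZED[:len(prefix)]:
        start = len(prefix)
        return MEMORIZED[start:start + max_new_tokens]
    return [0] * max_new_tokens  # unmemorized prompts yield generic output

print(extraction_rate(toy_generate, [MEMORIZED, list(range(1, 101))]))  # 0.5
```

With a real model, `generate` would wrap greedy decoding over token IDs, and the extraction rate over a held-out slice of pre-training or fine-tuning data yields numbers comparable to the 54.9% and 23.5% figures reported above.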