🤖 AI Summary
Code large language models (LLMs) may inadvertently memorize sensitive information during training, posing privacy leakage risks. Existing memorization-detection methods based on prompt engineering suffer from hallucination and extract few actual secrets. This paper proposes DESEC, the first framework to characterize real secrets at the token level using a novel four-dimensional feature representation. It introduces a two-stage mechanism (offline learning of a token-scoring model, followed by online recalibration of decoding probabilities) that eliminates reliance on prompt engineering. Leveraging token-level probability analysis, a scoring model built with a proxy Code LLM, and likelihood reweighting during decoding, DESEC achieves a 37.2% average improvement in both extraction rate and plausibility of real secrets across four state-of-the-art code LLMs, validated on a newly constructed secret-extraction evaluation benchmark. The framework enables scalable, fine-grained, and quantitative assessment of privacy leakage risk.
📝 Abstract
Code Large Language Models (LLMs) have demonstrated remarkable capabilities in generating, understanding, and manipulating programming code. However, their training process inadvertently leads to the memorization of sensitive information, posing severe privacy risks. Existing studies on memorization in LLMs primarily rely on prompt engineering techniques, which suffer from limitations such as widespread hallucination and inefficient extraction of the target sensitive information. In this paper, we present a novel approach to characterize real and fake secrets generated by Code LLMs based on token probabilities. We identify four key characteristics that differentiate genuine secrets from hallucinated ones, providing insights into distinguishing real and fake secrets. To overcome the limitations of existing works, we propose DESEC, a two-stage method that leverages token-level features derived from the identified characteristics to guide the token decoding process. DESEC consists of constructing an offline token scoring model using a proxy Code LLM and employing the scoring model to guide the decoding process by reassigning token likelihoods. Through extensive experiments on four state-of-the-art Code LLMs using a diverse dataset, we demonstrate the superior performance of DESEC in achieving a higher plausible rate and extracting more real secrets compared to existing baselines. Our findings highlight the effectiveness of our token-level approach in enabling an extensive assessment of the privacy leakage risks associated with Code LLMs.
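The second stage, guiding decoding by reassigning token likelihoods, can be illustrated with a minimal sketch. The abstract does not state how DESEC combines the scoring model's output with the model's own probabilities, so the multiplicative rule below is an assumption, and the names `reweight_token_probs`, `scores`, and `alpha` are hypothetical.

```python
def reweight_token_probs(probs, scores, alpha=1.0):
    """Rescale next-token probabilities by per-token scores, then renormalize.

    probs  : dict mapping candidate token -> probability from the Code LLM
    scores : dict mapping candidate token -> score in [0, 1] from a
             (hypothetical) offline token scoring model; higher means the
             token looks more like part of a real secret
    alpha  : assumed knob controlling how strongly the scores are applied
    """
    # Tokens without a score keep their original weight (score of 1.0).
    weighted = {
        tok: p * (scores.get(tok, 1.0) ** alpha) for tok, p in probs.items()
    }
    total = sum(weighted.values())
    if total == 0:
        # Every candidate was scored to zero: fall back to the original distribution.
        return dict(probs)
    return {tok: w / total for tok, w in weighted.items()}


# Two equally likely candidates; the scorer favors "a".
example = reweight_token_probs({"a": 0.5, "b": 0.5}, {"a": 0.9, "b": 0.1})
```

In this sketch the reweighted distribution would replace the original one at each decoding step, so a token the scorer favors becomes more likely to be selected even when the LLM itself rated the candidates equally.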