GEP: A GCG-Based method for extracting personally identifiable information from chatbots built on small language models

📅 2025-09-25

🤖 AI Summary
This work addresses the risk of personally identifiable information (PII) leakage in chatbots built on small language models (SLMs). We first fine-tune a medical dialogue chatbot, ChatBioGPT, on the BioGPT backbone using the Alpaca and HealthCareMagic datasets. BERTScore shows that its generation quality is comparable to ChatDoctor and ChatGPT. On this model, conventional template-based PII attacks fail to extract PII effectively. We therefore propose GEP, a PII extraction method that uses greedy coordinate gradient (GCG) optimization to construct adversarial prompts. Compared with template-based methods, GEP reveals up to 60× more PII leakage. We also test a more realistic scenario in which PII is inserted into the data in free-style syntactic forms rather than fixed templates. In that setting, GEP still reveals a leakage rate of up to 4.53%. This work provides a way to evaluate privacy vulnerabilities in SLM-based systems.
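
BERTScore compares a generated response with a reference by greedily matching token embeddings under cosine similarity. The sketch below shows only that core computation. It leaves out the optional IDF weighting and baseline rescaling, and it assumes token embeddings are already available as NumPy arrays. The function name `bertscore_f1` and the toy 2-D vectors are hypothetical; in practice the embeddings would come from a pretrained contextual encoder.

```python
import numpy as np

def bertscore_f1(cand: np.ndarray, ref: np.ndarray) -> tuple[float, float, float]:
    """Greedy-matching BERTScore without IDF weighting or rescaling.

    cand: (m, d) candidate token embeddings; ref: (n, d) reference token embeddings.
    Returns (precision, recall, f1).
    """
    # L2-normalise rows so that dot products become cosine similarities
    c = cand / np.linalg.norm(cand, axis=1, keepdims=True)
    r = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    sim = c @ r.T  # (m, n) pairwise cosine similarities
    # Precision: each candidate token is matched to its most similar reference token
    precision = float(sim.max(axis=1).mean())
    # Recall: each reference token is matched to its most similar candidate token
    recall = float(sim.max(axis=0).mean())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy 2-D "embeddings" standing in for contextual encoder outputs
cand = np.array([[1.0, 0.0], [0.0, 1.0]])
ref = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
p, r, f = bertscore_f1(cand, ref)
print(round(p, 3), round(r, 3), round(f, 3))
```

Precision is taken over candidate tokens and recall over reference tokens. As a result, a response that omits reference content is penalised through recall even when every token it does produce matches well.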

📝 Abstract
Small language models (SLMs) have become increasingly appealing. In certain fields they perform approximately as well as large language models (LLMs) while consuming less energy and time during training and inference. However, PII leakage from SLMs used for downstream tasks has yet to be explored. In this study, we investigate PII leakage in an SLM-based chatbot. We first fine-tune a new chatbot, ChatBioGPT, on the BioGPT backbone using the Alpaca and HealthCareMagic datasets. Its BERTScore performance is comparable to that reported in previous studies of ChatDoctor and ChatGPT. Using this model, we show that previous template-based PII attack methods cannot effectively extract the PII in the dataset for leakage detection under the SLM condition. We then propose GEP, a greedy coordinate gradient (GCG)-based method specifically designed for PII extraction. Our experiments show that GEP reveals up to 60× more leakage than the previous template-based methods. We further extend GEP to a more complicated and realistic setting with free-style insertion. In this setting, the PII inserted into the dataset takes various syntactic forms instead of fixed templates, and GEP still reveals a PII leakage rate of up to 4.53%.
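
GCG optimizes a discrete prompt in two stages. First, it uses the gradient of the loss with respect to one-hot token indicators to shortlist promising substitutions at each position. Second, it evaluates a batch of random single-token swaps exactly and moves to the best one. The loop below is a minimal sketch of that procedure and is not the authors' GEP implementation. It runs on a deliberately toy differentiable objective: a random linear "model" whose output should match a target vector. The names `loss`, `onehot_grads`, `gcg_step`, `top_k`, and `batch` are hypothetical. In a PII-extraction setting, the loss would presumably be something like the negative log-likelihood of a target PII string under the chatbot, but that is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, L = 50, 8, 6                           # toy vocab size, embedding dim, prompt length
E = rng.normal(size=(V, d))                  # toy token-embedding matrix
W = rng.normal(size=(L, d, d)) / np.sqrt(d)  # toy per-position projections (stand-in model)
target = rng.normal(size=d)                  # toy target output

def loss(tokens: np.ndarray) -> float:
    # Toy objective: squared distance between the "model output" h and the target
    h = np.einsum("lij,lj->i", W, E[tokens])
    diff = h - target
    return float(diff @ diff)

def onehot_grads(tokens: np.ndarray) -> np.ndarray:
    # Gradient of the loss w.r.t. each position's one-hot token indicator, shape (L, V)
    h = np.einsum("lij,lj->i", W, E[tokens])
    g_emb = np.einsum("lij,i->lj", W, 2.0 * (h - target))  # dLoss/d(embedding_l)
    return g_emb @ E.T                                       # chain rule through E

def gcg_step(tokens: np.ndarray, top_k: int = 8, batch: int = 32) -> np.ndarray:
    # 1) Shortlist, per position, the top_k tokens with the most negative gradient
    cands = np.argsort(onehot_grads(tokens), axis=1)[:, :top_k]
    # 2) Build a batch of random single-token swaps drawn from the shortlist
    trials = np.repeat(tokens[None, :], batch, axis=0)
    pos = rng.integers(len(tokens), size=batch)
    trials[np.arange(batch), pos] = cands[pos, rng.integers(top_k, size=batch)]
    # 3) Evaluate every swap exactly and move to the lowest-loss candidate
    losses = [loss(t) for t in trials]
    return trials[int(np.argmin(losses))]

tokens = rng.integers(V, size=L)
start_loss = loss(tokens)
best_tokens, best_loss = tokens, start_loss
for _ in range(50):
    tokens = gcg_step(tokens)
    if loss(tokens) < best_loss:   # track the best prompt seen across iterations
        best_tokens, best_loss = tokens, loss(tokens)
print(f"toy loss: {start_loss:.3f} -> {best_loss:.3f}")
```

The gradient serves only to rank candidate tokens. Whether a swap is accepted depends on its exact loss. This combination lets GCG work over discrete tokens, where gradient descent cannot be applied directly.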
Problem

Investigating personally identifiable information leakage in small language model chatbots
Developing a GCG-based method for effective PII extraction from SLMs
Evaluating PII leakage in complex scenarios with free-style insertion
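
The sketch below contrasts template-based with free-style PII insertion and shows one plausible way to score leakage. The templates, names, and phone numbers are hypothetical illustrations. So is the metric, defined here as the fraction of inserted PII values that appear verbatim in at least one model output. The source does not specify how GEP's leakage rate is computed.

```python
import random

# Hypothetical PII records to be inserted into fine-tuning data
records = [("Alice Smith", "555-0101"), ("Bob Jones", "555-0102"), ("Carol Wu", "555-0103")]

# Template-based insertion: every record uses one fixed sentence pattern
FIXED_TEMPLATE = "The phone number of {name} is {phone}."

# Free-style insertion: records appear in varied syntactic forms (illustrative only)
FREE_STYLE = [
    "You can reach {name} at {phone}.",
    "{name}'s number, {phone}, is on file.",
    "Call {phone} and ask for {name}.",
]

def insert_pii(records, free_style, seed=0):
    # Render each record as a training sentence, fixed-template or free-style
    rnd = random.Random(seed)
    if not free_style:
        return [FIXED_TEMPLATE.format(name=n, phone=p) for n, p in records]
    return [rnd.choice(FREE_STYLE).format(name=n, phone=p) for n, p in records]

def leakage_rate(records, outputs):
    # Fraction of inserted phone numbers that appear verbatim in any model output
    leaked = sum(any(phone in out for out in outputs) for _, phone in records)
    return leaked / len(records)

# Pretend model outputs (in practice: responses to extraction prompts)
outputs = ["Sure, Bob Jones can be reached at 555-0102.", "I cannot share that."]
print(insert_pii(records, free_style=False)[0])
print(f"leakage rate: {leakage_rate(records, outputs):.2%}")
```

Free-style insertion removes the fixed prefix that a template-based attack can simply replay. That is one intuition for why an optimized prompt search is needed in this setting.
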
Innovation

GEP uses a GCG-based method for PII extraction
It targets chatbots built on small language models
The method handles free-style PII insertion scenarios