🤖 AI Summary
This study systematically evaluates large language models’ (LLMs) capability to extract eight categories of sensitive personal information (e.g., names, phone numbers, email addresses) from public text, revealing significantly higher accuracy and generalization than traditional methods (e.g., regular expressions, named entity recognition). To support this evaluation, we construct the first multi-source annotated dataset integrating synthetically generated and real-world data. We further propose a novel defense mechanism based on prompt injection: lightweight semantic干扰 instructions are embedded into inputs to degrade LLMs’ sensitivity to personally identifiable information. Experiments span 10 mainstream LLMs and 5 diverse datasets. Results demonstrate that our defense substantially reduces attack success rates of strong LLMs—degrading their performance to levels comparable with conventional methods. This work provides the first empirical evidence that prompt injection serves as a practical, low-overhead privacy-preserving paradigm for mitigating LLM-based PII extraction risks.
📝 Abstract
Automatically extracting personal information--such as name, phone number, and email address--from publicly available profiles at a large scale is a stepstone to many other security attacks including spear phishing. Traditional methods--such as regular expression, keyword search, and entity detection--achieve limited success at such personal information extraction. In this work, we perform a systematic measurement study to benchmark large language model (LLM) based personal information extraction and countermeasures. Towards this goal, we present a framework for LLM-based extraction attacks; collect four datasets including a synthetic dataset generated by GPT-4 and three real-world datasets with manually labeled eight categories of personal information; introduce a novel mitigation strategy based on prompt injection; and systematically benchmark LLM-based attacks and countermeasures using ten LLMs and five datasets. Our key findings include: LLM can be misused by attackers to accurately extract various personal information from personal profiles; LLM outperforms traditional methods; and prompt injection can defend against strong LLM-based attacks, reducing the attack to less effective traditional ones.