🤖 AI Summary
This study addresses the privacy risks inherent in anonymized conversational logs, demonstrating that implicit content can still leak sensitive user attributes. Analyzing complete ChatGPT histories from over 1,000 users in four Global South countries (Brazil, India, Nigeria, Pakistan), the work evaluates how well large language models (LLMs) can infer users' age, gender, and country even when the conversations contain no explicit demographic self-identification. Combining an LLM-based filter, named entity recognition, and inference attacks, the research shows that these demographic traits can be inferred with high accuracy (weighted F1 of 0.84, 0.90, and 0.88 for age, gender, and country, respectively) without any explicit identifiers, with the median user identified from the first 5% of their conversation history. Further analysis uncovers four recurring stereotype patterns that drive both successful inferences and a skewed error distribution, underscoring the limitations of message-level PII removal as an anonymization practice.
📝 Abstract
Hundreds of millions of users now hold detailed, multi-turn conversations with ChatGPT and similar LLM assistants. We measure two privacy-relevant features of these conversations on a corpus of complete ChatGPT histories donated by over 1,000 users in four Global South countries (Brazil, India, Nigeria, Pakistan). First, on explicit disclosure: 34.5% of user messages contain personal information across a twenty-category taxonomy, with the median user first revealing identifying content within the first 14% of their conversation history. Second, on inference beyond explicit disclosure: we restrict to a cohort whose conversations contain no messages flagged by an LLM-based filter for explicit demographic self-identification (a separate NER pass marks PII for the disclosure audit but does not drive cohort exclusion). On this filtered cohort, an off-the-shelf large language model still recovers each user's age, gender, and country at weighted F1 of 0.84, 0.90, and 0.88, respectively, with the median user identified from the first 5% of their conversation history. Reading the model's natural-language reasoning traces, we identify four recurring stereotype patterns that drive both successful inference and an asymmetric error distribution concentrating on women in technical fields, older users with contemporary skills, and Global South tech professionals. We also compare ChatGPT against the same users' Google Search and YouTube histories as inference surfaces, and find it competitive with these older substrates that have driven behavioral advertising for two decades. Message-level PII removal is insufficient on its own as a privacy intervention for conversational AI data.
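For readers unfamiliar with the headline metric, the sketch below shows how a weighted F1 score is conventionally computed: the F1 of each class is averaged with weights proportional to that class's share of the true labels. This follows the standard definition and is not taken from the paper's code; the function name `weighted_f1` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (standard definition).

    Illustrative only: the paper does not publish its evaluation code here,
    so the name and interface of this helper are assumptions.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Each class contributes in proportion to its share of true labels.
        score += (n / total) * f1
    return score


# Hypothetical toy labels, not data from the study.
true_labels = ["f", "f", "m", "m"]
predicted = ["f", "m", "m", "m"]
print(round(weighted_f1(true_labels, predicted), 4))
```

Because the weights follow class frequency, a high weighted F1 can coexist with poor performance on small subgroups, which is why the asymmetric error distribution described above is reported separately.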