🤖 AI Summary
This study addresses a practical gap: people lack ways to assess what personal information large language models (LLMs) associate with their names, which poses privacy risks. Through a user-centred privacy audit involving 458 participants and a new browser-based self-audit tool, LMP2, the work identifies and categorises nine practical frictions in LLM privacy auditing, proposes a user-oriented audit framework, and highlights a fundamental ambiguity in defining “associated content” for generative AI. Employing black-box probing, user-driven interaction evaluations, multi-model comparisons, and mixed-methods analysis, the research finds that GPT-4o predicts 11 of 50 personal attributes for everyday individuals with at least 60% accuracy, and that real names elicit stable name-conditioned associations distinct from model defaults, confirming the risk of name-conditioned privacy leakage.
📝 Abstract
Large language models (LLMs) learn statistical associations from massive training corpora and user interactions, and deployed systems can surface or infer information about individuals. Yet people lack practical ways to inspect what a model associates with their name. We report interim findings from an ongoing study and introduce LMP2, a browser-based self-audit tool. In two user studies ($N_{\text{total}}{=}458$), GPT-4o predicts 11 of 50 features for everyday people with $\ge$60\% accuracy, and participants report wanting control over LLM-generated associations despite not considering all outputs privacy violations. To validate our probing method, we evaluate eight LLMs on public figures and non-existent names, observing clear separation between stable name-conditioned associations and model defaults. Our findings also expose a broader generative AI evaluation crisis: when outputs are probabilistic, context-dependent, and user-mediated through elicitation, what model--individual associations even include is under-specified, and operationalisation relies on crafting probes and metrics that are hard to validate or compare. To move towards reliable, actionable human-centred LLM privacy audits, we identify nine frictions that emerged in our study and offer recommendations for future work and audit design.