🤖 AI Summary
This study addresses the opacity of memory mechanisms in conversational AI systems, which poses significant risks to user privacy, agency, and profile accuracy. Drawing on a novel dataset of 2,050 memory entries from 80 real ChatGPT users, the research employs content analysis, GDPR-based personal data classification, psychological inference detection, and query rewriting techniques to empirically demonstrate that 96% of these memories are unilaterally generated by the system. Among them, 28% contain personal data as defined under the GDPR, and 52% encode psychological insights. To mitigate these concerns, the work proposes Attribution Shield—a user empowerment framework that proactively alerts users to sensitive inferences and recommends query rewrites—thereby significantly enhancing user control over AI-generated memories while preserving interactive utility.
📝 Abstract
To enable personalized and context-aware interactions, conversational AI systems have introduced a new mechanism: Memory. Memory creates what we refer to as the Algorithmic Self-portrait: a new form of personalization derived from users' self-disclosed information divulged within private conversations. While memory enables more coherent exchanges, the underlying processes of memory creation remain opaque, raising critical questions about data sensitivity, user agency, and the fidelity of the resulting portrait. To bridge this research gap, we analyze 2,050 memory entries from 80 real-world ChatGPT users. Our analyses reveal three key findings: (1) A striking 96% of memories in our dataset are created unilaterally by the conversational system, potentially shifting agency away from the user; (2) Memories in our dataset contain a rich mix of GDPR-defined personal data (in 28% of memories) along with psychological insights about participants (in 52% of memories); and (3) A significant majority of the memories (84%) are directly grounded in user context, indicating faithful representation of the conversations. Finally, we introduce a framework, Attribution Shield, that anticipates these inferences, alerts users to potentially sensitive memory inferences, and suggests query reformulations to protect personal information without sacrificing utility.