🤖 AI Summary
This work exposes a critical privacy risk: generative AI browser assistants silently collect sensitive user data and construct cross-site behavioral profiles without explicit user interaction. To systematically audit this behavior, we analyze the top ten mainstream assistants using a novel joint auditing framework that combines network traffic analysis with explainable prompt injection. Our findings reveal that all ten assistants rely on remote APIs; eight transmit full HTML documents and form contents; six share persistent user identifiers with third parties; and seven reliably infer demographic attributes—including age, gender, and income—for response personalization, with none providing effective user-controllable privacy mechanisms. Beyond uncovering pervasive implicit data collection and cross-context profiling in GenAI assistants, this study establishes the first privacy auditing paradigm specifically designed for AI-native applications, advancing both empirical understanding and methodological rigor in AI privacy research.
📝 Abstract
Generative AI (GenAI) browser assistants integrate the powerful capabilities of GenAI into web browsers to provide rich experiences such as question answering, content summarization, and agentic navigation. These assistants, available today as browser extensions, can not only track detailed browsing activity such as search and click data, but can also autonomously perform tasks such as filling forms, raising significant privacy concerns. It is crucial to understand the design and operation of GenAI browser extensions, including how they collect, store, process, and share user data. To this end, we study their ability to profile users and personalize their responses based on users' explicit or inferred demographic attributes and interests. We perform network traffic analysis and use a novel prompting framework to audit tracking, profiling, and personalization by the ten most popular GenAI browser assistant extensions. We find that instead of relying on local in-browser models, these assistants largely depend on server-side APIs, which can be auto-invoked without explicit user interaction. When invoked, they collect and share webpage content, often the full HTML DOM and sometimes even the user's form inputs, with their first-party servers. Some assistants also share identifiers and user prompts with third-party trackers such as Google Analytics. The collection and sharing continue even if a webpage contains sensitive information, such as health data, or personal information, such as a name or SSN entered in a web form. We find that several GenAI browser assistants infer demographic attributes such as age, gender, income, and interests and use this profile, which carries across browsing contexts, to personalize responses. In summary, our work shows that GenAI browser assistants can and do collect personal and sensitive information for profiling and personalization with little to no safeguards.
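The network-traffic side of such an audit can be sketched as a scan over captured extension requests, flagging payloads that contain sensitive form values or that are sent to known third-party tracker endpoints. Everything below (the record shape, the tracker domain list, the SSN pattern) is an illustrative assumption, not the paper's actual tooling:

```python
import re

# Illustrative tracker domains and PII pattern; a real audit would use
# curated blocklists and a broader set of detectors.
TRACKER_DOMAINS = {"google-analytics.com", "analytics.google.com"}
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def audit_requests(requests):
    """Scan captured requests (dicts with 'url' and optional 'body')
    and return (issue, url) findings."""
    findings = []
    for req in requests:
        host = req["url"].split("/")[2]  # hostname of the destination
        body = req.get("body", "")
        if any(host == d or host.endswith("." + d) for d in TRACKER_DOMAINS):
            findings.append(("third_party_tracker", req["url"]))
        if SSN_RE.search(body):
            findings.append(("ssn_in_payload", req["url"]))
    return findings

# Toy captured traffic, mimicking an assistant uploading page HTML
# (including a form-entered SSN) and pinging an analytics endpoint.
captured = [
    {"url": "https://api.assistant.example/v1/chat",
     "body": "page_html=<form>...123-45-6789...</form>"},
    {"url": "https://www.google-analytics.com/collect",
     "body": "cid=555.123&ea=prompt_sent"},
]

print(audit_requests(captured))
```

In practice the captured records would come from an intercepting proxy placed between the browser and the assistant's servers, which is what makes the "auto-invoked without explicit user interaction" behavior observable.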