AI Summary
This study addresses the lack of systematic investigation into how AI chatbots share user conversation content and identity information with third parties. In controlled experiments using a sensitive prompt, the authors capture and compare network traffic from 20 popular AI chatbots in normal chats and, where supported, private chats, then search it for content and identity information sent to third-party advertising, analytics, social, support-widget, and session replay services. They find that 17 of the 20 chatbots share information with at least one third party. Three send plaintext conversation text, including prompt and response snippets, to Microsoft Clarity through session replay; 15 share conversation URLs or chat identifiers with advertising, analytics, or social endpoints; and several expose user identity, in some cases as hashed emails. These results indicate that third-party tracking is already widespread across AI chatbots.
Abstract
AI chatbots are becoming a primary interface for seeking information. As their popularity grows, chatbot providers are starting to deploy advertising and analytics. Despite this, tracking on AI chatbots has not been systematically studied. We present a systematic measurement of web tracking on 20 popular AI chatbots. Under controlled settings using a sensitive prompt, we capture and compare network traffic in normal chats and, where supported, private chats. We search for exposure of two categories of information: content, including prompts, prompt-derived titles, chat URLs, and chat identifiers; and identity, including names, emails, account identifiers, first-party cookies, and explicit IP/User-Agent fields in payloads. We find that 17 of 20 chatbots share information with at least one third party. Three chatbots share plaintext conversation text, including both prompt and response snippets, with Microsoft Clarity through session replay. Fifteen chatbots share conversation URLs or chat identifiers with third-party advertising, analytics, or social endpoints. Several chatbots expose user identity through support widgets, analytics, advertising, and session replay tags; in some cases, hashed emails are shared.
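The abstract does not specify how the captured traffic was searched, so the following is only a minimal sketch of one plausible approach: scan requests sent to third-party hosts for known test values (the prompt, the account email) in plaintext, URL-encoded, and hashed form. Everything in it is an assumption made for illustration, including the request record layout (`url`, `body`), the function names, the choice of MD5 and SHA-256, and the suffix-based third-party check, which a real measurement would likely replace with a Public Suffix List comparison.

```python
import hashlib
from urllib.parse import quote, urlsplit


def candidate_encodings(value: str) -> dict[str, str]:
    """Return forms in which a known test value might appear in traffic.

    Hypothetical helper: the set of encodings is an assumption, not the
    paper's method. Hashes are taken over the trimmed, lowercased value,
    a common convention for hashed emails.
    """
    norm = value.strip().lower().encode("utf-8")
    return {
        "plaintext": value,
        "url-encoded": quote(value, safe=""),
        "md5": hashlib.md5(norm).hexdigest(),
        "sha256": hashlib.sha256(norm).hexdigest(),
    }


def is_third_party(request_host: str, first_party_domain: str) -> bool:
    """Simplified check: a host is first-party if it equals the chatbot's
    domain or is a subdomain of it. This ignores sibling domains owned by
    the same company, which is an assumption of this sketch."""
    host = request_host.lower().rstrip(".")
    domain = first_party_domain.lower()
    return not (host == domain or host.endswith("." + domain))


def find_leaks(requests, first_party_domain, secrets):
    """Scan captured requests for known values sent to third-party hosts.

    `requests` is assumed to be a list of dicts with a "url" key and an
    optional "body" key; `secrets` maps a label to the test value.
    Returns (host, label, encoding) tuples in order of discovery.
    """
    findings = []
    for req in requests:
        host = urlsplit(req["url"]).hostname or ""
        if not is_third_party(host, first_party_domain):
            continue
        # Case-insensitive search over the URL and the request body.
        haystack = (req["url"] + "\n" + req.get("body", "")).lower()
        for label, value in secrets.items():
            seen = set()
            for encoding, needle in candidate_encodings(value).items():
                needle = needle.lower()
                if needle in seen:
                    continue  # e.g. URL-encoding left the value unchanged
                seen.add(needle)
                if needle in haystack:
                    findings.append((host, label, encoding))
    return findings
```

Under these assumptions, a request to a session replay host whose body contains the test prompt verbatim would be reported as a plaintext content exposure, while an advertising request carrying the SHA-256 digest of the account email would be reported as a hashed identity exposure.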