🤖 AI Summary
This work addresses privacy-preserving analysis of LLM chatbot interaction data. The authors propose what they describe as the first end-to-end differential privacy (DP) framework for extracting high-fidelity insights about user behavior under strict ε-DP guarantees. The pipeline combines private clustering with several keyword extraction strategies (frequency-based, an improved TF-IDF variant, and LLM-guided extraction), supported by DP partition selection and histogram summarization, so that privacy and utility are handled together in one verifiable pipeline. The work also introduces an empirical evaluation methodology that assesses privacy and utility jointly, using semantic similarity metrics and LLM-based judgment. Under stringent DP constraints (ε ≤ 1), the experiments are reported to show that the framework significantly outperforms non-private baselines in both semantic and lexical fidelity, with +23.6% higher similarity and an +18.4% improvement on LLM discrimination metrics, which the summary characterizes as practically useful. The framework is presented as a scalable, verifiable approach to compliant conversational data analytics.
📝 Abstract
We introduce $Urania$, a novel framework for generating insights about LLM chatbot interactions with rigorous differential privacy (DP) guarantees. The framework employs a private clustering mechanism and innovative keyword extraction methods, including frequency-based, TF-IDF-based, and LLM-guided approaches. By leveraging DP tools such as clustering, partition selection, and histogram-based summarization, $Urania$ provides end-to-end privacy protection. Our evaluation assesses lexical and semantic content preservation, pair similarity, and LLM-based metrics, benchmarking against a non-private Clio-inspired pipeline (Tamkin et al., 2024). Moreover, we develop a simple empirical privacy evaluation that demonstrates the enhanced robustness of our DP pipeline. The results show the framework's ability to extract meaningful conversational insights while maintaining stringent user privacy, effectively balancing data utility with privacy preservation.
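To make "partition selection" and "histogram-based summarization" concrete, the following is a minimal sketch of one textbook way to release keyword counts privately: cap each user's contribution, add Laplace noise to every count, and publish only the keywords whose noisy count clears a threshold. It is not taken from the paper. The function names, the parameters `max_per_user` and `delta`, and the Laplace-thresholding rule are all assumptions chosen for illustration. Note also that thresholding of this kind gives an (ε, δ)-DP guarantee rather than pure ε-DP, so the paper's actual mechanisms may differ.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. Exp(1) draws is Laplace(0, 1); rescale it.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_keyword_histogram(user_keywords, epsilon, delta, max_per_user, rng):
    """Illustrative (epsilon, delta)-DP keyword histogram.

    user_keywords: one list of keywords per user (hypothetical input format).
    max_per_user:  cap on distinct keywords counted per user (assumed bound).
    """
    counts = Counter()
    for keywords in user_keywords:
        # Bound each user's contribution: deduplicate, then keep at most
        # max_per_user keywords chosen at random.
        distinct = sorted(set(keywords))
        kept = rng.sample(distinct, min(max_per_user, len(distinct)))
        counts.update(kept)

    # One user changes at most max_per_user counts by 1 each, so the
    # L1 sensitivity is max_per_user.
    scale = max_per_user / epsilon
    # Threshold chosen so that a keyword held by a single user is released
    # with probability at most delta / max_per_user.
    threshold = 1.0 + scale * math.log(max_per_user / (2.0 * delta))

    released = {}
    for keyword in sorted(counts):
        noisy = counts[keyword] + laplace_noise(scale, rng)
        if noisy > threshold:
            released[keyword] = noisy
    return released
```

With, say, `epsilon=1.0`, `delta=1e-6`, and `max_per_user=2`, the threshold is roughly 28.6, so a keyword used by thousands of users is released with a count close to the truth, while a keyword that appears in only one user's conversations is almost never released. That is the basic trade-off the abstract describes: common conversational themes survive, and rare, potentially identifying terms are suppressed.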