📝 Abstract
Large-scale systems that compute analytics over a fleet of devices must achieve high privacy and security standards while also meeting data quality, usability, and resource efficiency expectations. We present a next-generation federated analytics system that uses Trusted Execution Environments (TEEs) based on technologies such as AMD SEV-SNP and Intel TDX to provide verifiable privacy guarantees for all server-side processing. In our system, devices encrypt and upload data, tagging it with a limited set of allowable server-side processing steps. An open-source, TEE-hosted key management service guarantees that the data is accessible only to those steps, which are themselves protected by TEE confidentiality and integrity guarantees. The system is designed for flexible workloads, including processing unstructured data with LLMs (for structured summarization) before aggregation into differentially private insights (with automatic parameter tuning). The transparency properties of our system allow any external party to verify that all raw and derived data is processed in TEEs, protected from inspection by the system operator, and that differential privacy is applied to all released results. This system has been successfully deployed in production, providing helpful insights into real-world GenAI experiences.
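The flow the abstract describes — a device encrypting data and tagging it with the only processing steps allowed to touch it, a key management service releasing keys solely to those (attested) steps, and a differentially private release of aggregates — can be sketched as below. This is a minimal illustration, not the paper's implementation: the step names, the toy keystream cipher (standing in for an AEAD cipher with keys wrapped to the KMS's TEE), the plain string "measurements" (standing in for TEE attestation evidence), and the fixed epsilon are all assumptions introduced here.

```python
import hashlib
import json
import math
import os
import random

# Hypothetical identifiers standing in for attestation measurements of
# approved server-side processing steps (illustrative names, not from the paper).
APPROVED_STEPS = {"llm_summarize_v1", "dp_aggregate_v1"}

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy keystream cipher for illustration only; a real deployment
    would use an AEAD cipher with keys wrapped to the TEE-hosted KMS."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

def device_upload(record: dict) -> dict:
    """Device side: encrypt the record and tag it with the limited set of
    server-side processing steps allowed to decrypt it."""
    key = os.urandom(16)
    ciphertext = xor_stream(key, json.dumps(record).encode())
    return {"ciphertext": ciphertext,
            "wrapped_key": key,  # in practice, wrapped to the KMS's TEE
            "policy": {"allowed_steps": sorted(APPROVED_STEPS)}}

def kms_release_key(upload: dict, step_measurement: str) -> bytes:
    """KMS side: release the decryption key only to a step whose attested
    measurement appears in the upload's access policy."""
    if step_measurement not in upload["policy"]["allowed_steps"]:
        raise PermissionError(f"step {step_measurement!r} not authorized")
    return upload["wrapped_key"]

def dp_count(true_count: int, epsilon: float) -> float:
    """Laplace mechanism for a sensitivity-1 counting query: the style of
    differentially private release applied before results leave the TEE."""
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise
```

For example, a `dp_aggregate_v1` step can obtain the key via `kms_release_key(upload, "dp_aggregate_v1")` and decrypt with `xor_stream`, while any step outside the upload's policy is refused; only the `dp_count` output would ever be released outside the TEE.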