📝 Abstract
Large-scale systems that compute analytics over a fleet of devices must achieve high privacy and security standards while also meeting data quality, usability, and resource efficiency expectations. We present a next-generation federated analytics system that uses Trusted Execution Environments (TEEs) based on technologies such as AMD SEV-SNP and Intel TDX to provide verifiable privacy guarantees for all server-side processing. In our system, devices encrypt and upload data, tagging it with a limited set of allowable server-side processing steps. An open-source, TEE-hosted key management service guarantees that the data is accessible only to those steps, which are themselves protected by TEE confidentiality and integrity guarantees. The system is designed for flexible workloads, including processing unstructured data with LLMs (for structured summarization) before aggregation into differentially private insights (with automatic parameter tuning). The transparency properties of our system allow any external party to verify that all raw and derived data is processed in TEEs, protected from inspection by the system operator, and that differential privacy is applied to all released results. This system has been successfully deployed in production, providing helpful insights into real-world GenAI experiences.
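The flow the abstract describes — a device encrypting data and tagging it with the only processing steps allowed to touch it, a key management service releasing keys solely to those (attested) steps, and a differentially private release of aggregates — can be sketched as below. This is a minimal illustration, not the paper's implementation: the step names, the toy keystream cipher (standing in for an AEAD cipher with keys wrapped to the KMS's TEE), the plain string "measurements" (standing in for TEE attestation evidence), and the fixed epsilon are all assumptions introduced here.

```python
import hashlib
import json
import math
import os
import random

# Hypothetical identifiers standing in for attestation measurements of
# approved server-side processing steps (illustrative names, not from the paper).
APPROVED_STEPS = {"llm_summarize_v1", "dp_aggregate_v1"}

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy keystream cipher for illustration only; a real deployment
    would use an AEAD cipher with keys wrapped to the TEE-hosted KMS."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

def device_upload(record: dict) -> dict:
    """Device side: encrypt the record and tag it with the limited set of
    server-side processing steps allowed to decrypt it."""
    key = os.urandom(16)
    ciphertext = xor_stream(key, json.dumps(record).encode())
    return {"ciphertext": ciphertext,
            "wrapped_key": key,  # in practice, wrapped to the KMS's TEE
            "policy": {"allowed_steps": sorted(APPROVED_STEPS)}}

def kms_release_key(upload: dict, step_measurement: str) -> bytes:
    """KMS side: release the decryption key only to a step whose attested
    measurement appears in the upload's access policy."""
    if step_measurement not in upload["policy"]["allowed_steps"]:
        raise PermissionError(f"step {step_measurement!r} not authorized")
    return upload["wrapped_key"]

def dp_count(true_count: int, epsilon: float) -> float:
    """Laplace mechanism for a sensitivity-1 counting query: the style of
    differentially private release applied before results leave the TEE."""
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise
```

For example, a `dp_aggregate_v1` step can obtain the key via `kms_release_key(upload, "dp_aggregate_v1")` and decrypt with `xor_stream`, while any step outside the upload's policy is refused; only the `dp_count` output would ever be released outside the TEE.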