Toward provably private analytics and insights into GenAI use

📅 2025-10-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of privacy-sensitive analytics over large fleets of edge devices, this paper proposes a federated analytics system built on Trusted Execution Environments (TEEs). The system isolates raw data and all derived computations entirely within hardware-enforced TEEs, such as AMD SEV-SNP and Intel TDX, preventing the system operator from accessing plaintext. It integrates LLM-driven structured summarization of unstructured data with auto-tuned differential privacy mechanisms to deliver end-to-end verifiable privacy guarantees. The design unifies encrypted data upload, TEE-hosted key management, and dynamic privacy budget allocation, improving resource efficiency while preserving data quality and utility. Deployed in production, the system generates aggregated, GenAI-powered insights into real-world user experiences under stringent, externally verifiable privacy guarantees.

📝 Abstract
Large-scale systems that compute analytics over a fleet of devices must achieve high privacy and security standards while also meeting data quality, usability, and resource efficiency expectations. We present a next-generation federated analytics system that uses Trusted Execution Environments (TEEs) based on technologies like AMD SEV-SNP and Intel TDX to provide verifiable privacy guarantees for all server-side processing. In our system, devices encrypt and upload data, tagging it with a limited set of allowable server-side processing steps. An open source, TEE-hosted key management service guarantees that the data is accessible only to those steps, which are themselves protected by TEE confidentiality and integrity assurance guarantees. The system is designed for flexible workloads, including processing unstructured data with LLMs (for structured summarization) before aggregation into differentially private insights (with automatic parameter tuning). The transparency properties of our system allow any external party to verify that all raw and derived data is processed in TEEs, protecting it from inspection by the system operator, and that differential privacy is applied to all released results. This system has been successfully deployed in production, providing helpful insights into real-world GenAI experiences.
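The upload-and-release flow described in the abstract — devices tag encrypted data with allowable processing steps, and a TEE-hosted key management service releases keys only to those steps — can be sketched as a toy model. All names here are hypothetical, the "cipher" is a stand-in, and the real system checks hardware attestation evidence (AMD SEV-SNP / Intel TDX), not a string comparison:

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy stream 'cipher' standing in for real authenticated encryption."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def device_upload(payload: bytes, allowed_steps: set[str]) -> dict:
    """Device encrypts its data and tags it with the permitted processing steps."""
    key = secrets.token_bytes(32)
    return {
        "ciphertext": xor_bytes(payload, key),
        "allowed_steps": frozenset(allowed_steps),
        "wrapped_key": key,  # in the real system, wrapped to the TEE-hosted KMS
    }

def kms_release_key(upload: dict, attested_step: str) -> bytes:
    """TEE-hosted key management: release the decryption key only to a
    workload whose (attested) identity matches one of the tagged steps."""
    if attested_step not in upload["allowed_steps"]:
        raise PermissionError(f"step {attested_step!r} not authorized for this data")
    return upload["wrapped_key"]

# An authorized step can decrypt; any other step is refused a key.
upload = device_upload(b"user feedback text", {"llm_summarize", "dp_aggregate"})
key = kms_release_key(upload, "llm_summarize")
assert xor_bytes(upload["ciphertext"], key) == b"user feedback text"
```

The point of the sketch is the access-control shape: the device, not the server, decides which processing steps may ever see its plaintext, and the key service enforces that decision inside a TEE.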
Problem

Research questions and friction points this paper is trying to address.

Achieving verifiable privacy guarantees in federated analytics systems
Protecting unstructured data during LLM processing and aggregation
Enabling external verification of TEE-based confidential computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Trusted Execution Environments for verifiable privacy
Employs encrypted data tagging for controlled server processing
Combines LLM summarization with differential privacy guarantees
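The last bullet — releasing LLM-derived aggregates with differential privacy — can be illustrated with the standard Laplace mechanism. This is a generic sketch, not the paper's auto-tuned mechanism: each device is assumed to contribute one LLM-assigned label, so adding or removing a device changes one count by 1, and Laplace noise with scale 1/ε yields ε-DP counts:

```python
import math
import random
from collections import Counter

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) by inverse-CDF transform of a uniform draw."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_histogram(labels, epsilon: float, seed: int = 0) -> dict:
    """Epsilon-DP noisy counts of per-device labels (e.g. LLM-assigned topics).

    One label per device means add/remove of a device perturbs a single
    count by 1 (L1 sensitivity 1), so scale = 1/epsilon suffices.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    return {k: counts[k] + laplace_noise(1.0 / epsilon, rng) for k in counts}
```

Smaller ε means larger noise and stronger privacy; the paper's system additionally tunes such parameters automatically, which this sketch does not attempt.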
Albert Cheu
Research Scientist, Google
Differential Privacy
Artem Lagzdin
Google
Brett McLarnon
Google
Daniel Ramage
Google Research
Federated learning · Federated analytics · Machine learning
Katharine Daly
Google
Marco Gruteser
Google
P. Kairouz
Google
Rakshita Tandon
Google
Stanislav Chiknavaryan
Google
Timon Van Overveldt
Google
Zoe Gong
Google