🤖 AI Summary
Existing cross-device federated analytics systems face bottlenecks in accuracy, flexibility, and scalability. This paper proposes the first federated analytics system deeply integrated with Trusted Execution Environments (TEEs), targeting lightweight statistical and monitoring queries as distinct from federated learning workloads. Leveraging TEEs for verifiable privacy protection, the system combines distributed aggregation, resource-aware client-side scheduling, differential privacy, and secure multi-party computation so that only minimal aggregated results are uploaded. Evaluated at million-device scale, it achieves sub-second query latency and an error rate below 0.5%, while satisfying industrial-grade privacy compliance via formal verification and third-party audit. Key contributions include: (1) the first integration of TEEs into the federated analytics stack; (2) establishment of a statistical federated analytics paradigm; and (3) simultaneous achievement of strong privacy guarantees, high analytical accuracy, and practical scalability.
📝 Abstract
Cross-device Federated Analytics (FA) is a distributed computation paradigm designed to answer analytics queries about, and derive insights from, data held locally on users' devices. On-device computation, combined with other privacy and security measures, ensures that only minimal data is transmitted off-device, achieving a high standard of data protection. Despite FA's broad relevance, the applicability of existing FA systems is limited by compromised accuracy, a lack of flexibility for data analytics, and an inability to scale effectively. In this paper, we describe our approach to combining privacy, scalability, and practicality to build and deploy a system that overcomes these limitations. Our FA system leverages trusted execution environments (TEEs) and optimizes the use of on-device computing resources to facilitate federated data processing across large fleets of devices, while ensuring robust, defensible, and verifiable privacy safeguards. We focus on federated analytics (statistics and monitoring), in contrast to systems for federated learning (ML workloads), and we flag the key differences.
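As a rough illustration of the idea that only minimal data leaves the device, the following sketch shows a federated histogram query: each device reduces its raw measurements to a single one-hot bucket report, and a server-side step sums the reports and adds Laplace noise for differential privacy. This is a minimal sketch under stated assumptions, not the system described in the paper: the function names (`local_report`, `sum_reports`, `add_laplace_noise`), the bucket edges, and the choice of the Laplace mechanism are all hypothetical. In a TEE-based design, the summation and noising step is the kind of computation that could run inside the enclave, but the abstract does not specify this.

```python
import random
from bisect import bisect_right

def local_report(values, edges):
    """Runs on-device. Reduces raw measurements to one one-hot bucket report.

    Bucket i covers [edges[i], edges[i+1]); the last bucket is open-ended.
    Only this vector, never the raw values, would leave the device.
    """
    mean = sum(values) / len(values)
    idx = max(bisect_right(edges, mean) - 1, 0)  # clamp values below edges[0]
    return [1 if i == idx else 0 for i in range(len(edges))]

def sum_reports(reports):
    """Aggregator side: element-wise sum of the per-device one-hot reports."""
    return [sum(column) for column in zip(*reports)]

def add_laplace_noise(totals, epsilon, rng):
    """Adds Laplace(0, 1/epsilon) noise to each count.

    Assumption: each device contributes exactly one report, so adding or
    removing a device changes the histogram by at most 1 in L1 norm.
    """
    scale = 1.0 / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return [
        t + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for t in totals
    ]

if __name__ == "__main__":
    edges = [0.0, 100.0, 200.0]  # hypothetical latency buckets in ms
    devices = [[50.0, 70.0], [150.0], [20.0, 30.0, 40.0]]
    reports = [local_report(v, edges) for v in devices]
    totals = sum_reports(reports)
    print(add_laplace_noise(totals, epsilon=1.0, rng=random.Random(0)))
```

The per-device contribution is bounded by construction (one bucket per device), which is what keeps the noise scale small; a query that let each device report into several buckets would need a correspondingly larger scale.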