🤖 AI Summary
Addressing the dual challenges of privacy leakage and data silos in cross-domain big data collaboration, this paper presents a systematic survey of the integration of federated learning (FL) with big data services. It examines four key big data services (data acquisition, storage, analytics, and privacy preservation) and extends the analysis to representative application domains, including smart cities, smart healthcare, smart transportation, smart grids, and social media. The work organizes the literature along two dimensions, big data services and big data applications, addressing the lack of a comprehensive survey at the FL–big data intersection. It distills key technical challenges, including model heterogeneity, communication overhead, security and robustness, and system scalability. It also summarizes notable FL–big data projects and discusses promising solutions and research directions. The study gives researchers a coherent taxonomy and offers practitioners a structured reference for deploying FL in big data ecosystems.
📝 Abstract
Big data has evolved remarkably over the last few years, driven by the enormous volume of data generated by newly emerging services and applications and by a massive number of Internet-of-Things (IoT) devices. The potential of big data can be realized through analytic and learning techniques, which conventionally transfer data from various sources to a central cloud for storage, processing, and training. However, this conventional approach raises critical data privacy issues, as the data may include sensitive information such as personal details, government records, and bank account information. To overcome this challenge, federated learning (FL) has emerged as a promising learning technique: clients train a shared model on their local data and exchange only model updates, so raw data never leaves its owner. However, a comprehensive survey of FL for big data services and applications has yet to be conducted. In this article, we present such a survey, aiming to provide general readers with an overview of FL, big data, and the motivations for using FL for big data. In particular, we extensively review the use of FL for key big data services, including big data acquisition, storage, analytics, and privacy preservation. Subsequently, we review the potential of FL for big data applications such as smart cities, smart healthcare, smart transportation, smart grids, and social media. Finally, we summarize a number of important FL–big data projects and discuss key challenges of this topic, along with several promising solutions and research directions.
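To make the contrast with centralized training concrete, here is a minimal sketch of federated averaging (FedAvg), a widely used FL aggregation scheme. It rests on toy assumptions that are not from the survey: a one-parameter linear model `y = w * x`, hypothetical client datasets standing in for data silos, and full-batch gradient descent as the local optimizer. The helper names `local_update` and `fed_avg` are illustrative only and do not represent the method of any particular work surveyed here.

```python
from typing import List, Tuple

# Hypothetical type: (x, y) pairs held privately by one client (a "data silo").
Dataset = List[Tuple[float, float]]


def local_update(w: float, data: Dataset, lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few epochs of full-batch gradient descent on mean squared error, on-device."""
    for _ in range(epochs):
        # Gradient of mean((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fed_avg(clients: List[Dataset], rounds: int = 50, w: float = 0.0) -> float:
    """FedAvg-style loop: only model weights reach the server, never the raw (x, y) pairs."""
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        # Each client starts from the current global model and trains on its own data.
        local_ws = [local_update(w, d) for d in clients]
        # The server aggregates with a sample-size-weighted average of the returned weights.
        w = sum(len(d) / total * lw for d, lw in zip(clients, local_ws))
    return w


if __name__ == "__main__":
    # Hypothetical data: three clients whose samples all follow y = 3x.
    clients = [
        [(1.0, 3.0), (2.0, 6.0)],
        [(3.0, 9.0)],
        [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
    ]
    print(round(fed_avg(clients), 3))
```

Running the script prints `3.0`: the global model recovers the shared slope even though the server only ever sees each client's locally trained weight. Real deployments replace the scalar model with a neural network and must also handle the heterogeneity, communication, and security issues discussed in the survey.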