🤖 AI Summary
This work addresses two limitations of existing GUI agent training: its reliance on centralized data, which leads to high costs and poor scalability, and the absence of a federated learning benchmark that captures cross-platform heterogeneity. To bridge this gap, we propose FedGUI, the first federated evaluation benchmark for GUI agents spanning mobile, web, and desktop platforms. It encompasses six datasets and systematically models heterogeneity across platforms, devices, operating systems, and data sources. By establishing a standardized federated learning evaluation framework, the benchmark enables collaborative training across diverse GUI environments, extending prior mobile-only work. Empirical results show that platform and operating system are the most influential heterogeneity factors and that cross-platform collaboration outperforms single-platform training. The code and datasets are publicly released to support future research on privacy-preserving, scalable GUI agents in real-world settings.
📝 Abstract
Training GUI agents with traditional centralized methods faces significant cost and scalability challenges. Federated learning (FL) offers a promising solution, yet its potential is hindered by the lack of benchmarks that capture real-world, cross-platform heterogeneity. To bridge this gap, we introduce FedGUI, the first comprehensive benchmark for developing and evaluating federated GUI agents across mobile, web, and desktop platforms. FedGUI provides a suite of six curated datasets to systematically study four crucial types of heterogeneity: cross-platform, cross-device, cross-OS, and cross-source. Extensive experiments reveal several key insights. First, we show that cross-platform collaboration improves performance, extending prior mobile-only federated learning to diverse GUI environments. Second, we demonstrate the presence of distinct heterogeneity dimensions and identify platform and OS as the most influential factors. FedGUI provides a vital foundation for the community to build more scalable and privacy-preserving GUI agents for real-world deployment. Our code and data are publicly available at https://github.com/wwh0411/FedGUI.