🤖 AI Summary
This paper addresses the dual challenges of privacy preservation and statistical heterogeneity in commercial image scenarios under federated learning. To this end, we introduce the first benchmark dataset for federated commercial image classification, comprising 23,326 images from eight real-world sources across 31 fine-grained classes. We further propose two novel topology-aware algorithms: Fed-Cyclic (employing cyclic communication) and Fed-Star (leveraging star-shaped aggregation), both combining local training with weight propagation, and Fed-Star additionally using pre-aggregation, to mitigate non-IID data distributions. Extensive experiments demonstrate that both methods significantly outperform mainstream baselines (e.g., FedAvg, FedProx) on the proposed benchmark, validating their effectiveness and generalizability. Our key contributions are threefold: (1) the first multi-source, real-world commercial image federated learning benchmark; (2) two lightweight, topology-optimized federated optimization algorithms; and (3) a systematic treatment of statistical heterogeneity in practical deployment settings.
📝 Abstract
Federated Learning is a collaborative machine learning paradigm that enables multiple clients to learn a global model without exposing their data to each other. Consequently, it provides a secure learning platform with privacy-preserving capabilities. This paper introduces a new dataset containing 23,326 images collected from eight different commercial sources and organized into 31 categories, mirroring the class structure of the Office-31 dataset. To the best of our knowledge, this is the first image classification dataset specifically designed for Federated Learning. We also propose two new Federated Learning algorithms, namely Fed-Cyclic and Fed-Star. In Fed-Cyclic, a client receives weights from the previous client, updates them through local training, and passes them to the next client, thus forming a cyclic topology. In Fed-Star, a client receives weights from all other clients, updates its local weights through pre-aggregation (to address statistical heterogeneity) followed by local training, and sends the updated weights to all other clients, thus forming a star-like topology. Our experiments show that both algorithms outperform existing baselines on the newly introduced dataset.
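The two communication topologies described above can be illustrated with a minimal, hypothetical sketch. This is not the paper's implementation: `local_train` is a toy stand-in for each client's real local SGD, and the simple-averaging pre-aggregation rule and all function names are our assumptions.

```python
def local_train(weights, client_data, lr=0.5):
    """Toy stand-in for local SGD: nudge each weight toward the
    client's (synthetic, scalar) data mean."""
    mean = sum(client_data) / len(client_data)
    return [w + lr * (mean - w) for w in weights]

def fed_cyclic(clients, weights, rounds=3):
    """Fed-Cyclic sketch: each client receives weights from the previous
    client, trains locally, and passes them on (a ring/cyclic topology)."""
    for _ in range(rounds):
        for data in clients:  # traversal order defines the cycle
            weights = local_train(weights, data)
    return weights

def fed_star(clients, rounds=3):
    """Fed-Star sketch: each client pre-aggregates (here: averages) the
    weights received from all other clients, then trains locally and
    broadcasts its updated weights."""
    local = [[0.0] for _ in clients]  # one weight vector per client
    for _ in range(rounds):
        new_local = []
        for i, data in enumerate(clients):
            others = [local[j] for j in range(len(clients)) if j != i]
            # pre-aggregation step (assumed: plain coordinate-wise mean)
            pre = [sum(ws) / len(others) for ws in zip(*others)]
            new_local.append(local_train(pre, data))
        local = new_local
    # final global model: average of all clients' weights
    return [sum(ws) / len(local) for ws in zip(*local)]
```

With two clients holding data centered at 1.0 and 3.0, both sketches drive the shared weights into the interval between the two client means, illustrating how weight propagation blends heterogeneous (non-IID) client distributions.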