🤖 AI Summary
Federated learning (FL) carries an inherent risk of leaking local data, and existing FL benchmarks lack domain heterogeneity and client-specific separation, which hinders rigorous evaluation. To address this, we propose ModelNet, which to the best of our knowledge is the first cross-environment, client-specific FL benchmark with a graph-based variant. ModelNet is built from embeddings extracted with a pre-trained ResNet50 and derives three client-specific variants of CIFAR-100 covering homogeneous, heterogeneous, and random domain settings. It also stores per-client model parameters and proposes that FL algorithms operate on these anonymized parameters, so that domain shift can be modeled without exposing local data. Additionally, it provides built-in interfaces for Graph Neural Networks (GNNs) to support cross-environment evaluation. We release the three variants as ModelNet-S, ModelNet-D, and ModelNet-R, alongside open-source code. Experiments on domain shift and aggregation strategies indicate that ModelNet is a practical benchmark for evaluating both classical and graph-based FL methods.
📝 Abstract
Federated Learning (FL) has emerged as a powerful paradigm for training machine learning models across distributed data sources while preserving data locality. However, the privacy of local data remains a central concern and has received considerable attention in recent FL research. Moreover, the lack of domain heterogeneity and client-specific segregation in existing benchmarks remains a critical bottleneck for rigorous evaluation. In this paper, we introduce ModelNet, a novel image classification dataset constructed from embeddings extracted from a pre-trained ResNet50 model. First, we modify the CIFAR100 dataset into three client-specific variants, one for each of three domain-heterogeneity settings (homogeneous, heterogeneous, and random). Subsequently, we train the pre-trained ResNet50 model on each client-specific subset of all three variants and save the resulting model parameters. In addition to the multi-domain image data, we propose a new hypothesis: an FL algorithm that accesses only anonymized model parameters can preserve local privacy more effectively than existing approaches. ModelNet is designed to simulate realistic FL settings by incorporating non-IID data distributions and client diversity as core design principles, for both conventional and emerging graph-driven FL algorithms. The three variants, ModelNet-S, ModelNet-D, and ModelNet-R, are based on the homogeneous, heterogeneous, and random data settings, respectively. To the best of our knowledge, we are the first to propose a cross-environment, client-specific FL dataset together with a graph-based variant. Extensive experiments on domain shifts and aggregation strategies show the effectiveness of these variants, making ModelNet a practical benchmark for classical and graph-based FL research. The dataset and related code are available online.
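The abstract does not spell out how the three client-specific variants are built, so the following is only a minimal sketch of one plausible reading, not the authors' construction. It assumes that each class belongs to a "domain" group, that a homogeneous client draws all its classes from one group, that a heterogeneous client draws each class from a different group, and that a random client ignores groups. The function names `assign_client_classes` and `split_indices`, the parameters, and the toy class-to-group mapping are all hypothetical.

```python
import random
from collections import defaultdict


def assign_client_classes(class_to_group, num_clients, classes_per_client,
                          mode, seed=0):
    """Pick the class IDs each client holds under one of three settings.

    ASSUMPTION: the meaning given to each mode is an illustrative guess,
    not the documented construction of ModelNet-S/D/R.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for cls, grp in class_to_group.items():
        groups[grp].append(cls)
    group_ids = sorted(groups)
    all_classes = sorted(class_to_group)

    clients = {}
    for cid in range(num_clients):
        if mode == "homogeneous":
            # All of a client's classes come from a single group.
            grp = group_ids[cid % len(group_ids)]
            chosen = rng.sample(groups[grp], classes_per_client)
        elif mode == "heterogeneous":
            # Each of a client's classes comes from a different group.
            picked = rng.sample(group_ids, classes_per_client)
            chosen = [rng.choice(groups[g]) for g in picked]
        elif mode == "random":
            # Classes are drawn without regard to groups.
            chosen = rng.sample(all_classes, classes_per_client)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        clients[cid] = sorted(chosen)
    return clients


def split_indices(labels, client_classes):
    """Map each client to the sample indices whose label it holds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    return {cid: [i for c in classes for i in by_class[c]]
            for cid, classes in client_classes.items()}


# Toy stand-in: 100 classes in 20 groups of 5. This contiguous mapping is a
# placeholder, not the real CIFAR-100 superclass assignment.
toy_groups = {c: c // 5 for c in range(100)}
toy_labels = [i % 100 for i in range(1000)]
client_classes = assign_client_classes(toy_groups, num_clients=4,
                                       classes_per_client=5,
                                       mode="homogeneous")
client_indices = split_indices(toy_labels, client_classes)
```

In this sketch, clients may share classes, so their index sets can overlap; a disjoint split would need an extra step that divides each class's samples among the clients holding it.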