ProFed: a Benchmark for Proximity-based non-IID Federated Learning

📅 2025-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing federated learning (FL) algorithms suffer significant performance degradation under spatially induced non-IID data, such as regional linguistic variations or localized traffic patterns, yet mainstream FL evaluation relies predominantly on random non-IID partitioning that neglects this geographic structure. To address this gap, the authors propose ProFed, an FL benchmark explicitly designed for proximity-based non-IID settings. ProFed incorporates spatial structure into benchmark design, enabling simulation of regional data skew at varying degrees and standardized evaluation across well-known datasets, from MNIST to CIFAR-100. Building on pathological and quantity-based partitioning schemes from the literature, it applies them at the region level and provides a reproducible data-partitioning toolkit, improving the comparability and consistency of FL algorithm evaluation in geographically distributed scenarios.

📝 Abstract
In recent years, Federated Learning (FL) has gained significant attention within the machine learning community. Although various FL algorithms have been proposed in the literature, their performance often degrades when data across clients is non-independently and identically distributed (non-IID). This skewness in data distribution often emerges from geographic patterns, with notable examples including regional linguistic variations in text data or localized traffic patterns in urban environments. Such scenarios result in IID data within specific regions but non-IID data across regions. However, existing FL algorithms are typically evaluated by randomly splitting non-IID data across devices, disregarding their spatial distribution. To address this gap, we introduce ProFed, a benchmark that simulates data splits with varying degrees of skewness across different regions. We incorporate several skewness methods from the literature and apply them to well-known datasets, including MNIST, FashionMNIST, CIFAR-10, and CIFAR-100. Our goal is to provide researchers with a standardized framework to evaluate FL algorithms more effectively and consistently against established baselines.
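The proximity-based setting the abstract describes (IID within a region, skewed across regions) can be sketched roughly as follows. This is a minimal illustration, not ProFed's actual API: the function name, parameters, and the use of pathological label skew per region are all assumptions for demonstration.

```python
import numpy as np

def proximity_partition(labels, n_regions, classes_per_region,
                        clients_per_region, seed=0):
    """Hypothetical sketch of a proximity-based non-IID split:
    each region is assigned a limited set of classes (pathological
    label skew across regions), and clients within a region draw
    IID from that region's pooled samples."""
    rng = np.random.default_rng(seed)
    n_classes = int(labels.max()) + 1
    # Each region sees only a subset of classes.
    region_classes = [set(rng.choice(n_classes, size=classes_per_region,
                                     replace=False).tolist())
                      for _ in range(n_regions)]
    # Split each class's samples disjointly among the regions that use it.
    region_pools = [[] for _ in range(n_regions)]
    for c in range(n_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        owners = [r for r in range(n_regions) if c in region_classes[r]]
        if not owners:
            continue  # class unused by any region in this draw
        for owner, chunk in zip(owners, np.array_split(idx, len(owners))):
            region_pools[owner].append(chunk)
    # Within a region, clients get IID shards of the region's pool.
    partitions = []
    for pool in region_pools:
        merged = np.concatenate(pool)
        rng.shuffle(merged)
        partitions.extend(np.array_split(merged, clients_per_region))
    return partitions  # one index array per client

# Toy example: 10 classes, 100 samples each, 5 regions of 4 clients.
labels = np.repeat(np.arange(10), 100)
parts = proximity_partition(labels, n_regions=5, classes_per_region=2,
                            clients_per_region=4)
print(len(parts))  # 20 client partitions
```

Clients inside one region share the same label distribution, while regions differ sharply, matching the intra-region IID / inter-region non-IID structure the paper targets.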
Problem

Research questions and friction points this paper is trying to address.

Evaluating FL algorithms under spatially skewed non-IID data
Addressing performance degradation in regionally distributed FL systems
Providing a benchmark for standardized FL algorithm comparison
Innovation

Methods, ideas, or system contributions that make the work stand out.

ProFed benchmark for non-IID FL
Simulates region-based data skewness
Standardized evaluation for FL algorithms