🤖 AI Summary
To address high training latency and slow convergence in federated learning on resource-constrained edge devices—caused by device heterogeneity, tight memory/energy budgets, and limited communication bandwidth—this paper proposes FTTE, a semi-asynchronous federated learning framework. FTTE’s core innovations are: (i) the first joint modeling of update staleness and local gradient variance to design a dynamic staleness-weighted aggregation mechanism; and (ii) sparse parameter updates to reduce both communication overhead and memory footprint. Extensive experiments under challenging conditions—500 clients with 90% high-latency nodes—demonstrate that FTTE achieves an 81% training speedup over FedAvg, reduces memory consumption by 80%, cuts communication volume by 69%, and maintains or improves model accuracy. These results significantly enhance training efficiency and scalability in heterogeneous edge environments.
📝 Abstract
Federated learning (FL) enables collaborative model training across distributed devices while preserving data privacy, but deployment on resource-constrained edge nodes remains challenging due to limited memory, energy, and communication bandwidth. Traditional synchronous and asynchronous FL approaches further suffer from straggler-induced delays and slow convergence in heterogeneous, large-scale networks. We present FTTE (Federated Tiny Training Engine), a novel semi-asynchronous FL framework that uniquely employs sparse parameter updates and a staleness-weighted aggregation based on both the age and the variance of client updates. Extensive experiments across diverse models and data distributions, including up to 500 clients and 90% stragglers, demonstrate that FTTE not only achieves 81% faster convergence, 80% lower on-device memory usage, and 69% smaller communication payloads than synchronous FL (e.g., FedAvg), but also consistently reaches comparable or higher target accuracy than semi-asynchronous baselines (e.g., FedBuff) in challenging regimes. These results establish FTTE as the first practical and scalable solution for real-world FL deployments on heterogeneous and predominantly resource-constrained edge devices.
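To make the staleness-weighted aggregation idea concrete, the sketch below shows one plausible form of such a rule: each buffered client update is discounted by both its age (rounds since it was computed) and its local gradient variance, then the weighted deltas are applied to the global model. The specific weighting formula, the function name `aggregate`, and the `alpha` decay parameter are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def aggregate(global_params, client_updates, current_round, alpha=0.5):
    """Illustrative staleness- and variance-weighted aggregation.

    client_updates: list of (delta, round_sent, grad_variance) tuples,
    where delta is the client's parameter update. The discounting rule
    below is a hypothetical choice for demonstration, not FTTE's
    published formula.
    """
    weights = []
    for delta, round_sent, grad_var in client_updates:
        staleness = current_round - round_sent
        # Older (higher staleness) and noisier (higher variance) updates
        # receive smaller weight; alpha controls the staleness decay.
        w = 1.0 / (((1.0 + staleness) ** alpha) * (1.0 + grad_var))
        weights.append(w)
    weights = np.array(weights)
    weights /= weights.sum()  # normalize so the weights sum to 1
    agg = sum(w * delta for w, (delta, _, _) in zip(weights, client_updates))
    return global_params + agg
```

With two fresh, zero-variance updates the rule reduces to a plain average, and a stale update is down-weighted relative to a fresh one, which matches the qualitative behavior the abstract describes.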