Federated Learning over Hierarchical Wireless Networks: Training Latency Minimization via Submodel Partitioning

📅 2023-10-27
📈 Citations: 6
Influential: 0
🤖 AI Summary
To address the challenge of training large models on resource-constrained edge devices in cloud-edge-client hierarchical wireless networks, this paper proposes Hierarchical Independent Submodel Training (HIST). In each round, the global model is dynamically partitioned into mutually exclusive submodels and distributed across heterogeneous client groups (cells) for parallel local training, substantially reducing computation, communication, and storage overheads. Key contributions include: (1) the first dynamic submodel partitioning and joint resource-performance optimization mechanism tailored to wireless environments; (2) integration of over-the-air computation (AirComp) to improve edge aggregation efficiency; and (3) a rigorous theoretical convergence analysis. Experiments show that HIST substantially reduces training time and communication cost compared to conventional hierarchical federated learning while maintaining comparable model accuracy, and that the AirComp-assisted variant further lowers latency and remains robust under realistic wireless channel conditions.
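The round structure summarized above lends itself to a compact sketch. Below is a minimal numpy sketch of one HIST round under simplifying assumptions: a flat parameter vector, random index partitioning, and synthetic gradients standing in for real local SGD. All names (`partition_model`, `hist_round`, `clients_per_cell`) are illustrative, not from the authors' code.

```python
# A minimal sketch of one HIST round, assuming a flat parameter vector and
# synthetic local gradients; names are illustrative, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def partition_model(dim, num_cells, rng):
    """Randomly split parameter indices into disjoint submodels, one per cell."""
    perm = rng.permutation(dim)
    return np.array_split(perm, num_cells)

def hist_round(theta, clients_per_cell, local_steps, lr, rng):
    """One global round: each cell trains only its own disjoint partition."""
    parts = partition_model(theta.size, len(clients_per_cell), rng)
    new_theta = theta.copy()
    for cell, idx in enumerate(parts):
        # Each client in the cell updates only the submodel indices `idx`.
        client_updates = []
        for _ in range(clients_per_cell[cell]):
            sub = theta[idx].copy()
            for _ in range(local_steps):
                grad = rng.normal(size=sub.size)  # stand-in for a real gradient
                sub -= lr * grad
            client_updates.append(sub)
        # Edge aggregation: average the submodel across the cell's clients.
        new_theta[idx] = np.mean(client_updates, axis=0)
    # Global aggregation simply reassembles the disjoint partitions.
    return new_theta

theta = np.zeros(1000)
theta = hist_round(theta, clients_per_cell=[5, 5, 5, 5],
                   local_steps=3, lr=0.1, rng=rng)
```

Because the partitions are disjoint, the global server never averages overlapping coordinates; "global aggregation" reduces to stitching the cell-level results back together, which is what makes the per-round cost scale with the submodel size rather than the full model size.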
📝 Abstract
Hierarchical federated learning (HFL) has demonstrated promising scalability advantages over the traditional "star-topology" architecture-based federated learning (FL). However, HFL still imposes significant computation, communication, and storage burdens on the edge, especially when training a large-scale model over resource-constrained wireless devices. In this paper, we propose hierarchical independent submodel training (HIST), a new FL methodology that aims to address these issues in hierarchical cloud-edge-client networks. The key idea behind HIST is to divide the global model into disjoint partitions (or submodels) per round so that each group of clients (i.e., cells) is responsible for training only one partition of the model. We characterize the convergence behavior of HIST under mild assumptions, showing the impacts of several key attributes (e.g., submodel sizes, number of cells, edge and global aggregation frequencies) on the rate and stationarity gap. Building upon the theoretical results, we propose a submodel partitioning strategy to minimize the training latency depending on network resource availability and a target learning performance guarantee. We then demonstrate how HIST can be augmented with over-the-air computation (AirComp) to further enhance the efficiency of the model aggregation over the edge cells. Through numerical evaluations, we verify that HIST is able to save training time and communication costs by wide margins while achieving comparable accuracy as conventional HFL. Moreover, our experiments demonstrate that AirComp-assisted HIST provides further improvements in training latency.
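To make the AirComp augmentation concrete, here is a minimal sketch of over-the-air averaging under an idealized scalar channel with transmitter-side channel inversion. The function name `aircomp_average`, the channel model, and all parameters are assumptions for illustration, not the paper's system model (which also handles power constraints and fading).

```python
# A minimal sketch of AirComp-style edge aggregation, assuming known
# block-flat scalar channels inverted at the transmitters; illustrative only.
import numpy as np

rng = np.random.default_rng(1)

def aircomp_average(submodels, channel_gains, noise_std):
    """Clients pre-equalize by 1/h_k; signals then add coherently over the air."""
    tx = [x / h for x, h in zip(submodels, channel_gains)]   # pre-equalization
    # The multiple-access channel sums h_k * tx_k in the analog domain.
    rx = np.sum([h * s for h, s in zip(channel_gains, tx)], axis=0)
    rx += rng.normal(scale=noise_std, size=rx.shape)         # receiver noise
    return rx / len(submodels)                               # noisy average

K, d = 5, 100
submodels = [rng.normal(size=d) for _ in range(K)]
gains = rng.uniform(0.5, 1.5, size=K)
est = aircomp_average(submodels, gains, noise_std=0.01)
print("aggregation error:", np.linalg.norm(est - np.mean(submodels, axis=0)))
```

The point of AirComp is that all clients in a cell transmit simultaneously and the edge server receives the sum directly from channel superposition, so aggregation latency does not grow with the number of clients, at the cost of the additive noise seen above.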
Problem

Research questions and friction points this paper is trying to address.

Hierarchical Wireless Networks
Large Model Training
Resource-Constrained Devices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Independent Submodel Training (HIST)
Over-the-air computation (AirComp)
Submodel partitioning strategy for latency minimization (see the sizing sketch below)
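The paper chooses partition sizes by jointly optimizing a training-latency model against a convergence guarantee. The heuristic below only illustrates the flavor of such a rule: it sizes each cell's submodel share in inverse proportion to an assumed per-parameter compute-plus-upload cost, so per-round times roughly equalize across cells. The cost model, rates, and function name are all hypothetical.

```python
# An illustrative heuristic for latency-aware submodel sizing, NOT the
# paper's optimizer: faster cells receive proportionally larger shares.
def partition_sizes(total_params, compute_rates, uplink_rates, bits_per_param=32):
    # Time for a cell to handle s params ~ s/compute_rate + s*bits/uplink_rate,
    # so the per-parameter cost is the bracketed term below.
    costs = [1.0 / comp + bits_per_param / up
             for comp, up in zip(compute_rates, uplink_rates)]
    weights = [1.0 / c for c in costs]        # inverse-cost weighting
    scale = total_params / sum(weights)
    return [round(w * scale) for w in weights]

print(partition_sizes(1_000_000,
                      compute_rates=[1e6, 2e6, 4e6],    # params/sec (assumed)
                      uplink_rates=[5e6, 10e6, 20e6]))  # bits/sec (assumed)
```

Equalizing per-cell round times matters because a HIST round completes only when the slowest cell finishes; the paper's actual strategy additionally couples the sizes to the convergence bound rather than to latency alone.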