🤖 AI Summary
To address the challenges of resource constraints, privacy sensitivity, and high communication overhead on edge devices, this paper proposes SFLAM, a quantized split federated fine-tuning framework. SFLAM is the first to deeply integrate split learning, model quantization, adaptive power control, and dynamic bandwidth allocation into a unified edge–cloud collaborative training paradigm. The authors theoretically model and characterize the fundamental latency–energy trade-off in edge training. Experimental results demonstrate that, compared to baseline methods, SFLAM reduces memory footprint by 62%, cuts communication latency by 47%, improves energy efficiency by 3.1×, and scales to over one thousand edge devices. This work provides a systematic solution for lightweight, privacy-preserving, and energy-efficient large-model fine-tuning at the edge.
📝 Abstract
Large Artificial Intelligence Models (LAMs), powered by massive datasets, large parameter scales, and extensive computational resources, are driving significant transformations across various industries. Yet their practical deployment on resource-limited mobile edge devices is hindered by critical challenges such as data privacy, constrained resources, and high overhead costs. Addressing this gap, this paper proposes a novel framework named Quantized Split Federated Fine-Tuning Large AI Model (SFLAM). By partitioning the training load between edge devices and servers using a split learning paradigm, SFLAM facilitates the operation of large models on devices and significantly lowers the memory requirements on edge devices. Additionally, SFLAM incorporates quantization management, power control, and bandwidth allocation strategies to enhance training efficiency while reducing energy consumption and communication latency. A theoretical analysis of the latency–energy trade-off is presented, and the framework's efficacy is validated via comprehensive simulations. The findings indicate that SFLAM achieves superior learning efficiency and scalability compared to conventional methods, thereby providing a valuable approach for enabling advanced AI services in resource-constrained scenarios.
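The abstract's core mechanism, splitting a model between an edge device and a server and quantizing the activations exchanged at the cut point, can be sketched as follows. This is a minimal illustration under assumed details: the paper does not specify its quantizer, so uniform symmetric int8 quantization is used here, and all names (`quantize_int8`, `W_client`, `W_server`) are illustrative rather than taken from SFLAM.

```python
import numpy as np

def quantize_int8(x):
    """Uniform symmetric int8 quantization of an activation tensor.

    Returns the int8 codes and the per-tensor scale needed to dequantize.
    This is an assumed scheme for illustration; SFLAM's actual quantizer
    may differ.
    """
    scale = max(float(np.abs(x).max()), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16)).astype(np.float32)         # a batch on the edge device
W_client = rng.normal(size=(16, 8)).astype(np.float32)  # edge-side block (hypothetical split)
W_server = rng.normal(size=(8, 2)).astype(np.float32)   # server-side block

# 1. Edge device runs only the first block -> small memory footprint.
h = np.tanh(x @ W_client)

# 2. Activations are quantized before the uplink: int8 codes take
#    4x fewer bytes than the float32 activations they replace.
q, s = quantize_int8(h)

# 3. Server dequantizes and finishes the forward pass.
logits = dequantize(q, s) @ W_server

# Compare against the unsplit, unquantized forward pass.
full = np.tanh(x @ W_client) @ W_server
err = float(np.abs(logits - full).max())
```

The uplink payload here is `q.nbytes + a scale`, a 4× reduction over sending `h` in float32, at the cost of a small quantization error in the server-side computation; SFLAM's quantization management is about choosing such bit widths adaptively.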