🤖 AI Summary
This work addresses the challenges of on-device federated fine-tuning of large language models, which is constrained by limited computation and memory on edge devices as well as high communication overhead. To mitigate these issues, the authors propose SplitCom, a split federated learning framework that partitions the model between client and server and introduces a temporal activation compression mechanism inspired by video compression: activations are uploaded only when they deviate noticeably from those of previous epochs. The upload threshold is adapted at runtime via either bang-bang control or deep deterministic policy gradient (DDPG) reinforcement learning, and dimensionality reduction is applied to lower client-side memory requirements. The framework is further extended into a U-shaped architecture so that the server never accesses clients' labels. Experiments show that SplitCom reduces uplink communication costs by up to 98.6% in its standard configuration, while the U-shaped variant cuts total communication costs by up to 95.8%, without noticeably compromising model performance.
📝 Abstract
Federated fine-tuning of on-device large language models (LLMs) mitigates privacy concerns by preventing raw data sharing. However, the intensive computational and memory demands pose significant challenges for resource-constrained edge devices. To overcome these limitations, split federated learning (SFL) emerges as a promising solution that partitions the model into lightweight client-side and compute-intensive server-side sub-models, thus offloading the primary training workload to a powerful server. Nevertheless, high-dimensional activation exchanges in SFL lead to excessive communication overhead. To address this, we propose SplitCom, a communication-efficient SFL framework for LLMs that exploits temporal redundancy in activations across consecutive training epochs. Inspired by video compression, the core innovation of our framework lies in selective activation uploading only when a noticeable deviation from previous epochs occurs. To balance communication efficiency and learning performance, we introduce two adaptive threshold control schemes based on 1) bang-bang control or 2) deep deterministic policy gradient (DDPG)-based reinforcement learning. Moreover, we implement dimensionality reduction techniques to alleviate client-side memory requirements. Furthermore, we extend SplitCom to the U-shape architecture, ensuring the server never accesses clients' labels. Extensive simulations and laboratory experiments demonstrate that SplitCom reduces uplink communication costs by up to 98.6% in its standard configuration and total communication costs by up to 95.8% in its U-shape variant without noticeably compromising model performance.
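The paper does not include code here, but the selective-upload idea with bang-bang threshold adaptation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the relative-L2 deviation metric, the controller step size, the 20% upload budget, and all function names are assumptions made for the example.

```python
import numpy as np

def should_upload(act, prev_act, threshold):
    """Upload only if activations deviate noticeably from the cached copy.

    Deviation metric (an assumption for this sketch): relative L2 distance
    between the current activation and the last uploaded one.
    """
    if prev_act is None:
        return True  # nothing cached yet, must upload
    deviation = np.linalg.norm(act - prev_act) / (np.linalg.norm(prev_act) + 1e-8)
    return deviation > threshold

def bang_bang_update(threshold, upload_rate, target_rate, step=0.01,
                     t_min=0.0, t_max=1.0):
    """Two-level (bang-bang) controller: raise the threshold when uploads
    exceed the budget, lower it when they fall below."""
    if upload_rate > target_rate:
        threshold += step   # suppress uploads
    else:
        threshold -= step   # allow more uploads
    return float(np.clip(threshold, t_min, t_max))

# Toy training loop: one client caches its last uploaded activation.
rng = np.random.default_rng(0)
base = rng.standard_normal(64)          # stand-in for slowly varying activations
cached, threshold, uploads = None, 0.1, 0
for epoch in range(50):
    act = base + 0.05 * rng.standard_normal(64)
    if should_upload(act, cached, threshold):
        cached = act.copy()             # server would receive fresh activations
        uploads += 1
    # adapt the threshold toward a hypothetical 20% upload budget
    threshold = bang_bang_update(threshold, uploads / (epoch + 1), 0.2)
print(f"uploaded {uploads}/50 epochs, final threshold {threshold:.2f}")
```

Under this scheme the server reuses the cached activation on skipped epochs, which is where the uplink savings come from; the DDPG variant would replace `bang_bang_update` with a learned policy mapping training state to threshold adjustments.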