🤖 AI Summary
This work addresses the challenge of efficiently fine-tuning large language models (LLMs) in resource-constrained, data-heterogeneous, and privacy-sensitive edge networks. The authors propose a novel framework that integrates split learning with hierarchical federated learning, featuring a three-part LLM partitioning architecture that enables cloud-edge collaboration while aggregating only lightweight adapters to minimize communication overhead. To enhance model adaptability, they introduce task-agnostic, behavior-aware client clustering built on semantic fingerprints, which are constructed from public probe inputs and compared via symmetric KL divergence. Furthermore, they design a computation-efficient sketching compression mechanism together with a semantic subspace orthogonal perturbation (SS-OP) technique that jointly preserves privacy and reduces communication costs. Experimental results demonstrate that the proposed method consistently outperforms existing approaches across multiple NLP tasks in adaptability, convergence behavior, and robustness.
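To make the fingerprinting step concrete, here is a minimal sketch of the general pattern: each client's "semantic fingerprint" is its output distribution on a shared set of public probes, and clients are compared by symmetric KL divergence between those distributions. The probe count, vocabulary size, and the exact symmetrization (a Jeffreys-style sum is used here) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def symmetric_kl(p, q, eps=1e-12):
    # Jeffreys-style symmetrization: KL(p||q) + KL(q||p).
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def fingerprint_distance(fp_a, fp_b):
    # A fingerprint is a (num_probes, vocab) array of per-probe output
    # distributions; distance = mean symmetric KL across the probes.
    return float(np.mean([symmetric_kl(p, q) for p, q in zip(fp_a, fp_b)]))

# Toy demo: 3 clients answer 4 public probes over a 10-token vocabulary.
rng = np.random.default_rng(0)
fingerprints = [softmax(rng.normal(size=(4, 10))) for _ in range(3)]
D = np.array([[fingerprint_distance(a, b) for b in fingerprints]
              for a in fingerprints])
print(np.round(D, 3))  # symmetric distance matrix for behavior-aware clustering
```

Any standard clustering routine (e.g., agglomerative clustering over `D`) can then group behaviorally similar clients without ever inspecting their raw local data.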
📝 Abstract
Training large language models (LLMs) at the network edge faces fundamental challenges arising from device resource constraints, severe data heterogeneity, and heightened privacy risks. To address these challenges, we propose ELSA (Efficient LLM-centric Split Aggregation), a novel framework that systematically integrates split learning (SL) and hierarchical federated learning (HFL) for distributed LLM fine-tuning over resource-constrained edge networks. ELSA introduces three key innovations. First, it employs a task-agnostic, behavior-aware client clustering mechanism that constructs semantic fingerprints using public probe inputs and symmetric KL divergence, further enhanced by prediction-consistency-based trust scoring and latency-aware edge assignment to jointly address data heterogeneity, client unreliability, and communication constraints. Second, it splits the LLM into three parts across clients and edge servers, with the cloud used only for adapter aggregation, enabling an effective balance between on-device computation cost and global convergence stability. Third, it incorporates a lightweight communication scheme based on computational sketches combined with semantic subspace orthogonal perturbation (SS-OP) to reduce communication overhead while mitigating privacy leakage during model exchanges. Experiments across diverse NLP tasks demonstrate that ELSA consistently outperforms state-of-the-art methods in terms of adaptability, convergence behavior, and robustness, establishing a scalable and privacy-aware solution for edge-side LLM fine-tuning under resource constraints.
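For intuition on the communication-and-privacy layer, the sketch below illustrates the generic pattern the abstract describes: a hashing-based (count-sketch-style) compressor for adapter updates, plus noise confined to the orthogonal complement of a designated "semantic" subspace. The subspace construction, sketch width, and noise scale here are placeholder assumptions; ELSA's actual sketching and SS-OP mechanisms are specified in the paper.

```python
import numpy as np

def count_sketch(vec, width, seed=0):
    """Compress a vector into `width` buckets via hashed, signed sums."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, width, size=vec.size)   # bucket per coordinate
    sgn = rng.choice([-1.0, 1.0], size=vec.size)  # random sign per coordinate
    sketch = np.zeros(width)
    np.add.at(sketch, idx, sgn * vec)
    return sketch, idx, sgn

def unsketch(sketch, idx, sgn):
    """Unbiased per-coordinate estimate of the original vector."""
    return sgn * sketch[idx]

def orthogonal_perturb(vec, basis, scale=0.1, seed=1):
    """Add noise restricted to the complement of span(basis).

    Because the noise is orthogonal to the protected subspace, the
    update's projection onto that subspace is left untouched while the
    residual directions are masked. Columns of `basis` are orthonormal.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=scale, size=vec.size)
    noise -= basis @ (basis.T @ noise)  # remove the in-subspace component
    return vec + noise

# Toy demo: compress and perturb a 1,000-dim adapter update.
rng = np.random.default_rng(42)
update = rng.normal(size=1000)
q, _ = np.linalg.qr(rng.normal(size=(1000, 8)))  # stand-in "semantic subspace"
protected = orthogonal_perturb(update, q)
sketch, idx, sgn = count_sketch(protected, width=128)
estimate = unsketch(sketch, idx, sgn)
print(f"compression: {update.size} -> {sketch.size} floats")
```

In this pattern the server recovers an unbiased estimate of each coordinate from a much smaller message, while the injected noise never disturbs the components lying inside the protected subspace, only the directions outside it.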