🤖 AI Summary
This work addresses the challenge of efficiently and privately fine-tuning large language models on resource-constrained mobile devices, where high memory and computational overheads hinder existing approaches. Current federated learning methods either rely on costly backpropagation or employ zeroth-order optimization (ZOO), which suffers from slow convergence and low accuracy. To overcome these limitations, the authors propose CooperLLM, a framework that integrates cloud-side gradient guidance with edge-side ZOO: mobile devices perform lightweight local updates via ZOO, while the cloud uses public data and backpropagation to generate guiding perturbations that rectify local gradients, accelerating convergence and improving accuracy. Pipeline scheduling and adaptive compression techniques further alleviate system bottlenecks. Experiments show that, compared to state-of-the-art ZOO methods, CooperLLM reduces device memory usage by up to 86.4%, converges up to 8.8× faster, and improves accuracy by as much as 10 percentage points.
📝 Abstract
Large Language Models (LLMs) perform well on many NLP tasks, but fine-tuning them on resource-constrained mobile devices is challenging due to high memory and computation costs, despite growing demands for privacy-preserving personalization. Federated Learning (FL) enables local-data training, yet existing methods either rely on memory-intensive backpropagation or use zeroth-order optimization (ZOO), which avoids backward passes but suffers from slow convergence and degraded accuracy. We propose CooperLLM, a cloud-assisted edge-end cooperative federated fine-tuning framework that combines ZOO on mobile devices with cloud-guided gradient rectification. Mobile clients perform lightweight ZOO updates on private data, while the cloud fine-tunes on auxiliary public data using backpropagation and injects guided perturbations to rectify local updates, improving convergence and accuracy without violating privacy. To address system bottlenecks, CooperLLM introduces pipeline scheduling and adaptive compression to overlap computation and communication and reduce memory usage. Experiments on multiple Transformer models and datasets show that CooperLLM reduces on-device memory by up to $86.4\%$, accelerates convergence by $8.8 \times$, and improves accuracy by up to 10 percentage points over state-of-the-art ZOO-based baselines.
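The core mechanism the abstract describes, a zeroth-order (forward-pass-only) update whose random perturbation is biased by a cloud-supplied descent direction, can be sketched as follows. This is a minimal SPSA-style illustration under stated assumptions: `zoo_step`, `alpha`, and the toy quadratic objective are illustrative names and choices, not CooperLLM's actual implementation.

```python
import numpy as np

def zoo_step(theta, loss_fn, guide=None, mu=1e-3, lr=1e-2, alpha=0.5, rng=None):
    """One SPSA-style zeroth-order update (illustrative, not the paper's API).

    guide: optional cloud-supplied descent direction (unit vector) mixed into
    the random perturbation; alpha controls how strongly it biases sampling.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(theta.shape)
    if guide is not None:
        # Bias the perturbation toward the cloud's backprop-derived direction.
        z = (1.0 - alpha) * z + alpha * guide
    # Two forward passes only: no backward pass, so no activation storage,
    # which is where ZOO's memory savings on-device come from.
    d = (loss_fn(theta + mu * z) - loss_fn(theta - mu * z)) / (2.0 * mu)
    return theta - lr * d * z

# Toy problem: minimize ||theta - target||^2 without computing a gradient.
target = np.array([1.0, -2.0, 0.5])
loss = lambda th: float(np.sum((th - target) ** 2))

theta = np.zeros(3)
guide = target - theta
guide /= np.linalg.norm(guide)  # stand-in for a cloud-computed gradient direction
rng = np.random.default_rng(0)
for _ in range(500):
    theta = zoo_step(theta, loss, guide=guide, rng=rng)
```

The design intuition: a plain ZOO estimate probes a random direction, so its variance grows with dimension and convergence is slow; mixing in an informative direction from the cloud concentrates probes where the loss actually decreases, which is the "guided perturbation" idea the abstract attributes to CooperLLM.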