🤖 AI Summary
To address the slow convergence and lack of theoretical guarantees in decentralized LoRA training—caused by gradient nonsmoothness and model consensus interference—this paper establishes the first convergence theory for such settings, proving that DeCAF achieves a convergence rate matching that of decentralized SGD. DeCAF integrates truncated SVD-based low-rank updates with explicit consensus constraints, eliminating consensus interference while preserving communication and computational efficiency. The theoretical analysis rigorously characterizes how low-rank parameter coupling affects convergence under nonsmooth objectives. Experiments demonstrate that DeCAF significantly outperforms local training on vision and language tasks, matches federated learning performance under both IID and non-IID data distributions, and exhibits stable convergence and strong scalability.
📝 Abstract
Low-Rank Adaptation (LoRA) has emerged as one of the most effective, computationally tractable fine-tuning approaches for training Vision-Language Models (VLMs) and Large Language Models (LLMs). LoRA accomplishes this by freezing the pre-trained model weights and injecting trainable low-rank matrices, allowing for efficient learning of these foundation models even on edge devices. However, LoRA in decentralized settings remains underexplored, particularly its theoretical underpinnings, owing to the lack of smoothness guarantees and to model consensus interference (defined formally below). This work improves the convergence rate of decentralized LoRA (DLoRA) to match the rate of decentralized SGD by ensuring gradient smoothness. We also introduce DeCAF, a novel algorithm integrating DLoRA with truncated singular value decomposition (TSVD)-based matrix factorization to resolve consensus interference. Theoretical analysis shows that TSVD's approximation error is bounded and that consensus differences between DLoRA and DeCAF vanish as the rank increases, yielding DeCAF's matching convergence rate. Extensive experiments across vision and language tasks demonstrate that our algorithms outperform local training and rival federated learning under both IID and non-IID data distributions.
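To make the TSVD-based factorization step concrete, the sketch below illustrates the general idea: after nodes average their low-rank LoRA updates (whose sum may exceed rank r), a rank-r truncated SVD re-factors the consensus matrix into fresh low-rank factors. This is a minimal illustration of TSVD re-factorization, not the paper's exact DeCAF algorithm; the names `tsvd_refactor`, `W_avg`, and `r` are hypothetical.

```python
import numpy as np

def tsvd_refactor(consensus_update: np.ndarray, r: int):
    """Re-factor an averaged LoRA update W ~ B @ A via rank-r truncated SVD.

    Illustrative sketch: function name and signature are assumptions,
    not taken from the paper.
    """
    U, s, Vt = np.linalg.svd(consensus_update, full_matrices=False)
    B = U[:, :r] * s[:r]   # shape (d_out, r), singular values folded into B
    A = Vt[:r, :]          # shape (r, d_in)
    return B, A

# Example: average two nodes' rank-2 updates, then re-factor at rank 2.
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2
updates = [rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
           for _ in range(2)]
W_avg = sum(updates) / len(updates)   # consensus matrix; rank can reach 2*r
B, A = tsvd_refactor(W_avg, r)
err = np.linalg.norm(W_avg - B @ A) / np.linalg.norm(W_avg)
print(f"relative rank-{r} TSVD approximation error: {err:.3f}")
```

By the Eckart-Young theorem, the truncated SVD gives the best rank-r approximation of the averaged update, which is the sense in which the approximation error is bounded and shrinks as the rank grows.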