🤖 AI Summary
This work addresses the challenge of fine-tuning large models on edge devices, where standard federated and split learning suffer from memory-intensive backpropagation, while purely zeroth-order methods, though memory-efficient, converge slowly, with performance that degrades as the model dimension grows. To overcome these limitations, we propose HO-SFL, a hybrid-order split federated learning framework that decouples optimization within a Lagrangian formulation: the server performs first-order updates, while clients employ zeroth-order optimization without backpropagation. Notably, HO-SFL achieves the first dimension-independent model aggregation in this setting, breaking the dimensional bottleneck inherent in zeroth-order optimization. Empirical results demonstrate that our method attains convergence speeds comparable to first-order approaches on both vision and language tasks, while substantially reducing client-side memory consumption and communication overhead.
📝 Abstract
Fine-tuning large models on edge devices is severely hindered by the memory-intensive backpropagation (BP) in standard frameworks such as federated learning and split learning. While substituting BP with zeroth-order optimization can significantly reduce memory footprints, it typically suffers from prohibitively degraded convergence speed. To resolve this dilemma, we propose Hybrid-Order Split Federated Learning (HO-SFL). By reformulating the split learning process within a Lagrangian framework, HO-SFL decouples the optimization across server and clients: the server performs precise first-order updates (i.e., BP), whereas clients conduct memory-efficient zeroth-order optimization. This hybrid design not only eliminates the need for client-side BP but also enables dimension-free model aggregation, drastically lowering communication costs. Crucially, we provide a theoretical convergence analysis demonstrating that HO-SFL mitigates the dimension-dependent convergence slowdown of zeroth-order optimization, achieving a convergence rate comparable to first-order methods. Extensive experiments on tasks across vision and language modalities validate that HO-SFL achieves convergence speeds comparable to first-order baselines while significantly reducing communication costs and client memory footprints.
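To make the client-side mechanism concrete, the snippet below sketches a generic two-point (SPSA-style) zeroth-order gradient estimator of the kind the abstract refers to: the gradient is approximated from two forward evaluations along a random direction, so no backpropagation (and no activation storage) is needed. This is a minimal, hedged illustration of zeroth-order optimization in general, not the paper's exact HO-SFL algorithm; the function names and the toy quadratic objective are illustrative assumptions.

```python
import numpy as np

def zo_grad_estimate(loss_fn, params, rng, mu=1e-3):
    """Two-point zeroth-order gradient estimate (SPSA-style).

    Probes the loss along a single random direction and uses the
    finite difference of the two loss values; this replaces
    backpropagation, which is the source of the memory savings
    described in the abstract. Illustrative sketch, not HO-SFL itself.
    """
    z = rng.standard_normal(params.shape)          # random probe direction
    loss_plus = loss_fn(params + mu * z)           # forward pass 1
    loss_minus = loss_fn(params - mu * z)          # forward pass 2
    return (loss_plus - loss_minus) / (2.0 * mu) * z

# Toy usage: ZO-SGD on a quadratic "client" objective f(w) = ||w - 1||^2.
rng = np.random.default_rng(0)
target = np.ones(4)
loss = lambda w: float(np.sum((w - target) ** 2))

w = np.zeros(4)
for _ in range(500):
    w -= 0.05 * zo_grad_estimate(loss, w, rng)
```

The dimension dependence the paper targets is visible even in this sketch: the estimator's variance grows with the number of perturbed parameters, which is why purely zeroth-order training of large models converges slowly and why offloading first-order updates to the server is attractive.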