HOSL: Hybrid-Order Split Learning for Memory-Constrained Edge Training

📅 2026-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of training large language models on resource-constrained edge devices, where conventional split learning relies on first-order optimization and incurs substantial memory overhead due to intermediate activation storage, while purely zeroth-order methods, though memory-efficient, suffer from slow convergence and degraded performance. To bridge this gap, the authors propose HOSL, a hybrid-order split learning framework that integrates zeroth- and first-order optimization: clients employ zeroth-order updates to eliminate backpropagation and activation storage, while the server leverages first-order optimization to ensure fast convergence and high model accuracy. Theoretical analysis shows that the convergence rate depends on the client-side model dimension rather than the full model dimension, so convergence improves as more computation is offloaded to the server. Experiments on OPT models demonstrate that HOSL reduces client GPU memory usage by up to 3.7×, with accuracy losses of only 0.20%–4.23% compared to first-order baselines, and outperforms pure zeroth-order methods by up to 15.55%.
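The client-side memory saving comes from replacing backpropagation with a forward-only gradient estimate. The paper's code is not reproduced here; the following is a minimal SPSA-style two-point sketch (all names hypothetical), showing why no intermediate activations need to be stored:

```python
import random

def zo_gradient_estimate(loss_fn, theta, mu=1e-3, rng=None):
    """Two-point zeroth-order (SPSA-style) gradient estimate.

    Samples one Gaussian direction u and approximates
        grad L(theta) ~ [L(theta + mu*u) - L(theta - mu*u)] / (2*mu) * u.
    Only two forward passes are needed: no backpropagation, hence no
    activations kept in memory for a backward pass.
    """
    rng = rng or random.Random(0)
    u = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = loss_fn([t + mu * ui for t, ui in zip(theta, u)])
    minus = loss_fn([t - mu * ui for t, ui in zip(theta, u)])
    scale = (plus - minus) / (2 * mu)
    return [scale * ui for ui in u]

# On a quadratic loss the finite difference is exact along the sampled
# direction, and the estimate aligns with the true gradient (= theta).
loss = lambda th: 0.5 * sum(t * t for t in th)
theta = [1.0, -2.0, 0.5]
g = zo_gradient_estimate(loss, theta)
```

The estimate is unbiased only in expectation over directions, which is consistent with the slower convergence the summary attributes to pure zeroth-order training.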

📝 Abstract
Split learning (SL) enables collaborative training of large language models (LLMs) between resource-constrained edge devices and compute-rich servers by partitioning model computation across the network boundary. However, existing SL systems predominantly rely on first-order (FO) optimization, which requires clients to store intermediate quantities such as activations for backpropagation. This results in substantial memory overhead, largely negating the benefits of model partitioning. In contrast, zeroth-order (ZO) optimization eliminates backpropagation and significantly reduces memory usage, but often suffers from slow convergence and degraded performance. In this work, we propose HOSL, a novel Hybrid-Order Split Learning framework that addresses this fundamental trade-off between memory efficiency and optimization effectiveness by strategically integrating ZO optimization on the client side with FO optimization on the server side. By employing memory-efficient ZO gradient estimation at the client, HOSL eliminates backpropagation and activation storage, reducing client memory consumption. Meanwhile, server-side FO optimization ensures fast convergence and competitive performance. Theoretically, we show that HOSL achieves an $\mathcal{O}(\sqrt{d_c/TQ})$ rate, which depends on the client-side model dimension $d_c$ rather than the full model dimension $d$, demonstrating that convergence improves as more computation is offloaded to the server. Extensive experiments on OPT models (125M and 1.3B parameters) across 6 tasks demonstrate that HOSL reduces client GPU memory by up to 3.7$\times$ compared to the FO method while achieving accuracy within 0.20%-4.23% of this baseline. Furthermore, HOSL outperforms the ZO baseline by up to 15.55%, validating the effectiveness of our hybrid strategy for memory-efficient training on edge devices.
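The hybrid scheme the abstract describes can be illustrated on a toy split model. Below is a sketch under stated assumptions, not the paper's implementation: the "client" holds one scalar parameter and updates it with two-point ZO loss queries (forward passes only), while the "server" holds the other and applies its analytic first-order gradient. All names and hyperparameters are illustrative.

```python
import random

def forward_loss(w_c, w_s, x, y):
    h = w_c * x           # client forward -> "smashed" activation sent up
    pred = w_s * h        # server forward on top of the activation
    return (pred - y) ** 2

def hybrid_train(steps=500, lr_c=0.02, lr_s=0.05, mu=1e-3, seed=0):
    rng = random.Random(seed)
    w_c, w_s = 0.5, 0.5
    data = [(1.0, 2.0), (-1.0, -2.0)]   # y = 2x, so w_c * w_s should reach 2
    for _ in range(steps):
        x, y = data[rng.randrange(len(data))]
        # Client: zeroth-order update (two forward losses, no backprop).
        u = rng.gauss(0.0, 1.0)
        g_c = (forward_loss(w_c + mu * u, w_s, x, y)
               - forward_loss(w_c - mu * u, w_s, x, y)) / (2 * mu) * u
        # Server: first-order update with its exact analytic gradient.
        h = w_c * x
        g_s = 2 * (w_s * h - y) * h
        w_c -= lr_c * g_c
        w_s -= lr_s * g_s
    return w_c, w_s

w_c, w_s = hybrid_train()
```

In this sketch only a scalar loss crosses back to the client, matching the claim that clients need neither backpropagation nor stored activations, while the server side still enjoys exact first-order descent.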
Problem

Research questions and friction points this paper is trying to address.

Split Learning
Memory-Constrained Edge Training
Large Language Models
Optimization Efficiency
Client-Server Collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Split Learning
Zeroth-Order Optimization
Memory-Efficient Training
Edge AI
Hybrid-Order Optimization