AI Summary
To address poor scalability, high server overhead, and degraded model performance and convergence in split learning for distributed training, this paper proposes CycleSL, a cyclic split learning framework that eliminates model aggregation. Its core innovations include: (i) modeling server-side training as an independent higher-level optimization task; (ii) designing a server-first, client-following cyclic gradient update mechanism; and (iii) resampling client features following the principle of alternating block coordinate descent. By avoiding conventional model aggregation and replication, CycleSL significantly reduces communication and computational overhead while inherently mitigating data heterogeneity and client drift. Extensive experiments across five non-IID datasets demonstrate that CycleSL consistently outperforms state-of-the-art methods in classification accuracy, convergence speed, and robustness under partial client participation.
Abstract
Split learning has emerged as a promising paradigm for collaborative distributed model training, akin to federated learning, by partitioning neural networks between clients and a server without raw data exchange. However, sequential split learning suffers from poor scalability, while parallel variants such as parallel split learning and split federated learning often incur high server resource overhead due to model duplication and aggregation, and generally exhibit reduced model performance and convergence owing to factors such as client drift and lag. To address these limitations, we introduce CycleSL, a novel aggregation-free split learning framework that enhances scalability and performance and can be seamlessly integrated with existing methods. Inspired by alternating block coordinate descent, CycleSL treats server-side training as an independent higher-level machine learning task, resampling client-extracted features (smashed data) to mitigate heterogeneity and drift. It then performs cyclical updates: the server model is optimized first, and clients are then updated using the updated server model for gradient computation. We integrate CycleSL into previous algorithms and benchmark them on five publicly available datasets with non-IID data distributions and partial client participation. Our empirical findings highlight the effectiveness of CycleSL in enhancing model performance. Our source code is available at https://gitlab.lrz.de/hctl/CycleSL.
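To make the server-first, client-following update order concrete, below is a minimal PyTorch-style sketch of one training round under our own assumptions; names such as `cyclesl_round`, `feature_buffer`, and `resample_size` are illustrative placeholders and not the authors' API. For the actual implementation, see the repository linked above.

```python
import random
import torch

def cyclesl_round(client_models, server_model, client_loaders,
                  client_opts, server_opt, loss_fn, resample_size):
    """One illustrative CycleSL round: server updates first, clients follow."""
    # 1) Each client runs its forward pass and uploads detached features
    #    ("smashed data") together with the labels.
    feature_buffer = []
    for model, loader in zip(client_models, client_loaders):
        x, y = next(iter(loader))
        feature_buffer.append((model(x).detach(), y))

    # 2) Server-first step: the server treats training on a resample of the
    #    uploaded features as its own higher-level learning task (its "block"
    #    in alternating block coordinate descent).
    sampled = random.sample(feature_buffer, k=min(resample_size, len(feature_buffer)))
    server_opt.zero_grad()
    server_loss = torch.stack([loss_fn(server_model(f), y) for f, y in sampled]).mean()
    server_loss.backward()
    server_opt.step()

    # 3) Client-following step: each client recomputes its loss through the
    #    already-updated server model and updates only its own parameters.
    for model, loader, opt in zip(client_models, client_loaders, client_opts):
        x, y = next(iter(loader))
        opt.zero_grad()
        loss = loss_fn(server_model(model(x)), y)
        loss.backward()  # gradients flow through the server into the client
        opt.step()       # only the client optimizer steps here
    server_opt.zero_grad()  # discard server gradients accumulated in step 3
```

In this sketch no client model copies are kept on the server and no aggregation step is needed; the resampling in step 2 decouples the server update from any single client's data, which is how the paper argues heterogeneity and client drift are mitigated.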