Distributed Hybrid Parallelism for Large Language Models: Comparative Study and System Design Guide

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically investigates hybrid parallelism strategies for large language models during both training and inference, aiming to balance computational, communication, and memory overheads. By constructing a mathematical cost model grounded in collective communication operations and integrating communication-computation overlap with automated strategy search, the work proposes a hybrid parallelism framework that achieves both efficiency and scalability. It is the first to unify theoretical modeling, automated search, and empirical evaluation across multiple hardware architectures, revealing the trade-offs among different parallelization strategies in training versus inference. The resulting framework provides reusable deployment guidelines for canonical model architectures, significantly enhancing distributed efficiency.
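The summary mentions a cost model grounded in collective communication operations. A common way to model such collectives is the latency-bandwidth (alpha-beta) model; the sketch below applies it to a ring all-reduce. The function name and the default constants are illustrative assumptions, not values from the paper.

```python
def ring_allreduce_time(nbytes, p, alpha=5e-6, beta=1e-10):
    """Alpha-beta cost estimate of a ring all-reduce over p devices.

    alpha: per-message latency in seconds (assumed default),
    beta:  per-byte transfer time in s/byte (assumed default).
    """
    if p == 1:
        return 0.0  # single device: no communication
    # A ring all-reduce takes 2(p-1) steps (reduce-scatter + all-gather),
    # each moving nbytes/p per device, so the bandwidth term saturates
    # at 2*nbytes*beta while the latency term grows linearly in p.
    return 2 * (p - 1) * alpha + 2 * (p - 1) / p * nbytes * beta
```

Evaluating this at growing device counts shows the trade-off the study analyzes: the bandwidth term is nearly constant in `p`, so at scale the latency term dominates.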

📝 Abstract
With the rapid growth of large language models (LLMs), a wide range of methods have been developed to distribute computation and memory across hardware devices for efficient training and inference. While existing surveys provide descriptive overviews of these techniques, systematic analysis of their benefits and trade-offs, and of how such insights can inform a principled methodology for designing optimal distributed systems, remains limited. This paper offers a comprehensive review of collective operations and distributed parallel strategies, complemented by mathematical formulations to deepen theoretical understanding. We further examine hybrid parallelization designs, emphasizing communication-computation overlap across different stages of model deployment, including both training and inference. Recent advances in automated search for optimal hybrid parallelization strategies using cost models are also discussed. Moreover, we present case studies with mainstream architecture categories to reveal empirical insights that guide researchers and practitioners in parallelism strategy selection. Finally, we highlight open challenges and limitations of current LLM training paradigms and outline promising directions for the next generation of large-scale model development.
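The abstract describes an automated search for hybrid parallelization strategies driven by a cost model. A minimal version of that idea is an exhaustive search over factorizations of the device count into data-, tensor-, and pipeline-parallel degrees, scoring each with a toy cost. Everything here (the cost terms, the constants, the model shape) is an illustrative assumption, not the paper's actual cost model.

```python
import itertools

def ring_allreduce_time(nbytes, p, alpha=5e-6, beta=1e-10):
    # Alpha-beta estimate of a ring all-reduce (constants assumed).
    if p == 1:
        return 0.0
    return 2 * (p - 1) * alpha + 2 * (p - 1) / p * nbytes * beta

def hybrid_cost(dp, tp, pp, layers=32, hidden=4096, microbatches=8):
    # Toy per-iteration cost: TP activation all-reduces per layer,
    # a pipeline bubble, and a DP gradient all-reduce. All terms
    # are hypothetical stand-ins for a real analytical cost model.
    act_bytes = 2 * hidden * hidden                  # rough per-layer activation traffic
    grad_bytes = 12 * layers * hidden * hidden / (tp * pp)  # sharded gradient volume
    compute = layers * 1e-3 / (tp * pp)              # pretend per-device compute time
    tp_comm = layers * ring_allreduce_time(act_bytes, tp)
    bubble = (pp - 1) / microbatches * compute       # classic pipeline-bubble fraction
    dp_comm = ring_allreduce_time(grad_bytes, dp)
    return compute + tp_comm + bubble + dp_comm

def best_strategy(num_devices):
    # Enumerate all (dp, tp, pp) with dp*tp*pp == num_devices and
    # return the cheapest strategy under the toy cost model.
    best = None
    for dp, tp, pp in itertools.product(range(1, num_devices + 1), repeat=3):
        if dp * tp * pp != num_devices:
            continue
        c = hybrid_cost(dp, tp, pp)
        if best is None or c < best[0]:
            best = (c, (dp, tp, pp))
    return best
```

Real strategy-search systems prune this space and calibrate the cost terms against measured hardware, but the structure (enumerate legal device factorizations, score with a communication-aware model, pick the minimum) is the same.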
Problem

Research questions and friction points this paper is trying to address.

distributed parallelism
large language models
hybrid parallelization
system design
model deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed Hybrid Parallelism
Large Language Models
Collective Operations
Communication-Computation Overlap
Cost Model