A Tabular Schedule Abstraction for Communication-Aware Evaluation of Pipeline-Parallel LLM Training

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing evaluations of pipeline-parallel scheduling strategies for large language models rely either on analytical models that neglect communication overhead or on costly, system-specific end-to-end experiments. This work proposes a unified evaluation framework that connects formula-based modeling, a tabular schedule abstraction, and communication-aware execution simulation, enabling, for the first time, joint modeling of structured schedule representations and communication costs. Using this framework, we compare GPipe, 1F1B, Chimera, and Hanayo across multiple modeled system configurations. The comparison shows that schedule rankings depend strongly on the execution environment, which challenges the practice of judging schedules by structural metrics such as bubble ratio alone. Under the modeled assumptions, GPipe and 1F1B have equivalent runtime, but 1F1B has a lower activation-memory peak; Chimera is advantageous mainly with few microbatches or in communication-favorable regimes; and Hanayo is effective within its restricted operating regime but is sensitive to network bottlenecks.
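The activation-memory difference between GPipe and 1F1B can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the per-stage operation orders, and the use of "number of microbatches whose forward pass has run but whose backward pass has not" as a proxy for activation memory are all assumptions made for illustration, using the commonly described non-interleaved 1F1B ordering.

```python
# Illustrative sketch only; names and the memory proxy are assumptions,
# not taken from the paper.

def gpipe_order(p, m, s):
    """GPipe-style order on stage s: all forwards, then all backwards."""
    return [("F", j) for j in range(m)] + [("B", j) for j in range(m)]


def one_f_one_b_order(p, m, s):
    """Non-interleaved 1F1B order on stage s (0-indexed) of p stages."""
    warmup = min(m, p - s - 1)            # forwards issued before the first backward
    order = [("F", j) for j in range(warmup)]
    for k in range(m - warmup):           # steady state: one forward, one backward
        order.append(("F", warmup + k))
        order.append(("B", k))
    order += [("B", j) for j in range(m - warmup, m)]  # cooldown: drain backwards
    return order


def peak_in_flight(order):
    """Peak count of microbatches with a forward done and a backward pending."""
    live = peak = 0
    for kind, _ in order:
        live += 1 if kind == "F" else -1
        peak = max(peak, live)
    return peak


p, m = 4, 8  # hypothetical pipeline depth and microbatch count
gpipe_peaks = [peak_in_flight(gpipe_order(p, m, s)) for s in range(p)]
f1b1_peaks = [peak_in_flight(one_f_one_b_order(p, m, s)) for s in range(p)]
print(gpipe_peaks)  # [8, 8, 8, 8]
print(f1b1_peaks)   # [4, 3, 2, 1]
```

Under these assumptions, the GPipe-style order holds all m microbatches in flight on every stage, while the 1F1B order is bounded by the remaining pipeline depth, which is consistent with the lower activation-memory peak reported above.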

📝 Abstract

Pipeline parallelism is a key technique for distributed training of large language models because it reduces per-device parameter and activation memory. However, comparing pipeline schedules is difficult: analytical models expose structural quantities such as bubble ratios, while end-to-end hardware experiments are costly and system-specific. In this work, we introduce a tabular schedule abstraction and a unified multi-abstraction methodology that connects formula-based reasoning, idealized schedule tables, and communication-aware execution simulation. Using this framework, we compare GPipe, 1F1B, Chimera, and Hanayo in its restricted regime across multiple modeled system configurations. Our results show that schedule rankings are not abstraction-invariant: communication can negate structural advantages suggested by bubble analysis alone. Under the assumptions considered here, GPipe and 1F1B are runtime-equivalent, but 1F1B achieves a lower activation-memory peak. Chimera is advantageous mainly at low microbatch counts and in communication-favorable regimes, while Hanayo is effective in its intended restricted operating point but remains sensitive to network bottlenecks. We further study an asymmetric Chimera-style placement, which does not reduce the global peak memory requirement but reveals limited runtime gains in shallow pipelines. Overall, pipeline schedule quality is meaningful only in the context of the modeled execution environment.
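To make the relationship between an idealized schedule table, the bubble ratio, and a communication-aware runtime concrete, the following is a minimal sketch for a GPipe-style schedule. It is not the paper's framework: the function names, the unit-time forward and backward passes, and the treatment of communication as a fixed per-hop latency that does not occupy the sending or receiving stage are all simplifying assumptions.

```python
from fractions import Fraction

# Illustrative sketch only; all names and cost assumptions are hypothetical.

def gpipe_table(p, m):
    """Idealized table: rows are stages, columns are unit time slots."""
    slots = 2 * (m + p - 1)
    table = [["." for _ in range(slots)] for _ in range(p)]
    for s in range(p):
        for j in range(m):
            table[s][s + j] = f"F{j}"                            # forward wave
            table[s][(m + p - 1) + (p - 1 - s) + j] = f"B{j}"    # backward wave
    return table


def bubble_ratio(table):
    """Fraction of idle cells ('.') in the table."""
    cells = [c for row in table for c in row]
    return Fraction(cells.count("."), len(cells))


def gpipe_makespan(p, m, t_f=1.0, t_b=1.0, t_comm=0.0):
    """Makespan when each stage-to-stage hop adds latency t_comm (assumed
    not to block the stages themselves)."""
    free = [0.0] * p                          # time at which each stage is next idle
    fwd_done = [[0.0] * m for _ in range(p)]
    for s in range(p):
        for j in range(m):
            ready = fwd_done[s - 1][j] + t_comm if s > 0 else 0.0
            fwd_done[s][j] = free[s] = max(free[s], ready) + t_f
    bwd_done = [[0.0] * m for _ in range(p)]
    for s in reversed(range(p)):
        for j in range(m):
            ready = bwd_done[s + 1][j] + t_comm if s < p - 1 else 0.0
            bwd_done[s][j] = free[s] = max(free[s], ready) + t_b
    return max(free)


p, m = 4, 8  # hypothetical pipeline depth and microbatch count
print(bubble_ratio(gpipe_table(p, m)))   # 3/11, i.e. (p - 1) / (m + p - 1)
print(gpipe_makespan(p, m))              # 22.0
print(gpipe_makespan(p, m, t_comm=0.5))  # 25.0
```

In this toy model the table and its bubble ratio are unchanged by `t_comm`, yet the makespan grows from 22 to 25 time units, which illustrates why a purely structural metric can miss costs that an execution-level simulation exposes.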

Problem

Research questions and friction points this paper is trying to address.

pipeline parallelism

schedule evaluation

communication-aware

LLM training

tabular abstraction

Innovation

Methods, ideas, or system contributions that make the work stand out.

tabular schedule abstraction

pipeline parallelism

communication-aware simulation