TimelyFreeze: Adaptive Parameter Freezing Mechanism for Pipeline Parallelism

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the degradation in training throughput and model accuracy caused by suboptimal parameter freezing strategies in pipeline-parallel training. It presents the first unified formulation that jointly models pipeline scheduling and accuracy constraints: computational dependencies are represented as a directed acyclic graph, and the freezing decision is cast as a linear program with explicit accuracy constraints. This framework dynamically determines the optimal proportion of frozen parameters to minimize per-batch execution time. The resulting approach enables adaptive, accuracy-aware freezing policies that achieve up to a 40% improvement in training throughput on the LLaMA-8B model while preserving model accuracy, and it generalizes effectively across diverse pipeline-parallel configurations.

📝 Abstract
Pipeline parallelism enables training models that exceed single-device memory, but practical throughput remains limited by pipeline bubbles. Although parameter freezing can improve training throughput by adaptively skipping backward computation, existing methods often over-freeze parameters, resulting in unnecessary accuracy degradation. To address this issue, we propose TimelyFreeze, which models the pipeline schedule as a directed acyclic graph and solves a linear program to compute optimal freeze ratios that minimize batch execution time under accuracy constraints. Experiments show that TimelyFreeze achieves up to 40% training throughput improvement on LLaMA-8B with comparable accuracy. Overall, it enables faster large-scale model training without compromising convergence and generalizes across diverse pipeline-parallel settings.
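The abstract's core idea, a linear program over per-stage freeze ratios that minimizes batch execution time under an accuracy constraint, can be illustrated with a deliberately simplified toy model. Everything below is an illustrative assumption, not the paper's actual formulation: stage i's per-microbatch time is modeled as `fwd[i] + (1 - f[i]) * bwd[i]`, the accuracy constraint is collapsed into a single total freeze budget, and a greedy allocator stands in for a real LP solver.

```python
# Toy sketch (hypothetical simplification of TimelyFreeze's LP):
# choose per-stage freeze ratios f[i] in [0, f_max] to minimize the
# slowest stage's per-microbatch time
#     t[i] = fwd[i] + (1 - f[i]) * bwd[i]
# subject to an "accuracy budget" sum(f) <= budget.
# The greedy loop repeatedly gives freeze budget to the current
# bottleneck stage, which is the qualitative behavior a min-max LP
# solution would exhibit for this cost model.

def allocate_freeze(fwd, bwd, budget, f_max=0.8, step=0.01):
    n = len(fwd)
    f = [0.0] * n          # freeze ratio per pipeline stage
    used = 0.0
    while used + step <= budget + 1e-12:
        # current per-microbatch time of every stage
        times = [fwd[i] + (1.0 - f[i]) * bwd[i] for i in range(n)]
        # slowest stage first
        order = sorted(range(n), key=lambda i: times[i], reverse=True)
        # pick the slowest stage that can still be frozen further
        i = next((j for j in order if f[j] + step <= f_max + 1e-12), None)
        if i is None:
            break
        f[i] += step
        used += step
    return f

# Stage 2 has the heaviest backward pass, so it should absorb most
# of the freeze budget.
fwd = [1.0, 1.0, 1.0, 1.0]
bwd = [2.0, 2.0, 3.0, 2.0]
f = allocate_freeze(fwd, bwd, budget=0.5)
print([round(x, 2) for x in f])
```

Under this toy cost model the allocator concentrates freezing on the backward-heavy bottleneck stage until the stage times level out, which mirrors the paper's stated goal of minimizing batch execution time rather than freezing uniformly.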
Problem

Research questions and friction points this paper is trying to address.

pipeline parallelism
parameter freezing
training throughput
accuracy degradation
pipeline bubbles
Innovation

Methods, ideas, or system contributions that make the work stand out.

pipeline parallelism
parameter freezing
adaptive optimization
linear programming
training throughput
Seonghye Cho
School of Computing, KAIST, Daejeon, South Korea
Jaemin Han
School of Computing, KAIST, Daejeon, South Korea
Hyunjin Kim
KAIST
Computer Vision
Euisoo Jung
School of Computing, KAIST, Daejeon, South Korea
Jae-Gil Lee
Professor, School of Computing, KAIST
big data, data mining