Theoretical Foundations of Scaling Law in Familial Models

πŸ“… 2025-12-29
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing neural scaling laws apply only to monolithic dense models and fail to characterize "familial models": a paradigm comprising multiple sub-models derived from a shared backbone, supporting early-exit and relay-style inference, and enabling heterogeneous deployment across device, edge, and cloud tiers. Method: We introduce granularity (G) as a third fundamental scaling variable alongside model size (N) and training token count (D), establishing a unified scaling law L(N, D, G). Using an IsoFLOP experimental design and multivariate parameterization, we empirically identify a multiplicative power-law granularity penalty with an extremely small exponent. Results: We demonstrate that N, G, and D are decoupled and independently scalable under fixed compute budgets. Familial models thus preserve compute optimality while substantially enhancing deployment flexibility. This work provides the first empirically validated scaling-theory framework for ubiquitous intelligence.
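The summary gives only the shape of the law, not its parameterization. As a hedged illustration, one plausible reading of a unified form L(N, D, G) assumes a Chinchilla-style base loss multiplied by a power-law granularity penalty; the constants E, A, B, Ξ±, Ξ², Ξ³ below are hypothetical, not values reported by the paper.

```latex
% Illustrative form only: Chinchilla-style base loss times a power-law
% granularity penalty. E, A, B, \alpha, \beta, \gamma are hypothetical
% fitting constants, not values from the paper.
L(N, D, G) \;=\; \Bigl( E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} \Bigr)\, G^{\gamma},
\qquad 0 < \gamma \ll 1 .
```

Under this reading, an "extremely small exponent" Ξ³ means that increasing the number of deployable sub-models G raises the loss only marginally at any fixed (N, D).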

πŸ“ Abstract
Neural scaling laws have become foundational for optimizing large language model (LLM) training, yet they typically assume a single dense model output. This limitation effectively overlooks "familial models," a transformative paradigm essential for realizing ubiquitous intelligence across heterogeneous device-edge-cloud hierarchies. Transcending static architectures, familial models integrate early exits with relay-style inference to spawn G deployable sub-models from a single shared backbone. In this work, we theoretically and empirically extend the scaling law to capture this "one-run, many-models" paradigm by introducing Granularity (G) as a fundamental scaling variable alongside model size (N) and training tokens (D). To rigorously quantify this relationship, we propose a unified functional form L(N, D, G) and parameterize it using large-scale empirical runs. Specifically, we employ a rigorous IsoFLOP experimental design to strictly isolate architectural impact from computational scale. Across fixed budgets, we systematically sweep model sizes (N) and granularities (G) while dynamically adjusting tokens (D). This approach effectively decouples the marginal cost of granularity from the benefits of scale, ensuring high-fidelity parameterization of our unified scaling law. Our results reveal that the granularity penalty follows a multiplicative power law with an extremely small exponent. Theoretically, this bridges fixed-compute training with dynamic architectures. Practically, it validates the "train once, deploy many" paradigm, demonstrating that deployment flexibility is achievable without compromising the compute-optimality of dense baselines.
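A rough sketch of the IsoFLOP sweep described in the abstract follows. This is not the authors' code: the C β‰ˆ 6Β·NΒ·D compute approximation, the assumption that granularity adds negligible training compute, and the grid values are all illustrative assumptions.

```python
# Sketch of an IsoFLOP sweep, assuming the common C ~= 6*N*D compute
# approximation; grid values are illustrative, not the paper's settings.

def isoflop_grid(budgets, model_sizes, granularities):
    """Yield (C, N, G, D) with D chosen so total training compute stays at C."""
    for C in budgets:
        for N in model_sizes:
            D = C / (6 * N)          # tokens implied by the fixed budget
            for G in granularities:  # granularity varies the architecture, not C
                yield C, N, G, D

if __name__ == "__main__":
    budgets = [1e19, 1e20]           # training FLOPs (illustrative)
    model_sizes = [1e8, 3e8, 1e9]    # parameters (illustrative)
    granularities = [1, 2, 4, 8]     # deployable sub-models per backbone
    for C, N, G, D in isoflop_grid(budgets, model_sizes, granularities):
        print(f"C={C:.0e}  N={N:.0e}  G={G}  D={D:.2e}")
```

Each fixed budget thus yields a grid of (N, G) runs whose token counts D are backed out from the budget, which is how the sweep can separate the marginal effect of granularity from the effect of scale.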
Problem

Research questions and friction points this paper is trying to address.

Extend scaling laws to familial models with granularity as a variable
Quantify the relationship between model size, tokens, and granularity
Validate train-once-deploy-many paradigm without sacrificing compute efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces granularity G as a scaling variable alongside N and D
Employs IsoFLOP design to isolate architectural impact from compute
Reveals that the granularity penalty follows a multiplicative power law with a small exponent (see the fitting sketch below)
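A minimal sketch of how such a multiplicative power law could be fitted from IsoFLOP measurements follows. The functional form (the illustrative L(N, D, G) shown earlier), the initial guesses, the use of nonlinear least squares, and the synthetic data are assumptions, not the paper's reported procedure or results.

```python
# Minimal sketch: fit L(N, D, G) = (E + A*N**-alpha + B*D**-beta) * G**gamma
# by nonlinear least squares. Form, starting values, and synthetic data are
# illustrative assumptions, not the paper's parameterization.
import numpy as np
from scipy.optimize import curve_fit

def familial_loss(X, E, A, alpha, B, beta, gamma):
    N, D, G = X
    return (E + A * N**-alpha + B * D**-beta) * G**gamma

def fit_scaling_law(N, D, G, L):
    """Return fitted (E, A, alpha, B, beta, gamma) from per-run arrays."""
    p0 = [1.5, 400.0, 0.3, 400.0, 0.3, 0.01]   # rough Chinchilla-like start
    params, _ = curve_fit(familial_loss, (N, D, G), L, p0=p0, maxfev=20000)
    return params

if __name__ == "__main__":
    # Synthetic demo only: generate runs from known constants and recover them.
    rng = np.random.default_rng(0)
    N = rng.uniform(1e8, 1e9, 64)
    D = rng.uniform(1e9, 1e10, 64)
    G = rng.integers(1, 9, 64).astype(float)
    true = (1.7, 400.0, 0.32, 410.0, 0.29, 0.015)
    L = familial_loss((N, D, G), *true) * np.exp(rng.normal(0.0, 0.005, 64))
    print(fit_scaling_law(N, D, G, L))
```

In this reading, the fitted gamma being close to zero is what the summary calls an "extremely small exponent": the penalty for spawning more sub-models is nearly flat in G.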
Huan Song
Amazon AWS AI
Deep learning, machine learning, graph neural networks, time-series analysis
Qingfei Zhao
University of the Chinese Academy of Sciences
Natural Language Processing, Artificial Intelligence
Ting Long
Institute of Artificial Intelligence (TeleAI), China Telecom
Shuyu Tian
Institute of Artificial Intelligence (TeleAI), China Telecom
Hongjun An
Institute of Artificial Intelligence (TeleAI), China Telecom
Jiawei Shao
Institute of Artificial Intelligence (TeleAI), China Telecom
Chi Zhang
Institute of Artificial Intelligence (TeleAI), China Telecom
Xuelong Li
Institute of Artificial Intelligence (TeleAI), China Telecom