🤖 AI Summary
Traditional neural scaling laws lose predictive power in emerging architectures—including sparse models, Mixture-of-Experts (MoE), multimodal systems, and retrieval-augmented models—owing to heterogeneity across modalities and stringent deployment constraints.
Method: This work systematically reviews the theoretical foundations and empirical boundaries of scaling laws, synthesizing insights from over 50 studies. We propose an adaptive scaling framework that jointly optimizes data efficiency, inference cost, and architectural constraints, integrating power-law modeling, cross-modal performance attribution, and architecture-sensitivity analysis.
Contribution/Results: We distill transferable, practical scaling guidelines that precisely delineate the validity domains and failure modes of scaling laws. The framework provides principled theoretical support and actionable decision-making tools for efficient large-model scaling; its components have been adopted by multiple industrial-scale training frameworks.
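The power-law modeling the framework builds on can be illustrated with a small fitting sketch. Everything below is illustrative and not taken from this work: `fit_power_law` is a hypothetical helper, and the exponent 0.076 and irreducible loss 1.69 are stand-in values loosely inspired by published language-model fits.

```python
import numpy as np

# Hypothetical power-law scaling: loss(N) = a * N**(-alpha) + irreducible loss.
# A common way to estimate a scaling exponent from (model size, loss)
# measurements is linear regression in log-log space.

def fit_power_law(n, loss, irreducible=0.0):
    """Fit loss ~ a * n**(-alpha) + irreducible; return (a, alpha)."""
    x = np.log(n)
    y = np.log(loss - irreducible)          # subtract the assumed floor first
    slope, intercept = np.polyfit(x, y, 1)  # straight line in log-log space
    return np.exp(intercept), -slope

# Synthetic measurements generated from a known law (a=5.0, alpha=0.076),
# plus an assumed irreducible loss of 1.69 -- illustrative numbers only.
sizes = np.logspace(6, 10, 12)          # 1e6 .. 1e10 parameters
losses = 5.0 * sizes**-0.076 + 1.69

a_hat, alpha_hat = fit_power_law(sizes, losses, irreducible=1.69)
print(round(alpha_hat, 3))  # recovers the exponent: 0.076
```

In practice the irreducible term is itself a fit parameter, and noisy measurements call for robust regression rather than a plain least-squares line; this sketch only shows the shape of the calculation.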
📝 Abstract
Neural scaling laws have revolutionized the design and optimization of large-scale AI models by revealing predictable relationships between model size, dataset volume, and computational resources. Early research established power-law relationships in model performance, leading to compute-optimal scaling strategies. However, recent studies have highlighted their limitations across architectures, modalities, and deployment contexts. Sparse models, Mixture-of-Experts, retrieval-augmented learning, and multimodal models often deviate from traditional scaling patterns. Moreover, scaling behaviors vary across domains such as vision, reinforcement learning, and fine-tuning, underscoring the need for more nuanced approaches. In this survey, we synthesize insights from over 50 studies, examining the theoretical foundations, empirical findings, and practical implications of scaling laws. We also explore key challenges, including data efficiency, inference scaling, and architecture-specific constraints, advocating for adaptive scaling strategies tailored to real-world applications. We suggest that while scaling laws provide a useful guide, they do not generalize uniformly across architectures and training strategies.
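The compute-optimal scaling strategies mentioned in the abstract come from minimizing a parametric loss under a fixed FLOP budget. A minimal sketch, assuming a Chinchilla-style loss L(N, D) = E + A/N^alpha + B/D^beta and the common approximation C ≈ 6·N·D; the coefficients below are illustrative values loosely based on published fits, not results from this survey.

```python
# Chinchilla-style parametric loss: L(N, D) = E + A/N**alpha + B/D**beta.
# Minimizing L subject to N*D = C/6 gives the closed-form allocation
#   N_opt = G * (C/6)**(beta/(alpha+beta)),  D_opt = (1/G) * (C/6)**(alpha/(alpha+beta)),
# with G = (alpha*A / (beta*B))**(1/(alpha+beta)).
# Illustrative coefficients (assumed, loosely based on published fits):
A, B, ALPHA, BETA, E = 406.4, 410.7, 0.34, 0.28, 1.69

def compute_optimal(C):
    """Return the loss-minimizing (parameters N, tokens D) for FLOP budget C."""
    a = BETA / (ALPHA + BETA)   # exponent governing model size
    b = ALPHA / (ALPHA + BETA)  # exponent governing data size
    G = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    N = G * (C / 6.0) ** a
    D = (1.0 / G) * (C / 6.0) ** b
    return N, D

N, D = compute_optimal(5.76e23)  # a budget on the order of Chinchilla's training run
print(f"params ~ {N:.2e}, tokens ~ {D:.2e}")
```

The exponents a and b sum to 1, so doubling the budget splits the extra compute between parameters and tokens in a fixed ratio; the architecture-specific deviations this survey catalogs are precisely the cases where such a single fixed ratio stops holding.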