It Takes a Good Model to Train a Good Model: Generalized Gaussian Priors for Optimized LLMs

📅 2025-05-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
The statistical characteristics of large language model (LLM) parameters have long been neglected, hindering efficient compression and deployment. Method: This paper proposes the first end-to-end co-optimization framework grounded in the Generalized Gaussian Distribution (GGD), comprising three stages: (i) GGD-driven statistical-aware initialization, (ii) distribution-alignment regularization training (BackSlash), and (iii) post-training weight shaping (DeepShape) coupled with RF8, a hardware-friendly 8-bit floating-point quantization format. Contribution/Results: The framework ensures parameter-distribution consistency across training, compression, and deployment. Experiments on diverse LLM architectures achieve up to 90% parameter compression, 23% faster training convergence, and 31% lower inference latency, with <0.5% accuracy degradation on the Winogrande and ARC benchmarks, substantially enhancing edge-deployment feasibility.
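The GGD-driven initialization in stage (i) can be illustrated with a standard sampling identity: if p(x) ∝ exp(-(|x|/α)^β), then (|x|/α)^β follows a Gamma(1/β) distribution, so GG samples can be drawn with NumPy alone. The sketch below is not the paper's code; `gg_init` and the values of `beta`, `alpha`, and the layer shape are hypothetical illustrations (β = 2 recovers the Gaussian, β = 1 the Laplacian).

```python
import numpy as np

def gg_init(shape, beta=1.2, alpha=0.02, seed=0):
    """Draw zero-mean weights from a generalized Gaussian,
    p(x) ∝ exp(-(|x|/alpha)**beta).  Hypothetical parameter values."""
    rng = np.random.default_rng(seed)
    # (|x|/alpha)**beta is Gamma(1/beta)-distributed, so invert that map
    g = rng.gamma(1.0 / beta, 1.0, size=shape)
    sign = rng.choice([-1.0, 1.0], size=shape)
    return alpha * sign * g ** (1.0 / beta)

w = gg_init((768, 768))  # e.g. one attention projection matrix
```

Heavier-than-Gaussian tails (β < 2) concentrate more weights near zero, which is what makes the resulting distributions more compressible.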

📝 Abstract
Despite rapid advancements in the research and deployment of large language models (LLMs), the statistical distribution of model parameters, and its influence on initialization, training dynamics, and downstream efficiency, has received surprisingly little attention. A recent work introduced BackSlash, a training-time compression algorithm, and first demonstrated that pre-trained LLM parameters are better modeled by generalized Gaussian distributions (GGDs) than by conventional alternatives. By optimizing GG priors during training, BackSlash can reduce parameters by up to 90% with minimal performance loss. Building on this foundational insight, we propose a unified, end-to-end framework for LLM optimization based on the GG model. Our contributions are threefold: (1) a GG-based initialization scheme that aligns with the statistical structure of trained models, yielding faster convergence and improved accuracy; (2) DeepShape, a post-training regularization method that reshapes weight distributions to match a GG profile, improving compressibility with minimal performance degradation; and (3) RF8, a compact, hardware-efficient 8-bit floating-point format designed for BackSlash training with GG-based initialization, enabling low-cost inference without compromising accuracy. Experiments across diverse model architectures show that our framework consistently yields smaller and faster models that match or outperform standard training baselines. By grounding LLM development in principled statistical modeling, this work forges a new path toward efficient, scalable, and hardware-aware AI systems. The code is available on our project page: https://huggingface.co/spaces/shifeng3711/gg_prior.
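One simple way to realize the kind of post-training reshaping the abstract attributes to DeepShape is a rank-preserving quantile map onto a target GG profile: each weight is replaced by the GGD quantile at its empirical rank, so the ordering of weights is preserved exactly while the marginal distribution becomes GG. This is an illustrative sketch under that assumption, not the paper's actual procedure; `reshape_to_gg` and its parameters are hypothetical.

```python
import numpy as np
from scipy.stats import gennorm  # SciPy's generalized normal (GGD)

def reshape_to_gg(weights, beta=1.0, alpha=0.02):
    """Rank-preserving quantile map onto a zero-mean GGD.
    Hypothetical illustration of post-training weight shaping."""
    flat = weights.ravel()
    order = np.argsort(flat)
    n = flat.size
    # mid-rank probabilities avoid the infinite 0/1 quantiles
    probs = (np.arange(n) + 0.5) / n
    target = gennorm.ppf(probs, beta, scale=alpha)
    out = np.empty_like(flat)
    out[order] = target  # i-th smallest weight -> i-th smallest GG quantile
    return out.reshape(weights.shape)
```

Because the map is monotone, relative weight importance is untouched; only the value distribution is reshaped toward the compressible GG profile.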
Problem

Research questions and friction points this paper is trying to address.

Optimizing LLM parameters using generalized Gaussian distributions
Improving model initialization and training with GG priors
Enhancing compressibility and hardware efficiency in LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

GG-based initialization for faster convergence
DeepShape reshapes weights for better compressibility
RF8 8-bit format enables low-cost inference
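The page does not specify RF8's bit layout, so the sketch below assumes a generic 1-4-3 sign/exponent/mantissa minifloat with bias 7 purely to illustrate how nearest-value 8-bit float quantization works; the real RF8 format is almost certainly tuned differently, and `minifloat_values` / `quantize_rf8` are hypothetical names.

```python
import numpy as np

def minifloat_values(exp_bits=4, man_bits=3, bias=7):
    """Enumerate all finite values of an assumed 8-bit float layout
    (subnormals included; the top exponent code is treated as reserved)."""
    vals = []
    for s in (1.0, -1.0):
        for e in range(2 ** exp_bits - 1):
            for m in range(2 ** man_bits):
                if e == 0:  # subnormal: no implicit leading 1
                    v = s * m * 2.0 ** (1 - bias - man_bits)
                else:       # normal: implicit leading 1
                    v = s * (1 + m * 2.0 ** -man_bits) * 2.0 ** (e - bias)
                vals.append(v)
    return np.unique(np.array(vals))

CODEBOOK = minifloat_values()

def quantize_rf8(x):
    """Map each entry of x to its nearest representable 8-bit value."""
    idx = np.searchsorted(CODEBOOK, x).clip(1, CODEBOOK.size - 1)
    lo, hi = CODEBOOK[idx - 1], CODEBOOK[idx]
    return np.where(x - lo < hi - x, lo, hi)
```

A GG-shaped weight tensor concentrates mass near zero, exactly where a minifloat's subnormal and low-exponent codes are densest, which is the intuition behind pairing GG-shaped training with an 8-bit float format.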
Jun Wu
Shenzhen International Graduate School, Tsinghua University
Yirong Xiong
Shenzhen International Graduate School, Tsinghua University
Jiangtao Wen
NYU
Yuxing Han
Tsinghua University
Smart Agriculture · Artificial Intelligence · Video · Communication