Beyond the Black Box: Theory and Mechanism of Large Language Models

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 3 · Influential: 1
🤖 AI Summary
While large language models have demonstrated remarkable engineering success, they remain theoretically underdeveloped and mechanistically opaque, operating essentially as “black boxes.” This work proposes a unified theoretical framework spanning the entire lifecycle of large language models, systematically analyzing the core mechanisms across six stages: data preparation, model preparation, training, alignment, inference, and evaluation. By integrating information theory, optimization theory, and representation learning, the framework elucidates the mathematical principles underlying critical issues such as data mixing strategies, architectural expressivity, and alignment optimization. It further identifies forward-looking challenges, including self-improving synthetic data generation, safety boundaries, and the origins of emergent intelligence. The study thus provides a structured roadmap for transforming large language models from empirical engineering artifacts into an explainable, predictable, and verifiable scientific discipline.

📝 Abstract
The rapid emergence of Large Language Models (LLMs) has precipitated a profound paradigm shift in Artificial Intelligence, delivering monumental engineering successes that increasingly impact modern society. However, a critical paradox persists within the current field: despite their empirical efficacy, our theoretical understanding of LLMs remains disproportionately nascent, forcing these systems to be treated largely as “black boxes”. To address this theoretical fragmentation, this survey proposes a unified lifecycle-based taxonomy that organizes the research landscape into six distinct stages: Data Preparation, Model Preparation, Training, Alignment, Inference, and Evaluation. Within this framework, we provide a systematic review of the foundational theories and internal mechanisms driving LLM performance. Specifically, we analyze core theoretical issues such as the mathematical justification for data mixtures, the representational limits of various architectures, and the optimization dynamics of alignment algorithms. Moving beyond current best practices, we identify critical frontier challenges, including the theoretical limits of synthetic data self-improvement, the mathematical bounds of safety guarantees, and the mechanistic origins of emergent intelligence. By connecting empirical observations with rigorous scientific inquiry, this work provides a structured roadmap for transitioning LLM development from engineering heuristics toward a principled scientific discipline.
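
To make concrete the kind of alignment-optimization theory the survey reviews, here is one standard objective from that literature, the Direct Preference Optimization (DPO) loss (shown purely for illustration; the specific algorithms and analyses the paper covers may differ):

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$

Here $\pi_\theta$ is the policy being aligned, $\pi_{\mathrm{ref}}$ is a frozen reference model, $(y_w, y_l)$ are the preferred and dispreferred responses to a prompt $x$, $\beta$ is a temperature, and $\sigma$ is the logistic function. Characterizing the optimization dynamics of objectives like this one is exactly the type of question that falls under the survey's Alignment stage.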
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
theoretical understanding
black box
emergent intelligence
mechanistic interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

large language models
theoretical foundations
lifecycle taxonomy
emergent intelligence
alignment theory
Zeyu Gan
Gaoling School of Artificial Intelligence, Renmin University of China

Ruifeng Ren
Renmin University of China
Machine Learning · LLMs

Wei Yao
Renmin University of China
Trustworthy AI · AI Safety

Xiaolin Hu
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Xiamen University

Gengze Xu
Gaoling School of Artificial Intelligence, Renmin University of China

Chen Qian
Renmin University of China
Large Language Models · Safety · Interpretability · Graph Neural Networks

Huayi Tang
Gaoling School of Artificial Intelligence, Renmin University of China

Zixuan Gong
PhD student, Renmin University of China (RUC)
LLM Theory

Xinhao Yao
Renmin University of China
Large Language Models

Pengwei Tang
Gaoling School of Artificial Intelligence, Renmin University of China

Zhenxing Dou
Gaoling School of Artificial Intelligence, Renmin University of China

Yong Liu
Gaoling School of Artificial Intelligence, Renmin University of China