Beyond the Black Box: Theory and Mechanism of Large Language Models

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 3 · Influential: 1
🤖 AI Summary
While large language models have demonstrated remarkable engineering success, they remain theoretically underdeveloped and mechanistically opaque, operating essentially as “black boxes.” This work proposes a unified theoretical framework spanning the entire lifecycle of large language models, systematically analyzing the core mechanisms across six stages: data preparation, model preparation, training, alignment, inference, and evaluation. By integrating information theory, optimization theory, and representation learning, the framework elucidates the mathematical principles underlying critical issues such as data mixing strategies, architectural expressivity, and alignment optimization. It further identifies forward-looking challenges, including self-improving synthetic data generation, safety boundaries, and the origins of emergent intelligence. The study thus provides a structured roadmap for transforming large language models from empirical engineering artifacts into an explainable, predictable, and verifiable scientific discipline.

📝 Abstract
The rapid emergence of Large Language Models (LLMs) has precipitated a profound paradigm shift in Artificial Intelligence, delivering monumental engineering successes that increasingly impact modern society. However, a critical paradox persists within the current field: despite their empirical efficacy, our theoretical understanding of LLMs remains disproportionately nascent, forcing these systems to be treated largely as “black boxes”. To address this theoretical fragmentation, this survey proposes a unified lifecycle-based taxonomy that organizes the research landscape into six distinct stages: Data Preparation, Model Preparation, Training, Alignment, Inference, and Evaluation. Within this framework, we provide a systematic review of the foundational theories and internal mechanisms driving LLM performance. Specifically, we analyze core theoretical issues such as the mathematical justification for data mixtures, the representational limits of various architectures, and the optimization dynamics of alignment algorithms. Moving beyond current best practices, we identify critical frontier challenges, including the theoretical limits of synthetic data self-improvement, the mathematical bounds of safety guarantees, and the mechanistic origins of emergent intelligence. By connecting empirical observations with rigorous scientific inquiry, this work provides a structured roadmap for transitioning LLM development from engineering heuristics toward a principled scientific discipline.
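
To make concrete the kind of alignment-optimization theory the survey reviews, here is one standard objective from that literature, the Direct Preference Optimization (DPO) loss (shown purely for illustration; the specific algorithms and analyses the paper covers may differ):

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$

Here $\pi_\theta$ is the policy being aligned, $\pi_{\mathrm{ref}}$ is a frozen reference model, $(y_w, y_l)$ are the preferred and dispreferred responses to a prompt $x$, $\beta$ is a temperature, and $\sigma$ is the logistic function. Characterizing the optimization dynamics of objectives like this one is exactly the type of question that falls under the survey's Alignment stage.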
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
theoretical understanding
black box
emergent intelligence
mechanistic interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

large language models
theoretical foundations
lifecycle taxonomy
emergent intelligence
alignment theory
Zeyu Gan
Gaoling School of Artificial Intelligence, Renmin University of China

Ruifeng Ren
Renmin University of China
Machine Learning · LLMs

Wei Yao
Renmin University of China
Trustworthy AI · AI Safety

Xiaolin Hu
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Xiamen University

Gengze Xu
Gaoling School of Artificial Intelligence, Renmin University of China

Chen Qian
Renmin University of China
Large Language Models · Safety · Interpretability · Graph Neural Networks

Huayi Tang
Gaoling School of Artificial Intelligence, Renmin University of China

Zixuan Gong
PhD student, Renmin University of China (RUC)
LLM Theory

Xinhao Yao
Renmin University of China
Large Language Models

Pengwei Tang
Gaoling School of Artificial Intelligence, Renmin University of China

Zhenxing Dou
Gaoling School of Artificial Intelligence, Renmin University of China

Yong Liu
Gaoling School of Artificial Intelligence, Renmin University of China