A Survey on Post-training of Large Language Models

📅 2025-03-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This survey addresses three core limitations of large language models (LLMs) in professional settings: weak domain-specific reasoning, insufficient ethical robustness, and poor domain adaptability. It organizes post-training language model (PoLM) research around five technical paradigms: supervised fine-tuning (SFT); alignment (e.g., RLHF); reasoning augmentation (e.g., chain-of-thought); efficiency optimization (e.g., knowledge distillation); and multimodal integration and adaptation, together with the high-quality preference and reasoning datasets these methods rely on. Its contributions are threefold: (1) the first unified taxonomy and evolutionary roadmap for PoLMs, covering methodologies, benchmark datasets, and evaluation dimensions; (2) a formal account of the development trajectory of Large Reasoning Models (LRMs) such as OpenAI-o1/o3 and DeepSeek-R1; and (3) a strategic agenda for improving professional reasoning accuracy, ethical reliability, and cross-domain generalization. The survey thereby provides both a theoretical foundation and a practical reference for next-generation LLMs that are trustworthy, strongly reasoning-capable, and broadly adaptable.
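The efficiency paradigm named above, knowledge distillation, trains a small student model to match a large teacher's temperature-softened output distribution. A minimal numeric sketch of the standard distillation objective follows; the logits and temperature are made-up illustration values, not drawn from the paper.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_kl(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl

# Hypothetical logits over a 3-token vocabulary
teacher = [2.0, 1.0, -1.0]
student = [1.5, 1.2, -0.5]
print(round(distillation_kl(teacher, student), 4))  # small positive penalty
```

A higher temperature T flattens both distributions, exposing the teacher's relative preferences among non-top tokens, which is the extra signal distillation transfers beyond hard labels.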

📝 Abstract
The emergence of Large Language Models (LLMs) has fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. However, their pre-trained architectures often reveal limitations in specialized contexts, including restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance. These challenges necessitate advanced post-training language models (PoLMs), such as OpenAI-o1/o3 and DeepSeek-R1 (collectively known as Large Reasoning Models, or LRMs). This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which brings model behavior in line with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; Efficiency, which optimizes resource utilization amidst increasing complexity; and Integration and Adaptation, which extend capabilities across diverse modalities while addressing coherence issues. Charting progress from ChatGPT's foundational alignment strategies to DeepSeek-R1's innovative reasoning advancements, we illustrate how PoLMs leverage datasets to mitigate biases, deepen reasoning capabilities, and enhance domain adaptability. Our contributions include a pioneering synthesis of PoLM evolution, a structured taxonomy categorizing techniques and datasets, and a strategic agenda emphasizing the role of LRMs in improving reasoning proficiency and domain flexibility. As the first survey of its scope, this work consolidates recent PoLM advancements and establishes a rigorous intellectual framework for future research, fostering the development of LLMs that excel in precision, ethical robustness, and versatility across scientific and societal applications.
Problem

Research questions and friction points this paper is trying to address.

Address limitations of pre-trained LLMs in specialized contexts.
Enhance reasoning, ethical alignment, and domain-specific performance.
Systematically survey post-training techniques for LLM improvement.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning enhances task-specific accuracy.
Alignment ensures models match human preferences.
Reasoning improves multi-step inference capabilities.
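The alignment bullet above rests on preference learning: in RLHF, a reward model is first fit to human preference pairs, typically with the Bradley-Terry pairwise loss. A minimal sketch of that loss follows; the reward scores are hypothetical values for illustration.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry negative log-likelihood that the chosen
    response outranks the rejected one: -log sigmoid(r_c - r_r)."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical reward-model scores for one preference pair
print(round(preference_loss(1.8, 0.3), 4))  # correct ranking: small loss
print(round(preference_loss(0.3, 1.8), 4))  # inverted ranking: large loss
```

Minimizing this loss over a preference dataset pushes the reward model to score human-preferred responses higher, and the resulting reward signal is what the policy is then optimized against (e.g., with PPO).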
👥 Authors

Guiyao Tie
Huazhong University of Science and Technology

Zeli Zhao
Huazhong University of Science and Technology

Dingjie Song
Lehigh University; CUHK-Shenzhen; Nanjing University
Multimodal Learning · Large Language Models

Fuyang Wei
The University of Hong Kong

Rong Zhou
Lehigh University

Yurou Dai
Lehigh University
Data Mining · Medical Data Analysis · Deep Learning · Autonomous Driving

Wen Yin
Huazhong University of Science and Technology

Zhejian Yang
Jilin University

Jiangyue Yan
Southern University of Science and Technology

Yao Su
Worcester Polytechnic Institute
AI · Machine Learning · Data Mining

Zhenhan Dai
Huazhong University of Science and Technology

Yifeng Xie
Huazhong University of Science and Technology

Yihan Cao
LinkedIn

Lichao Sun
Lehigh University

Pan Zhou
Huazhong University of Science and Technology

Lifang He
Associate Professor of Computer Science, Lehigh University
Machine Learning · AI for Health · Medical Imaging · Biomedical Informatics · Tensor Analysis

Hechang Chen
School of Artificial Intelligence, Jilin University, China
Machine Learning · Data Mining · Deep Reinforcement Learning · Complex Network Analysis · Knowledge Graph

Yu Zhang
Southern University of Science and Technology

Qingsong Wen
Squirrel Ai Learning

Tianming Liu
Distinguished Research Professor of Computer Science, University of Georgia
Brain · Brain-Inspired AI · LLM · Artificial General Intelligence · Quantum AI

Neil Zhenqiang Gong
Associate Professor, Duke University
Security · AI Security/Safety · Social Networks Security · Generative AI

Jiliang Tang
University Foundation Professor of Computer Science and Engineering, Michigan State University
Trustworthy AI · Graph Neural Networks · Feature Selection · Recommendations

Caiming Xiong
Salesforce Research
Machine Learning · NLP · Computer Vision · Multimedia · Data Mining

Heng Ji
Professor of Computer Science, AICE Director, ASKS Director, UIUC, Amazon Scholar
Natural Language Processing · Large Language Models

Philip S. Yu
Professor of Computer Science, University of Illinois at Chicago
Data Mining · Database · Privacy

Jianfeng Gao
Microsoft Research