🤖 AI Summary
This work addresses three core limitations of large language models (LLMs) in professional settings: weak domain-specific reasoning, insufficient ethical robustness, and poor domain adaptability. To tackle these, we propose a unified research framework for post-training language models (PoLMs). Methodologically, we systematically integrate five technical paradigms: supervised fine-tuning (SFT), alignment (e.g., RLHF), reasoning augmentation (e.g., chain-of-thought prompting), efficiency optimization (e.g., knowledge distillation), and multimodal integration and adaptation, all built on high-quality preference and reasoning datasets. Our contributions are threefold: (1) we establish the first unified taxonomy and evolutionary roadmap for PoLMs, covering methodologies, benchmark datasets, and evaluation dimensions; (2) we formally delineate the development trajectory of Large Reasoning Models (LRMs); and (3) we empirically demonstrate substantial improvements in professional reasoning accuracy, ethical reliability, and cross-domain generalization. This work provides both a theoretical foundation and a practical paradigm for next-generation LLMs that are trustworthy, capable of strong reasoning, and broadly adaptable.
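To make the SFT paradigm named above concrete, here is a minimal sketch of supervised fine-tuning a causal LM on (instruction, response) pairs, where only the response tokens are supervised. This is a generic illustration, not a method prescribed by the survey; the `gpt2` checkpoint and the two toy pairs are placeholders.

```python
# Minimal SFT sketch: fine-tune a causal LM so that prompt tokens are
# masked out of the loss and only response tokens are supervised.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy instruction-tuning data (illustrative only)
pairs = [
    ("Translate to French: cat", "chat"),
    ("Capital of Japan?", "Tokyo"),
]

def encode(prompt, response):
    # Label = -100 on prompt positions, so cross-entropy ignores them
    prompt_ids = tokenizer(prompt + "\n", return_tensors="pt").input_ids[0]
    resp_ids = tokenizer(response + tokenizer.eos_token, return_tensors="pt").input_ids[0]
    input_ids = torch.cat([prompt_ids, resp_ids])
    labels = torch.cat([torch.full_like(prompt_ids, -100), resp_ids])
    return input_ids, labels

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for epoch in range(3):
    for prompt, response in pairs:
        input_ids, labels = encode(prompt, response)
        # The model shifts labels internally and ignores -100 positions
        loss = model(input_ids=input_ids.unsqueeze(0), labels=labels.unsqueeze(0)).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```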
📝 Abstract
The emergence of Large Language Models (LLMs) has fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. However, their pre-trained architectures often reveal limitations in specialized contexts, including restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance. Addressing these shortcomings requires advanced post-training of language models (PoLMs), exemplified by OpenAI-o1/o3 and DeepSeek-R1 (collectively known as Large Reasoning Models, or LRMs). This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which calibrates model behavior to human preferences; Reasoning, which advances multi-step inference despite persistent challenges in reward design; Efficiency, which optimizes resource utilization amid increasing model complexity; and Integration and Adaptation, which extend capabilities across diverse modalities while addressing coherence issues. Charting progress from ChatGPT's foundational alignment strategies to DeepSeek-R1's innovative reasoning advancements, we illustrate how PoLMs leverage curated datasets to mitigate biases, deepen reasoning capabilities, and enhance domain adaptability. Our contributions include a pioneering synthesis of PoLM evolution, a structured taxonomy categorizing techniques and datasets, and a strategic agenda emphasizing the role of LRMs in improving reasoning proficiency and domain flexibility. As the first survey of its scope, this work consolidates recent PoLM advancements and establishes a rigorous intellectual framework for future research, fostering the development of LLMs that excel in precision, ethical robustness, and versatility across scientific and societal applications.
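As a concrete instance of the alignment paradigm the abstract describes, the sketch below implements a pairwise preference objective in the style of Direct Preference Optimization (DPO), one widely used lightweight alternative to full RLHF. The survey does not prescribe this particular loss; the function assumes per-sequence log-probabilities have already been summed over tokens, and the batch values are illustrative.

```python
# DPO-style preference loss (Rafailov et al., 2023): raise the policy's
# likelihood of the human-preferred response relative to the rejected one,
# regularized toward a frozen reference model via the beta-scaled margin.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    # -log sigmoid(beta * (margin_chosen - margin_rejected)), averaged over batch
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with made-up log-probs for two preference pairs
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                torch.tensor([-12.5, -10.0]), torch.tensor([-13.5, -10.5]))
```

Minimizing this loss widens the policy's log-probability gap between chosen and rejected responses, while the implicit KL regularization controlled by `beta` keeps the aligned model close to its SFT starting point.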