🤖 AI Summary
This work addresses the lack of systematic surveys on reward models (RMs) for aligning large language models (LLMs). We propose a unified taxonomy of RMs and a three-dimensional "collect, model, use" analytical framework to holistically examine preference data construction, RM architectures, and downstream alignment applications. Synthesizing over 120 peer-reviewed studies, we construct a structured knowledge graph capturing methodological trends and empirical insights. Our analysis identifies six open challenges (scalability, generalization, alignment bias, reward hacking, data efficiency, and cross-task transferability) and distills four emerging research directions. Furthermore, we release an open-source, comprehensive resource repository encompassing benchmark datasets, reference implementations, and standardized evaluation protocols. This survey fills a critical gap in the RM literature, serving both as an accessible entry point for newcomers and as a foundational reference for coordinated progress in LLM alignment research.
📝 Abstract
Reward models (RMs) have demonstrated impressive potential for enhancing large language models (LLMs), as an RM can serve as a proxy for human preferences, providing signals that guide LLM behavior across a variety of tasks. In this paper, we provide a comprehensive overview of the relevant research, examining RMs from the perspectives of preference collection, reward modeling, and usage. We then introduce the applications of RMs and discuss benchmarks for their evaluation. Furthermore, we analyze in depth the challenges facing the field and explore potential research directions. This paper aims to give beginners a comprehensive introduction to RMs and to facilitate future studies. The resources are publicly available on GitHub at https://github.com/JLZhong23/awesome-reward-models.
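To make concrete what "serving as a proxy for human preferences" typically means in practice, the sketch below shows the pairwise Bradley-Terry objective commonly used to train RMs on (chosen, rejected) response pairs. This is a minimal illustration under assumed names and toy values, not code from the survey or its repository; the function `pairwise_rm_loss` and the sample rewards are hypothetical.

```python
import torch
import torch.nn.functional as F

def pairwise_rm_loss(reward_chosen: torch.Tensor,
                     reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry negative log-likelihood: train the RM so the
    preferred ("chosen") response scores higher than the rejected one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: scalar rewards an RM head might emit for a batch of
# (chosen, rejected) response pairs. Values are illustrative only.
r_chosen = torch.tensor([1.2, 0.4, 2.0])
r_rejected = torch.tensor([0.3, 0.9, 1.1])
loss = pairwise_rm_loss(r_chosen, r_rejected)
print(loss.item())  # lower loss when chosen responses outscore rejected ones
```

Minimizing this loss widens the reward margin on preferred responses, which is what lets the trained RM later supply scalar preference signals for downstream alignment methods such as RLHF.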