Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch

📅 2025-02-24
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Current Chinese reward modeling (RM) research suffers from overreliance on synthetic data and the absence of human-annotated evaluation benchmarks, hindering accurate modeling of human Chinese preferences. To address this, we introduce CheemsBench—the first fully human-annotated Chinese RM evaluation benchmark—and CheemsPreference—a large-scale human-AI collaborative preference dataset—alongside a rigorous human quality control protocol and a dual-path RM training paradigm integrating discriminative and generative objectives. Our key contributions are: (1) the first systematic demonstration of fundamental limitations of AI-synthesized data in Chinese preference modeling; (2) empirical validation that high-quality human supervision is indispensable for RM performance; (3) a trained RM achieving state-of-the-art results on CheemsBench, significantly outperforming leading open-source RMs; and (4) full open-sourcing of the datasets, benchmark, and evaluation framework to advance Chinese alignment research.
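The summary names a "dual-path" training paradigm but gives no formulation, so the following is a minimal PyTorch sketch of one conventional way to combine the two objectives it mentions: a Bradley-Terry pairwise loss on a scalar reward head (discriminative) plus next-token cross-entropy on a verbalized judgment (generative). All identifiers here (`dual_path_loss`, `model.reward_head`, `model.lm`) are hypothetical, not the paper's API.

```python
# Hedged sketch only: the paper's exact "dual-path" formulation is not given
# in this summary. This combines the two standard RM objectives it names.
import torch.nn.functional as F

def dual_path_loss(model, chosen_ids, rejected_ids, judgment_ids, alpha=0.5):
    """Weighted sum of a discriminative and a generative RM objective.

    chosen_ids / rejected_ids: token ids of the preferred / dispreferred
        response (prompt included), shape (batch, seq_len).
    judgment_ids: token ids of a verbalized comparison judgment the model
        should learn to generate (e.g. "Response A is better because ...").
    """
    # Discriminative path: scalar rewards from a value head, trained with the
    # Bradley-Terry pairwise loss -log sigmoid(r_chosen - r_rejected).
    r_chosen = model.reward_head(chosen_ids)      # (batch,)
    r_rejected = model.reward_head(rejected_ids)  # (batch,)
    disc_loss = -F.logsigmoid(r_chosen - r_rejected).mean()

    # Generative path: ordinary next-token cross-entropy on the judgment text,
    # so the same backbone also learns to produce preference judgments.
    logits = model.lm(judgment_ids[:, :-1]).logits  # (batch, seq-1, vocab)
    gen_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        judgment_ids[:, 1:].reshape(-1),
    )

    return alpha * disc_loss + (1.0 - alpha) * gen_loss
```

The weighting `alpha` is likewise an assumed knob; the point of the sketch is only that both paths share one backbone and are optimized jointly.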

📝 Abstract
Reward models (RMs) are crucial for aligning large language models (LLMs) with human preferences. However, most RM research is centered on English and relies heavily on synthetic resources, which leads to limited and less reliable datasets and benchmarks for Chinese. To address this gap, we introduce CheemsBench, a fully human-annotated RM evaluation benchmark within Chinese contexts, and CheemsPreference, a large-scale and diverse preference dataset annotated through human-machine collaboration to support Chinese RM training. We systematically evaluate open-source discriminative and generative RMs on CheemsBench and observe significant limitations in their ability to capture human preferences in Chinese scenarios. Additionally, based on CheemsPreference, we construct an RM that achieves state-of-the-art performance on CheemsBench, demonstrating the necessity of human supervision in RM training. Our findings reveal that scaled AI-generated data struggles to fully capture human preferences, emphasizing the importance of high-quality human supervision in RM development.
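The abstract does not spell out CheemsBench's scoring protocol; as a rough illustration, pairwise RM benchmarks are commonly scored by preference accuracy, i.e. the fraction of human-labeled pairs in which the RM assigns the preferred response the higher reward. The `reward_model.score` interface below is an assumption, not the paper's.

```python
# Illustrative sketch, assuming a pairwise-accuracy protocol; CheemsBench's
# actual evaluation may differ (e.g. multi-way rankings). `score` is a
# hypothetical wrapper returning a scalar reward for (prompt, response).
def pairwise_accuracy(reward_model, benchmark):
    """benchmark: non-empty iterable of (prompt, chosen, rejected) triples."""
    correct = total = 0
    for prompt, chosen, rejected in benchmark:
        correct += reward_model.score(prompt, chosen) > reward_model.score(prompt, rejected)
        total += 1
    return correct / total
```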
Problem

Research questions and friction points this paper is trying to address.

Chinese reward modeling relies heavily on English-centric, synthetic resources.
No human-annotated benchmark exists for evaluating RMs in Chinese contexts.
AI-generated preference data struggles to fully capture human preferences.
Innovation

Methods, ideas, or system contributions that make the work stand out.

CheemsBench: a fully human-annotated Chinese RM evaluation benchmark
CheemsPreference: a large-scale, diverse preference dataset for RM training
Human-machine collaborative annotation pipeline
👥 Authors
Xueru Wen
School of Computer Science and Technology, University of Chinese Academy of Sciences
Natural Language Processing · Alignment · Large Language Model
Jie Lou
Xiaohongshu
Alignment · RLHF
Zichao Li
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China
Yaojie Lu
Institute of Software, Chinese Academy of Sciences
Information Extraction · Large Language Models
Xing Yu
University of Chinese Academy of Sciences, Beijing, China
Yuqiu Ji
University of Chinese Academy of Sciences, Beijing, China
Guohai Xu
Xiaohongshu Inc., Alibaba DAMO Academy
MLLM · Alignment
Hongyu Lin
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China
Ben He
Professor, University of Chinese Academy of Sciences
Natural Language Processing · Information Retrieval
Xianpei Han
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China
Le Sun
Institute of Software, CAS
Information Retrieval · Natural Language Processing
Debing Zhang
Xiaohongshu
Machine Learning · Computer Vision · Deep Learning