🤖 AI Summary
Low-quality and insufficiently diverse public preference data for STEM, programming, and multilingual tasks limit the generalization of reward models (RMs). To address this, we introduce an open-source, cross-task, cross-lingual preference dataset of over 40,000 high-diversity samples, curated via human crowdsourcing with multi-dimensional quality control and released under the CC-BY-4.0 license. We propose a unified annotation framework for preference data and apply the dataset to both generative RM training and evaluation. Our trained RMs achieve 82.4% and 73.7% accuracy on RM-Bench and JudgeBench, respectively, improving on the previously best-reported results by roughly 10 percentage points absolute. These gains substantially strengthen the robustness and generalization of reward modeling in RLHF pipelines.
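For illustration, here is a minimal sketch of the standard Bradley-Terry pairwise objective commonly used to train scalar RMs on preference pairs like these. This is a generic recipe, not the paper's exact training setup; the function name and the example values are purely illustrative.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: push the reward of the preferred
    response above the reward of the rejected one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Illustrative scalar rewards for a batch of 4 preference pairs
# (in practice these come from the RM head over each response).
chosen = torch.tensor([1.2, 0.8, 2.0, 0.1])
rejected = torch.tensor([0.3, 1.0, 0.5, -0.4])
print(preference_loss(chosen, rejected).item())
```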
📝 Abstract
Preference datasets are essential for training general-domain, instruction-following language models with Reinforcement Learning from Human Feedback (RLHF). Each subsequent data release raises expectations for future data collection, meaning there is a constant need to advance the quality and diversity of openly available preference data. To address this need, we introduce HelpSteer3-Preference, a permissively licensed (CC-BY-4.0), high-quality, human-annotated preference dataset comprising over 40,000 samples. These samples span diverse real-world applications of large language models (LLMs), including tasks related to STEM, coding, and multilingual scenarios. Using HelpSteer3-Preference, we train Reward Models (RMs) that achieve top performance on RM-Bench (82.4%) and JudgeBench (73.7%), a substantial improvement (~10% absolute) over the previously best-reported results from existing RMs. We also demonstrate that HelpSteer3-Preference can be used to train Generative RMs, and show how policy models can be aligned with RLHF using our RMs. Dataset (CC-BY-4.0): https://huggingface.co/datasets/nvidia/HelpSteer3#preference
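The dataset can be loaded with the Hugging Face `datasets` library. The configuration name `"preference"` below is an assumption inferred from the URL anchor; check the dataset card for the exact configuration and split names.

```python
from datasets import load_dataset

# Assumed config name "preference" (from the #preference URL anchor);
# verify against https://huggingface.co/datasets/nvidia/HelpSteer3
ds = load_dataset("nvidia/HelpSteer3", "preference")

print(ds)             # available splits and sample counts
print(ds["train"][0]) # inspect one preference example
```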