WorldPM: Scaling Human Preference Modeling

📅 2025-05-15
🤖 AI Summary
Human preference signals are fragmented across domains, lacking a unified representation and scalable characterization. Method: We propose World Preference Modeling (WorldPM), a framework for cross-domain preference representation learning, validated on 15M multi-source forum samples across 1.5B–72B parameter models. It introduces large-scale preference data construction, multi-scale training, three decoupled evaluation axes—adversarial, objective, and subjective—and integrates preference representation distillation with RLHF. Contribution/Results: We discover, for the first time, preference modeling’s language-model-like scaling law: adversarial and objective metrics scale consistently with model and data size, whereas subjective metrics do not. WorldPM achieves >5% average improvement across 20 subtasks on seven benchmarks, boosts internal RLHF pipelines by 4–8%, and significantly enhances generalization across sample sizes from 7K to 800K—establishing world preference as a transferable foundational model for preference learning.
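The summary describes training reward models on pairwise human preference data. The paper does not publish its training code here, but the standard objective for this setup is the Bradley-Terry pairwise loss: the model scores the chosen and rejected responses, and the loss is the negative log-probability that the chosen response wins. A minimal sketch of that loss (function names are illustrative, not from the paper):

```python
import math

def bt_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise preference loss.

    Given reward-model scores for the chosen and rejected responses,
    returns -log sigmoid(r_chosen - r_rejected): near zero when the
    chosen response is scored much higher, large when the ranking is
    inverted.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss rewards a larger margin between chosen and rejected scores:
# bt_loss(2.0, 0.0) is small, bt_loss(0.0, 0.0) = log 2, and
# bt_loss(0.0, 2.0) is large.
```

In practice `r_chosen` and `r_rejected` would be the scalar outputs of the 1.5B–72B reward models on a response pair, and the loss would be averaged over a batch; the scalar form above just makes the objective explicit.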

📝 Abstract
Motivated by scaling laws in language modeling that demonstrate how test loss scales as a power law with model and dataset sizes, we find that similar laws exist in preference modeling. We propose World Preference Modeling (WorldPM) to emphasize this scaling potential, where World Preference embodies a unified representation of human preferences. In this paper, we collect preference data from public forums covering diverse user communities, and conduct extensive training using 15M-scale data across models ranging from 1.5B to 72B parameters. We observe distinct patterns across different evaluation metrics: (1) Adversarial metrics (ability to identify deceptive features) consistently scale up with increased training data and base model size; (2) Objective metrics (objective knowledge with well-defined answers) show emergent behavior in larger language models, highlighting WorldPM's scalability potential; (3) Subjective metrics (subjective preferences from a limited number of humans or AI) do not demonstrate scaling trends. Further experiments validate the effectiveness of WorldPM as a foundation for preference fine-tuning. Through evaluations on 7 benchmarks with 20 subtasks, we find that WorldPM broadly improves the generalization performance across human preference datasets of varying sizes (7K, 100K and 800K samples), with performance gains exceeding 5% on many key subtasks. Integrating WorldPM into our internal RLHF pipeline, we observe significant improvements on both in-house and public evaluation sets, with notable gains of 4% to 8% in our in-house evaluations.
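The abstract's central claim is that test loss follows a power law in model and dataset size, L(N) ≈ a · N^(−α). Such a law is typically verified by fitting a straight line in log-log space, since log L = log a − α · log N. A small self-contained sketch of that fit (synthetic data; names and values are illustrative, not the paper's measurements):

```python
import math

def fit_power_law(sizes, losses):
    """Fit L(N) ~ a * N**(-alpha) by ordinary least squares in log-log space.

    A power law is linear after taking logs: log L = log a - alpha * log N,
    so the fitted slope gives -alpha and the intercept gives log a.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    k = len(xs)
    xbar, ybar = sum(xs) / k, sum(ys) / k
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
    alpha = -slope
    a = math.exp(ybar - slope * xbar)
    return a, alpha

# Synthetic check: data generated from an exact power law is recovered.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [2.0 * n ** -0.5 for n in sizes]
a, alpha = fit_power_law(sizes, losses)
# a ≈ 2.0, alpha ≈ 0.5
```

On real measurements the points scatter around the line rather than lying on it; the paper's finding is that adversarial and objective preference metrics follow such a trend while subjective metrics do not.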
Problem

Research questions and friction points this paper is trying to address.

Scaling human preference modeling like language models
Evaluating adversarial, objective, subjective preference metrics
Improving generalization in human preference datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scaling preference modeling with power laws
Training on 15M samples across 1.5B–72B parameter models
Improving RLHF pipeline performance by 4-8%
Authors

Binghai Wang
Qwen Team, Alibaba Group

Runji Lin
Institute of Automation, Chinese Academy of Sciences
Reinforcement Learning, Multi-Agent System, Large Language Model

Keming Lu
Qwen Team, Alibaba Group

Le Yu
Qwen Team, Alibaba Group

Zhenru Zhang
Qwen Team, Alibaba Group
Large Language Model

Fei Huang
Qwen Team, Alibaba Group

Chujie Zheng
Qwen Team, Alibaba Group
Artificial Intelligence, Large Language Models

Kai Dang
Qwen Team, Alibaba Group

Yang Fan
University of Science and Technology of China
Learning to Teach, Automated Machine Learning, Neural Architecture Search, Natural Language Processing, AI for Medicine

Xingzhang Ren
Qwen Team, Alibaba Group

An Yang
Qwen Team, Peking University
Natural Language Processing (NLP)

Binyuan Hui
Qwen Team, Alibaba Group
Large Language Models, CodeLLMs, Reasoning, Agent

Dayiheng Liu
Qwen Team, Alibaba Group

Tao Gui
Institute of Trustworthy Embodied Artificial Intelligence, Fudan University; School of Computer Science, Fudan University

Qi Zhang
Institute of Trustworthy Embodied Artificial Intelligence, Fudan University; School of Computer Science, Fudan University

Xuanjing Huang
Institute of Trustworthy Embodied Artificial Intelligence, Fudan University; School of Computer Science, Fudan University

Yu-Gang Jiang
Professor, Fudan University. IEEE & IAPR Fellow
Video Analysis, Embodied AI, Trustworthy AI

Bowen Yu
Qwen Team, Alibaba Group
Post-training, Foundation Model

Jingren Zhou
Alibaba Group, Microsoft
Cloud Computing, Large Scale Distributed Systems, Machine Learning, Query Processing

Junyang Lin
Qwen Team, Alibaba Group & Peking University
Natural Language Processing, Cross-Modal Representation Learning, Pretraining