Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

📅 2025-10-27
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Existing reward models suffer from two key limitations: modality imbalance, as most are restricted to text or image inputs, and preference rigidity, as training relies on fixed binary preference pairs. To address these, the authors propose Omni-Reward, a unified framework for omni-modal reward modeling with free-form preferences, spanning text, image, video, audio, and 3D data. A free-form preference annotation mechanism lets users express fine-grained, user-specific criteria. The framework contributes Omni-RewardBench, an evaluation suite of nine cross-modal tasks; Omni-RewardData, a large-scale multimodal preference dataset; and Omni-RewardModel, which pairs a discriminative reward model with a generative one, trained on the new data for instruction following and free-form preference modeling. Experiments show that Omni-RewardModel achieves strong performance on Omni-RewardBench as well as other widely used reward modeling benchmarks, yielding consistent cross-modal reward prediction and closer alignment with nuanced human preferences.
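
To make the dual-path idea concrete, here is a minimal sketch, not the authors' implementation, of the discriminative path trained with a standard Bradley-Terry pairwise loss; the encoder, hidden size, and all names are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscriminativeRM(nn.Module):
    """Scalar reward head on top of a multimodal encoder (hypothetical names)."""

    def __init__(self, encoder: nn.Module, hidden_dim: int):
        super().__init__()
        # `encoder` is assumed to map fused (prompt, response, free-form
        # preference) features to a hidden state of size `hidden_dim`.
        self.encoder = encoder
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        h = self.encoder(feats)            # (batch, hidden_dim)
        return self.head(h).squeeze(-1)    # (batch,) scalar rewards

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss: push reward(chosen) above reward(rejected)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

A generative reward model would instead prompt a multimodal LLM judge with the same free-form criterion and parse its verdict; per the abstract, Omni-RewardModel includes both paradigms.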

📝 Abstract
Reward models (RMs) play a critical role in aligning AI behaviors with human preferences, yet they face two fundamental challenges: (1) Modality Imbalance, where most RMs focus mainly on text and image modalities, offering limited support for video, audio, and other modalities; and (2) Preference Rigidity, where training on fixed binary preference pairs fails to capture the complexity and diversity of personalized preferences. To address the above challenges, we propose Omni-Reward, a step toward generalist omni-modal reward modeling with support for free-form preferences, consisting of: (1) Evaluation: We introduce Omni-RewardBench, the first omni-modal RM benchmark with free-form preferences, covering nine tasks across five modalities including text, image, video, audio, and 3D; (2) Data: We construct Omni-RewardData, a multimodal preference dataset comprising 248K general preference pairs and 69K instruction-tuning pairs for training generalist omni-modal RMs; (3) Model: We propose Omni-RewardModel, which includes both discriminative and generative RMs, and achieves strong performance on Omni-RewardBench as well as other widely used reward modeling benchmarks.
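
The free-form setting implies a pairwise evaluation protocol: each benchmark example carries a prompt, two candidate responses, and a natural-language criterion, and an RM is credited when it ranks the human-preferred candidate higher. A minimal sketch of that accuracy metric, assuming a hypothetical `reward_fn` interface:

```python
def pairwise_accuracy(examples: list[dict], reward_fn) -> float:
    """Fraction of pairs where the RM scores the human-chosen response higher.

    `reward_fn(prompt, response, preference)` is a stand-in interface, not the
    paper's actual API; each example dict is assumed to hold the fields below.
    """
    correct = 0
    for ex in examples:
        r_chosen = reward_fn(ex["prompt"], ex["chosen"], ex["preference"])
        r_rejected = reward_fn(ex["prompt"], ex["rejected"], ex["preference"])
        correct += int(r_chosen > r_rejected)
    return correct / len(examples)
```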
Problem

Research questions and friction points this paper is trying to address.

Addressing modality imbalance in reward models beyond text and image
Overcoming preference rigidity in capturing diverse human preferences
Developing generalist omni-modal reward modeling with free-form preferences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes Omni-RewardModel, which pairs discriminative and generative reward models
Introduces Omni-RewardBench, the first omni-modal RM benchmark with free-form preferences, covering nine tasks across five modalities
Constructs Omni-RewardData with 248K general preference pairs and 69K instruction-tuning pairs (see the record sketch after this list)
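
As a concrete illustration of the free-form annotation referenced above, one Omni-RewardData record might look as follows; the actual schema is not given on this page, so every field name here is an assumption:

```python
# Hypothetical record layout for one multimodal preference pair.
example_record = {
    "task": "text-to-video",          # one of the nine cross-modal tasks (assumed label)
    "prompt": "A drone shot of a coastline at sunset.",
    "preference": (                   # free-form, user-specific criterion
        "Prefer smooth camera motion and realistic lighting over vivid colors."
    ),
    "chosen": "video_a.mp4",          # response judged better under the criterion
    "rejected": "video_b.mp4",
}
```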
Zhuoran Jin
Institute of Automation, Chinese Academy of Sciences
Large Language Models · Natural Language Processing · Knowledge Engineering
Hongbang Yuan
Institute of Automation, Chinese Academy of Sciences
Large Language Models · Natural Language Processing
Kejian Zhu
School of Artificial Intelligence, University of Chinese Academy of Sciences
Jiachun Li
School of Artificial Intelligence, University of Chinese Academy of Sciences
Pengfei Cao
School of Artificial Intelligence, University of Chinese Academy of Sciences
Yubo Chen
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing · Information Extraction · Event Extraction · Large Language Models
Kang Liu
School of Artificial Intelligence, University of Chinese Academy of Sciences
Jun Zhao
School of Artificial Intelligence, University of Chinese Academy of Sciences