🤖 AI Summary
Video generation models still struggle to produce natural, contextually consistent complex motion. To address this, we propose RealDPO, a novel alignment paradigm that brings real-world videos into preference learning: authentic motion videos serve as positive samples and generated videos as negative samples, enabling a motion-fidelity-oriented DPO loss that is optimized iteratively without human annotation. To support this framework, we curate RealAction-5K, a high-quality dataset of human daily activities, and use it to post-train a Transformer-based video generation model. Experiments show significant gains over state-of-the-art methods: FVD decreases by 18.7%, CLIPSIM increases by 12.3%, and the user preference rate rises by 35%. Our approach effectively improves motion realism, text-video alignment, and overall generation quality.
📝 Abstract
Video generative models have recently achieved notable advancements in synthesis quality. However, generating complex motions remains a critical challenge, as existing models often struggle to produce natural, smooth, and contextually consistent movements. This gap between generated and real-world motions limits their practical applicability. To address this issue, we introduce RealDPO, a novel alignment paradigm that leverages real-world data as positive samples for preference learning, enabling more accurate motion synthesis. Unlike traditional supervised fine-tuning (SFT), which offers limited corrective feedback, RealDPO employs Direct Preference Optimization (DPO) with a tailored loss function to enhance motion realism. By contrasting real-world videos with erroneous model outputs, RealDPO enables iterative self-correction, progressively refining motion quality. To support post-training in complex motion synthesis, we propose RealAction-5K, a curated dataset of high-quality videos capturing human daily activities with rich and precise motion details. Extensive experiments demonstrate that RealDPO significantly improves video quality, text alignment, and motion realism compared to state-of-the-art models and existing preference optimization techniques.
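The preference objective described above follows the general shape of Direct Preference Optimization, with the real-world video playing the role of the preferred sample and the model's own output the rejected one. As a minimal sketch (not the paper's tailored loss, whose exact form is not given here), the standard DPO loss over per-sample log-likelihoods looks like this; the function name and the `beta` temperature are illustrative assumptions:

```python
import math

def dpo_loss(logp_real, logp_gen, ref_logp_real, ref_logp_gen, beta=0.1):
    """Standard DPO loss, adapted to the RealDPO framing:
    the 'chosen' sample is a real-world video, the 'rejected'
    sample is a model generation. Arguments are log-likelihoods
    under the trained policy and a frozen reference model."""
    # Implicit reward margin between real and generated samples.
    margin = beta * ((logp_real - ref_logp_real) - (logp_gen - ref_logp_gen))
    # -log(sigmoid(margin)): small when the policy prefers the real video.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With equal log-likelihoods the margin is zero and the loss is log 2; as the policy assigns relatively higher likelihood to the real video than to its own generation (measured against the reference model), the loss decreases, which is the self-correction pressure the abstract describes.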