ReAlign: Text-to-Motion Generation via Step-Aware Reward-Guided Alignment

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models for text-to-3D human motion generation suffer from misalignment between textual semantics and motion distributions. Method: The paper proposes Reward-guided sampling Alignment (ReAlign), a framework featuring a step-aware reward model and a reward-guided sampling strategy that jointly optimize semantic consistency and motion realism during denoising; the reward model integrates step-aware tokens with a text-alignment module and a motion-alignment module to dynamically balance probabilistic density modeling against semantic constraints. Contribution/Results: Evaluated on multiple benchmarks, the method outperforms state-of-the-art approaches in both motion generation quality and cross-modal retrieval, with substantial gains in text-motion alignment accuracy and visual fidelity.

📝 Abstract
Text-to-motion generation, which synthesizes 3D human motions from text inputs, holds immense potential for applications in gaming, film, and robotics. Recently, diffusion-based methods have been shown to generate more diverse and realistic motions. However, there exists a misalignment between text and motion distributions in diffusion models, which leads to semantically inconsistent or low-quality motions. To address this limitation, we propose Reward-guided sampling Alignment (ReAlign), comprising a step-aware reward model that assesses alignment quality during denoising sampling and a reward-guided strategy that directs the diffusion process toward an optimally aligned distribution. This reward model integrates step-aware tokens and combines a text-aligned module for semantic consistency with a motion-aligned module for realism, refining noisy motions at each timestep to balance probability density and alignment. Extensive experiments on both motion generation and retrieval tasks demonstrate that our approach significantly improves text-motion alignment and motion quality compared to existing state-of-the-art methods.
Problem

Research questions and friction points this paper is trying to address.

Addresses text-motion misalignment in diffusion models
Improves semantic consistency of generated 3D motions
Enhances motion quality through step-aware reward guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reward-guided sampling alignment for diffusion models
Step-aware reward model assessing text-motion alignment
Combined semantic and motion modules refine noisy motions
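The reward-guided sampling idea summarized above — steering each denoising step toward a distribution with higher alignment reward — can be sketched generically. The snippet below is a toy illustration only, not the paper's implementation: the quadratic `toy_reward_grad`, the placeholder denoiser, and all constants are assumptions standing in for ReAlign's learned step-aware reward model and trained diffusion backbone.

```python
import numpy as np

def toy_reward_grad(x, text_embed):
    # Gradient of a toy alignment reward r(x) = -||x - text_embed||^2.
    # Stand-in (assumption) for the gradient of a learned step-aware
    # reward model scoring text-motion alignment.
    return -2.0 * (x - text_embed)

def reward_guided_denoise(x_T, text_embed, steps=50, guidance=0.05, seed=0):
    """Classifier-guidance-style sketch: at every denoising step the
    sample is nudged along the reward gradient, biasing the trajectory
    toward motions that the reward model considers better aligned."""
    rng = np.random.default_rng(seed)
    x = x_T.copy()
    for t in range(steps, 0, -1):
        noise_scale = t / steps  # noise shrinks as t -> 0
        # Placeholder "denoiser": contract toward the data mean (zero here)
        # while injecting step-dependent noise.
        x = 0.95 * x + 0.05 * noise_scale * rng.standard_normal(x.shape)
        # Reward guidance: small step uphill on the alignment reward.
        x = x + guidance * toy_reward_grad(x, text_embed)
    return x

if __name__ == "__main__":
    text_embed = np.array([1.0, -0.5, 0.25])   # toy text embedding
    x_T = np.random.default_rng(1).standard_normal(3)  # initial noise
    x_0 = reward_guided_denoise(x_T, text_embed)
    print(np.linalg.norm(x_0 - text_embed))
```

With guidance enabled the final sample lands much closer to the conditioning embedding than the initial noise; setting `guidance=0.0` recovers plain unguided sampling, which is the contrast the paper's reward-guided strategy exploits.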
Wanjiang Weng
School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Xiaofeng Tan
Research Intern at Tencent; Master's at Southeast University; Dual BSc at Shenzhen University.
AIGC, RLHF
Junbo Wang
School of Software, Northwestern Polytechnical University, Xi’an, China
Guo-Sen Xie
Professor, Nanjing University of Science and Technology
Computer Vision, Machine Learning
Pan Zhou
Singapore Management University, Singapore
Hongsong Wang
School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China