Video Motion Graphs

πŸ“… 2025-03-26
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses key challenges in generating realistic human motion videos conditioned on multimodal inputs (e.g., music, action labels, or reference videos), particularly the difficulty of disentangling motion dynamics from appearance textures and of producing natural transitions between video segments. To this end, the authors propose HMInterp, a dual-branch interpolation framework. The motion branch employs a diffusion model over skeletal trajectories to ensure kinematically accurate motion generation, while the appearance branch adopts a pose-guided, diffusion-based video frame interpolation model to synthesize high-fidelity textures. A condition progressive training strategy further leverages both strong and weak identity conditions to enhance generalization. Technically, HMInterp integrates video frame interpolation, motion diffusion modeling, multimodal retrieval, and graph-structured synthesis. Extensive experiments demonstrate that HMInterp significantly outperforms state-of-the-art generative and retrieval-based methods in both motion accuracy and visual texture quality.


πŸ“ Abstract
We present Video Motion Graphs, a system designed to generate realistic human motion videos. Given a reference video and conditional signals such as music or motion tags, the system synthesizes new videos by first retrieving video clips with gestures matching the conditions and then generating interpolation frames to seamlessly connect clip boundaries. The core of our approach is HMInterp, a robust Video Frame Interpolation (VFI) model that enables seamless interpolation of discontinuous frames, even for complex motion scenarios like dancing. HMInterp (i) employs a dual-branch interpolation approach, combining a Motion Diffusion Model for human skeleton motion interpolation with a diffusion-based video frame interpolation model for final frame generation, and (ii) adopts condition progressive training to effectively leverage strong and weak identity conditions, such as images and pose. These designs ensure both high video texture quality and accurate motion trajectories. Results show that our Video Motion Graphs system outperforms existing generative and retrieval-based methods for multimodal conditioned human motion video generation. The project page can be found at https://h-liu1997.github.io/Video-Motion-Graphs/.
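The retrieve-then-interpolate pipeline described above can be sketched as a toy motion graph: clips become nodes, an edge connects two clips whose boundary poses are close enough to bridge with interpolation, and traversal greedily follows the conditioning signal. All names here (`build_motion_graph`, `retrieve_path`, the pose/feature fields, the distance threshold) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_motion_graph(clips, threshold=0.5):
    """Hypothetical motion graph: an edge i -> j exists when the pose at the
    end of clip i is close to the pose at the start of clip j, so an
    HMInterp-style model could plausibly synthesize the transition frames."""
    edges = {i: [] for i in range(len(clips))}
    for i, a in enumerate(clips):
        for j, b in enumerate(clips):
            if i != j and np.linalg.norm(a["end_pose"] - b["start_pose"]) < threshold:
                edges[i].append(j)
    return edges

def retrieve_path(clips, edges, condition, start, length):
    """Greedy traversal: at each step, move to the neighbor whose clip feature
    best matches the conditioning embedding (e.g. a music or tag feature)."""
    path = [start]
    for _ in range(length - 1):
        candidates = edges[path[-1]]
        if not candidates:
            break  # no valid transition; a real system might relax the threshold
        path.append(min(candidates,
                        key=lambda j: np.linalg.norm(clips[j]["feature"] - condition)))
    return path
```

In this sketch, each boundary along the returned path is where the dual-branch interpolation model would be invoked: the skeleton branch fills in the motion between `end_pose` and `start_pose`, and the appearance branch renders the frames.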
Problem

Research questions and friction points this paper is trying to address.

Generating realistic human motion videos conditioned on multimodal inputs such as music, action labels, or reference videos
Disentangling motion dynamics from appearance textures during video synthesis
Avoiding unnatural transitions when connecting retrieved video clips, especially in complex motion scenarios like dancing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Video Motion Graphs, a graph-structured retrieval system for human motion video synthesis
Proposes HMInterp, a dual-branch model for seamless interpolation of discontinuous frames
Combines a Motion Diffusion Model for skeleton interpolation with diffusion-based VFI, trained with condition progressive training