MotionBridge: Dynamic Video Inbetweening with Flexible Controls

📅 2024-12-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing video interpolation methods struggle to model large motions and lack fine-grained controllability, resulting in unnatural transitions and distorted details. To address this, we propose a multimodal controllable video interpolation framework featuring a dual-generator architecture with dual-branch feature embedding, which decouples and jointly fuses heterogeneous conditioning signals, including trajectory strokes, keyframes, masks, guide pixels, and text. We further introduce a curriculum-based training strategy to progressively align these multi-source control signals. Our method employs joint diffusion and autoregressive modeling to significantly improve motion coherence, control fidelity, and contextual consistency. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods both qualitatively and quantitatively, enabling dynamic, high-precision, and customizable intermediate-frame generation.

📝 Abstract
By generating plausible and smooth transitions between two image frames, video inbetweening is an essential tool for video editing and long video synthesis. Traditional works lack the capability to generate complex large motions. While recent video generation techniques are powerful in creating high-quality results, they often lack fine control over the details of intermediate frames, which can lead to results that do not align with the creator's intent. We introduce MotionBridge, a unified video inbetweening framework that allows flexible controls, including trajectory strokes, keyframes, masks, guide pixels, and text. However, learning such multi-modal controls in a unified framework is a challenging task. We thus design two generators to extract the control signals faithfully and encode features through dual-branch embedders to resolve ambiguities. We further introduce a curriculum training strategy to smoothly learn various controls. Extensive qualitative and quantitative experiments have demonstrated that such multi-modal controls enable a more dynamic, customizable, and contextually accurate visual narrative.
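The abstract's dual-branch embedding idea, encoding heterogeneous control signals in separate branches before fusing them into one conditioning vector, can be illustrated with a minimal sketch. This is not the paper's implementation: the branch split (dense pixel-aligned controls vs. sparse stroke controls), the dimensions, and the additive fusion are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    """Affine projection, a stand-in for a learned embedder branch."""
    return x @ w + b

# Hypothetical dimensions (not from the paper); each control signal is
# assumed to be flattened to a vector before embedding.
d_dense, d_sparse, d_model = 64, 16, 32

# Branch 1: dense, pixel-aligned controls (e.g., masks, guide pixels).
w_dense, b_dense = rng.normal(size=(d_dense, d_model)), np.zeros(d_model)
# Branch 2: sparse controls (e.g., trajectory strokes).
w_sparse, b_sparse = rng.normal(size=(d_sparse, d_model)), np.zeros(d_model)

def dual_branch_embed(dense_ctrl, sparse_ctrl):
    """Embed each control family in its own branch, then fuse by summation.

    Keeping the branches separate until fusion is one way to avoid
    ambiguity between heterogeneous control signals.
    """
    e_dense = linear(dense_ctrl, w_dense, b_dense)
    e_sparse = linear(sparse_ctrl, w_sparse, b_sparse)
    return e_dense + e_sparse  # fused conditioning vector

fused = dual_branch_embed(rng.normal(size=d_dense), rng.normal(size=d_sparse))
print(fused.shape)  # (32,)
```

In a real model each branch would be a deep network and the fused vector would condition the video generator, but the decouple-then-fuse structure is the point of the sketch.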
Problem

Research questions and friction points this paper is trying to address.

Video Interpolation
Complex Motion Transition
Detail Control
Innovation

Methods, ideas, or system contributions that make the work stand out.

MotionBridge
Dual-Generator
Progressive Learning
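The "Progressive Learning" contribution, a curriculum that smoothly introduces the various controls during training, can be sketched as a staged schedule. The stage ordering, stage length, and control groupings below are hypothetical, not taken from the paper.

```python
import random

# Hypothetical curriculum: control signals are enabled in stages so the
# model first learns simpler conditioning before mixing in harder ones.
CURRICULUM = [
    {"keyframes"},                                    # stage 0: easiest
    {"keyframes", "masks"},                           # stage 1
    {"keyframes", "masks", "guide_pixels"},           # stage 2
    {"keyframes", "masks", "guide_pixels",
     "trajectory_strokes", "text"},                   # stage 3: all controls
]

def active_controls(step, steps_per_stage=1000):
    """Return the set of control signals enabled at a given training step."""
    stage = min(step // steps_per_stage, len(CURRICULUM) - 1)
    return CURRICULUM[stage]

def sample_condition(step, rng=random):
    """Sample one control type to train on from the currently active set."""
    return rng.choice(sorted(active_controls(step)))

print(active_controls(0))  # {'keyframes'}
```

Each training batch would then be conditioned only on controls drawn from the active set, so earlier, simpler signals are never unlearned as harder ones are added.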
👥 Authors

Maham Tanveer
Simon Fraser University
Computer Vision and Graphics

Yang Zhou
Adobe Research

Simon Niklaus
Staff Research Scientist at Google DeepMind

Ali Mahdavi Amiri
Simon Fraser University

Hao Zhang
Simon Fraser University

Krishna Kumar Singh
Adobe Research
Computer Vision, Machine Learning

Nanxuan Zhao
Adobe Research