AI Summary
Existing video frame interpolation (VFI) methods rely on either linear combinations or direct regression of bidirectional optical flow, limiting their ability to model complex spatiotemporal dynamics and leading to poor generalization. To address this, we propose the Generalizable Implicit Motion Modeling (GIMM) framework, the first to jointly integrate motion latents with an adaptive coordinate-based neural network, enabling implicit, input-specific prediction of optical flow at arbitrary timesteps while explicitly encoding motion priors. GIMM comprises three core components: a pre-trained flow estimator, a motion encoding pipeline, and an implicit spatiotemporal flow-prediction module. Evaluated on standard VFI benchmarks, GIMM significantly outperforms state-of-the-art methods in interpolation accuracy and motion consistency. Moreover, it serves as a plug-and-play enhancement for diverse flow-based interpolation systems, demonstrating broad applicability and robustness.
Abstract
Motion modeling is critical in flow-based Video Frame Interpolation (VFI). Existing paradigms either consider linear combinations of bidirectional flows or directly predict bilateral flows for given timestamps without exploring favorable motion priors, thus lacking the capability to effectively model spatiotemporal dynamics in real-world videos. To address this limitation, we introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. Specifically, to enable GIMM as an effective motion modeling paradigm, we design a motion encoding pipeline that models a spatiotemporal motion latent from bidirectional flows extracted by pre-trained flow estimators, effectively representing input-specific motion priors. We then implicitly predict arbitrary-timestep optical flows between two adjacent input frames via an adaptive coordinate-based neural network, taking spatiotemporal coordinates and the motion latent as inputs. GIMM can be readily integrated with existing flow-based VFI methods by supplying accurately modeled motion. We show that GIMM outperforms the current state of the art on standard VFI benchmarks.
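To make the implicit-prediction idea concrete, the following is a minimal NumPy sketch of a coordinate-based motion function: a small MLP conditioned on a per-pixel motion latent maps a spatiotemporal coordinate (x, y, t) to a 2-channel flow vector. All shapes, layer sizes, and the random weights are illustrative assumptions, not the paper's actual architecture or trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 16   # per-pixel motion latent size (assumed)
HIDDEN = 32       # MLP hidden width (assumed)

# Randomly initialized weights standing in for trained parameters.
W1 = rng.normal(0, 0.1, (LATENT_DIM + 3, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, 2))  # output: (u, v) flow vector
b2 = np.zeros(2)

def predict_flow(latent: np.ndarray, x: float, y: float, t: float) -> np.ndarray:
    """Implicitly evaluate the flow at normalized coordinate (x, y) and an
    arbitrary timestep t in [0, 1], conditioned on the motion latent."""
    inp = np.concatenate([latent, [x, y, t]])     # coordinate + motion prior
    h = np.maximum(W1.T @ inp + b1, 0.0)          # ReLU hidden layer
    return W2.T @ h + b2                          # 2D flow (u, v)

# The same pixel can be queried at any timestep; the shared latent
# is what carries the input-specific motion prior across queries.
latent = rng.normal(size=LATENT_DIM)
flow_mid = predict_flow(latent, 0.5, 0.5, 0.5)
flow_end = predict_flow(latent, 0.5, 0.5, 1.0)
print(flow_mid.shape)  # (2,)
```

Because the network takes t as a continuous input rather than being tied to fixed timestamps, a single model can be queried at any intermediate time, which is what distinguishes this formulation from linear flow combination.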