🤖 AI Summary
To address the poor generalization and limited performance of event-based video frame interpolation (EVFI) caused by scarce paired training data, this paper pioneers the adaptation of video diffusion models pre-trained on internet-scale data to EVFI. It proposes an event-frame cross-modal feature alignment mechanism and an unpaired motion modeling strategy, enabling robust temporal interpolation without pixel-level event-image correspondence labels. Lightweight fine-tuning tailored to sparse event streams and keyframes yields significantly improved cross-device generalization. Extensive evaluation on multiple real-world EVFI benchmarks, including a newly constructed dataset, demonstrates consistent superiority over state-of-the-art methods. Notably, cross-camera testing yields a PSNR improvement of up to 2.1 dB, validating the effective transfer of large-scale generative priors to this data-scarce task.
📝 Abstract
Video Frame Interpolation aims to recover realistic missing frames between observed frames, generating a high-frame-rate video from a low-frame-rate one. However, without additional guidance, the large motion between frames makes this problem ill-posed. Event-based Video Frame Interpolation (EVFI) addresses this challenge by using sparse, high-temporal-resolution event measurements as motion guidance. This guidance allows EVFI methods to significantly outperform frame-only methods. However, to date, EVFI methods have relied on a limited set of paired event-frame training data, which severely restricts their performance and generalization. In this work, we overcome this limited-data challenge by adapting pre-trained video diffusion models, trained on internet-scale datasets, to EVFI. We experimentally validate our approach on real-world EVFI datasets, including a new one that we introduce. Our method outperforms existing approaches and generalizes far better across cameras.