Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation

📅 2024-12-10
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
To address poor generalization and limited performance in event-camera video frame interpolation (EVFI) caused by scarce annotated data, this paper pioneers the adaptation of internet-scale pre-trained video diffusion models to EVFI. We propose an event-frame cross-modal feature alignment mechanism and an unpaired motion modeling strategy, enabling robust temporal interpolation without pixel-level event-image correspondence labels. Through lightweight fine-tuning tailored to sparse event streams and keyframes, the model achieves significantly improved cross-device generalization. Extensive evaluation on multiple real-world EVFI benchmarks—including a newly constructed dataset—demonstrates consistent superiority over state-of-the-art methods. Notably, cross-camera testing yields a PSNR improvement of up to 2.1 dB, validating the effective transfer of large-scale generative priors to low-resource discriminative tasks.

Technology Category

Application Category

📝 Abstract
Video Frame Interpolation aims to recover realistic missing frames between observed frames, generating a high-frame-rate video from a low-frame-rate video. However, without additional guidance, the large motion between frames makes this problem ill-posed. Event-based Video Frame Interpolation (EVFI) addresses this challenge by using sparse, high-temporal-resolution event measurements as motion guidance. This guidance allows EVFI methods to significantly outperform frame-only methods. However, to date, EVFI methods have relied on a limited set of paired event-frame training data, severely limiting their performance and generalization capabilities. In this work, we overcome the limited data challenge by adapting pre-trained video diffusion models trained on internet-scale datasets to EVFI. We experimentally validate our approach on real-world EVFI datasets, including a new one that we introduce. Our method outperforms existing methods and generalizes across cameras far better than existing approaches.
Problem

Research questions and friction points this paper is trying to address.

Repurpose video diffusion models for event-based interpolation
Overcome limited paired event-frame training data challenge
Improve generalization across cameras for video interpolation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Repurposing pre-trained video diffusion models
Using event measurements as motion guidance
Outperforming existing EVFI methods
🔎 Similar Papers
No similar papers found.