UniE2F: A Unified Diffusion Framework for Event-to-Frame Reconstruction with Video Foundation Models

📅 2026-02-22
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses an inherent limitation of event cameras: because they capture only intensity changes, reconstructed videos lack static textures and spatial details. To overcome this, the authors propose the first unified framework that integrates a pre-trained video diffusion model with event data. The approach uses event-conditioned guidance to generate high-fidelity video frames and introduces an event-driven inter-frame residual mechanism to improve reconstruction accuracy. By modulating the reverse diffusion sampling process, the method further enables zero-shot frame interpolation and prediction. Extensive experiments demonstrate that the proposed approach significantly outperforms existing methods on both real-world and synthetic datasets, with notable gains in quantitative metrics and visual quality.
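The inter-frame residual idea rests on the standard event-generation model: each event signals that the log-intensity at a pixel has changed by roughly one contrast threshold, so accumulating event polarities between two timestamps approximates the log-intensity difference between the corresponding frames. Below is a minimal NumPy sketch of that accumulation; the function name, event layout, and threshold value are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch, assuming an (N, 4) event array of (t, x, y, polarity) and a
# nominal contrast threshold; illustrates the event-generation model only.
import numpy as np

def event_residual(events, t0, t1, height, width, contrast_threshold=0.2):
    """Accumulate event polarities fired in [t0, t1) into a per-pixel map that
    approximates log I(t1) - log I(t0)."""
    residual = np.zeros((height, width), dtype=np.float32)
    mask = (events[:, 0] >= t0) & (events[:, 0] < t1)
    xs = events[mask, 1].astype(int)
    ys = events[mask, 2].astype(int)
    polarities = events[mask, 3]            # +1 / -1
    # Each event signals a log-intensity change of +/- contrast_threshold.
    np.add.at(residual, (ys, xs), polarities * contrast_threshold)
    return residual
```

A map like this can then serve as a per-pixel target that ties consecutive reconstructed frames together.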

📝 Abstract
Event cameras excel at high-speed, low-power, and high-dynamic-range scene perception. However, as they fundamentally record only relative intensity changes rather than absolute intensity, the resulting data streams suffer from a significant loss of spatial information and static texture details. In this paper, we address this limitation by leveraging the generative prior of a pre-trained video diffusion model to reconstruct high-fidelity video frames from sparse event data. Specifically, we first establish a baseline model by directly applying event data as a condition to synthesize videos. Then, based on the physical correlation between the event stream and video frames, we further introduce the event-based inter-frame residual guidance to enhance the accuracy of video frame reconstruction. Furthermore, we extend our method to video frame interpolation and prediction in a zero-shot manner by modulating the reverse diffusion sampling process, thereby creating a unified event-to-frame reconstruction framework. Experimental results on real-world and synthetic datasets demonstrate that our method significantly outperforms previous approaches both quantitatively and qualitatively. We also refer the reviewers to the video demo contained in the supplementary material for video results. The code will be publicly available at https://github.com/CS-GangXu/UniE2F.
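As a rough illustration of how the reverse diffusion sampling process could be modulated by such a residual, the sketch below shows one guided denoising step in PyTorch. The `denoiser` signature, noise schedule handling, and `guidance_weight` are illustrative stand-ins rather than the released UniE2F implementation; the actual method conditions a pre-trained video diffusion backbone on event features.

```python
# Hedged sketch of one residual-guided reverse-diffusion step (assumptions
# throughout; not the authors' code).
import torch

@torch.no_grad()
def guided_reverse_step(denoiser, x_t, t, event_cond, event_residual,
                        alpha_bar, guidance_weight=1.0):
    """x_t: noisy video tensor (B, T, C, H, W); event_cond: event features fed
    to the denoiser; event_residual: event-derived differences between
    consecutive frames (broadcastable to (B, T-1, C, H, W))."""
    a_t = alpha_bar[t]
    eps = denoiser(x_t, t, event_cond)                     # predicted noise
    x0_hat = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()   # predicted clean video

    # Guidance: nudge the predicted frames so that their inter-frame
    # differences agree with the event-derived residual (one gradient step).
    with torch.enable_grad():
        x0 = x0_hat.detach().requires_grad_(True)
        frame_diff = x0[:, 1:] - x0[:, :-1]
        loss = ((frame_diff - event_residual) ** 2).mean()
        grad = torch.autograd.grad(loss, x0)[0]
    x0_hat = x0_hat - guidance_weight * grad

    # DDIM-style deterministic update to the previous timestep.
    a_prev = alpha_bar[t - 1] if t > 0 else torch.ones_like(a_t)
    return a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps
```

In this framing, interpolation and prediction would correspond to keeping the observed frames fixed while the remaining frames are sampled under the same event guidance, which is one common way to repurpose a diffusion sampler without retraining.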
Problem

Research questions and friction points this paper is trying to address.

event camera
frame reconstruction
spatial information loss
static texture
video generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion model
event camera
video reconstruction
zero-shot generation
residual guidance