🤖 AI Summary
Autofocus systems frequently fail to capture the intended subject, and existing methods struggle to achieve photorealistic, controllable post-capture refocusing from a single defocused image. To address this, we propose the first end-to-end video-diffusion-based framework for focal stack synthesis, enabling interactive refocusing and editing after capture. Our key contributions are: (1) the first application of video diffusion models to perceptually realistic focal stack generation; (2) the first large-scale focal stack dataset captured with smartphones under real-world conditions; and (3) a training paradigm that integrates defocus modeling, real-domain image synthesis, and multi-scale spatiotemporal denoising. Extensive experiments show that our method significantly outperforms state-of-the-art approaches in perceptual quality, robustness to diverse defocus patterns, and generalization to complex scenes. Both code and dataset are publicly released.
📝 Abstract
Focus is a cornerstone of photography, yet autofocus systems often fail to capture the intended subject, and users frequently wish to adjust focus after capture. We introduce a novel method for realistic post-capture refocusing using video diffusion models. From a single defocused image, our approach generates a perceptually accurate focal stack, represented as a video sequence, enabling interactive refocusing and unlocking a range of downstream applications. To support this work and future research, we release a large-scale focal stack dataset acquired under diverse real-world smartphone conditions. Our method consistently outperforms existing approaches in both perceptual quality and robustness across challenging scenarios, paving the way for more advanced focus-editing capabilities in everyday photography. Code and data are available at www.learn2refocus.github.io.
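The abstract does not detail the refocusing interface, but the value of a focal-stack-as-video output is easy to illustrate: tap-to-refocus reduces to selecting the frame that is sharpest around the tapped pixel. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's implementation; the `refocus` helper, the Laplacian-based sharpness measure, the patch size, and the grayscale input are all assumptions.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def refocus(stack: np.ndarray, x: int, y: int, patch: int = 15) -> np.ndarray:
    """Return the focal-stack frame that is sharpest around pixel (x, y).

    stack: (T, H, W) grayscale focal stack, frames ordered by focal distance.
    patch: side length of the local window used to pool sharpness.
    """
    # Per-frame sharpness map: local mean of the squared Laplacian response.
    sharpness = np.stack([
        uniform_filter(laplace(frame.astype(np.float32)) ** 2, size=patch)
        for frame in stack
    ])
    # Pick the frame whose focal plane gives peak sharpness at the tap point.
    best = int(np.argmax(sharpness[:, y, x]))
    return stack[best]

# Usage with synthetic data (hypothetical 10-frame, 480x640 stack):
stack = np.random.rand(10, 480, 640).astype(np.float32)
frame = refocus(stack, x=320, y=240)
```

The same frame-selection logic extends naturally to editing applications such as all-in-focus compositing, where each pixel takes its value from its sharpest frame rather than a single tapped one.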