Learning to Refocus with Video Diffusion Models

📅 2025-12-22
🤖 AI Summary
Autofocus in photography frequently fails, and existing methods struggle to achieve photorealistic, controllable post-capture refocusing from a single defocused image. To address this, we propose the first end-to-end video diffusion-based framework for focal stack synthesis, enabling interactive, real-time refocusing and editing. Our key contributions are: (1) the first application of video diffusion models to perceptually realistic focal stack generation; (2) the construction of the first large-scale, real-world smartphone-captured focal stack dataset; and (3) a novel training paradigm integrating defocus modeling, real-domain image synthesis, and multi-scale spatiotemporal denoising. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches in perceptual quality, robustness to diverse defocus patterns, and generalization to complex scenes. Both code and dataset are publicly released.

📝 Abstract
Focus is a cornerstone of photography, yet autofocus systems often fail to capture the intended subject, and users frequently wish to adjust focus after capture. We introduce a novel method for realistic post-capture refocusing using video diffusion models. From a single defocused image, our approach generates a perceptually accurate focal stack, represented as a video sequence, enabling interactive refocusing and unlocking a range of downstream applications. We release a large-scale focal stack dataset acquired under diverse real-world smartphone conditions to support this work and future research. Our method consistently outperforms existing approaches in both perceptual quality and robustness across challenging scenarios, paving the way for more advanced focus-editing capabilities in everyday photography. Code and data are available at www.learn2refocus.github.io.
Problem

Research questions and friction points this paper addresses.

Autofocus systems often miss the intended subject at capture time
Photorealistic, controllable post-capture refocusing from a single defocused image remains difficult
Existing methods lack perceptual quality and robustness across diverse defocus patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

First application of video diffusion models to focal stack synthesis
Generates a perceptually accurate focal stack from a single defocused image
Releases a large-scale, real-world smartphone-captured focal stack dataset
Enables interactive post-capture focus adjustment
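Once a focal stack exists, interactive refocusing reduces to picking the right slice. As a rough illustration only (not the paper's diffusion pipeline), the sketch below selects the focal-stack frame that is sharpest around a tapped pixel using a Laplacian contrast measure; the helper names `local_sharpness` and `refocus` are hypothetical, not from the paper:

```python
import numpy as np

def local_sharpness(img, y, x, window=5):
    """Mean absolute 4-neighbour Laplacian response in a window around (y, x)."""
    h, w = img.shape
    lap = np.zeros_like(img)
    # Interior-only Laplacian; borders stay zero.
    lap[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1]
                       + img[1:-1, :-2] + img[1:-1, 2:]
                       - 4.0 * img[1:-1, 1:-1])
    y0, y1 = max(0, y - window), min(h, y + window + 1)
    x0, x1 = max(0, x - window), min(w, x + window + 1)
    return np.abs(lap[y0:y1, x0:x1]).mean()

def refocus(stack, y, x):
    """Return the index of the focal-stack slice sharpest at the tapped pixel."""
    scores = [local_sharpness(frame, y, x) for frame in stack]
    return int(np.argmax(scores))
```

A tap at pixel (y, x) maps to the slice with the strongest local contrast there, so the whole interaction is a cheap per-click lookup over the generated stack.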
SaiKiran Tedla, Adobe, USA & York University, Canada
Zhoutong Zhang, Adobe (computer vision)
Xuaner Zhang, Adobe, USA
Shumian Xin, Adobe (computer vision, computational photography)