Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion

📅 2025-05-27

🤖 AI Summary

Existing video bokeh methods offer no explicit control over focal-plane placement or blur intensity, and they suffer from temporal flickering and unnatural edge transitions because they lack temporal modeling. This paper introduces a one-step video diffusion framework guided by a multi-plane image (MPI) representation. The MPI layers are built with a progressively widening depth sampling function and give the model explicit scene geometry, while the 3D priors of pre-trained models such as Stable Video Diffusion support realistic, consistent blur. Focus plane and blur intensity are controlled within a single generative pass. A progressive training strategy further improves temporal consistency, depth robustness, and detail preservation, reducing temporal instability and boundary artifacts. Experiments on multiple benchmarks report state-of-the-art performance, with temporally stable, focus-controllable, depth-aware video bokeh.

📝 Abstract

Recent advances in diffusion-based editing models have enabled realistic camera simulation and image-based bokeh, but video bokeh remains largely unexplored. Existing video editing models cannot explicitly control focus planes or adjust bokeh intensity, limiting their applicability for controllable optical effects. Moreover, naively extending image-based bokeh methods to video often results in temporal flickering and unsatisfactory edge blur transitions due to the lack of temporal modeling and generalization capability. To address these challenges, we propose a novel one-step video bokeh framework that converts arbitrary input videos into temporally coherent, depth-aware bokeh effects. Our method leverages a multi-plane image (MPI) representation constructed through a progressively widening depth sampling function, providing explicit geometric guidance for depth-dependent blur synthesis. By conditioning a single-step video diffusion model on MPI layers and utilizing the strong 3D priors from pre-trained models such as Stable Video Diffusion, our approach achieves realistic and consistent bokeh effects across diverse scenes. Additionally, we introduce a progressive training strategy to enhance temporal consistency, depth robustness, and detail preservation. Extensive experiments demonstrate that our method produces high-quality, controllable bokeh effects and achieves state-of-the-art performance on multiple evaluation benchmarks.
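
As a rough illustration of the MPI idea in the abstract, the sketch below splits a disparity map into layers whose thickness grows with distance from the focal plane, so depths near focus are sampled finely and strongly defocused regions coarsely. The abstract does not give the actual sampling function: the geometric growth rule, the function names `widening_band_edges` and `mpi_masks`, and the `growth` parameter are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def widening_band_edges(num_planes, growth=2.0, max_dist=1.0):
    """Band edges on |disparity - focus|; each band is `growth` times wider
    than the previous one (hypothetical rule, chosen for illustration)."""
    widths = growth ** np.arange(num_planes)            # 1, g, g^2, ...
    edges = np.concatenate([[0.0], np.cumsum(widths)])  # cumulative edges
    return edges / edges[-1] * max_dist                 # rescale to [0, max_dist]

def mpi_masks(disparity, focus, num_planes=4, growth=2.0):
    """Return a (num_planes, *disparity.shape) stack of binary layer masks.
    Layer 0 holds the pixels closest to the focal plane."""
    disparity = np.asarray(disparity, dtype=np.float64)
    dist = np.abs(disparity - focus)                    # distance from focal plane
    max_dist = max(float(dist.max()), 1e-8)             # avoid a zero-width range
    edges = widening_band_edges(num_planes, growth, max_dist)
    # Index of the band each pixel falls into, clipped so the farthest
    # pixel lands in the last layer instead of past it.
    idx = np.clip(np.searchsorted(edges, dist, side="right") - 1,
                  0, num_planes - 1)
    return np.stack([idx == k for k in range(num_planes)]).astype(np.float32)
```

Every pixel lands in exactly one layer, so the masks sum to one along the layer axis. How the paper encodes these layers and injects them into the diffusion model is not stated in this summary.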

Problem

Research questions and friction points this paper is trying to address.

Video bokeh lacks focus plane control and intensity adjustment

Image-based bokeh methods cause flickering and poor edge transitions in videos

Existing methods lack temporal consistency and depth-aware blur synthesis
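
The two controls named above, focus plane and blur intensity, are often parameterized in classical bokeh rendering by a per-pixel blur radius that grows linearly with the disparity gap to the focal plane. The sketch below shows that common convention only; whether the paper uses this exact form is an assumption, and `blur_radius`, `focus_disparity`, and `strength` are hypothetical names.

```python
import numpy as np

def blur_radius(disparity, focus_disparity, strength):
    """Per-pixel blur radius under the linear convention
    radius = strength * |disparity - focus_disparity|.
    Pixels on the focal plane get radius 0 and stay sharp."""
    disparity = np.asarray(disparity, dtype=np.float64)
    return strength * np.abs(disparity - focus_disparity)
```

Under this convention, moving `focus_disparity` shifts which depths stay sharp, and scaling `strength` scales the blur everywhere by the same factor.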

Innovation

Methods, ideas, or system contributions that make the work stand out.

One-step video bokeh via diffusion model

Multi-plane image guided depth-aware blur

Progressive training for temporal consistency
