🤖 AI Summary
Existing lens blur rendering methods are limited by depth estimation errors, which produce artifacts at depth discontinuities, and they struggle to balance physical accuracy with visual plausibility. To address this, we propose BokehDiff, a neural lens blur rendering framework built on a generative diffusion prior. Our method introduces a physics-inspired, depth-aware self-attention mechanism that explicitly models circle-of-confusion scaling and self-occlusion. We adopt a single-step diffusion inference scheme, enabling efficient, high-fidelity blur synthesis without iterative denoising. Furthermore, we leverage diffusion models to synthesize diverse foreground images with alpha mattes, constructing physically aligned training data. Experiments demonstrate that our approach significantly outperforms state-of-the-art methods on both synthetic and real-world scenes. Notably, it produces sharper, more natural blur transitions at object boundaries and depth edges, delivering superior fidelity while retaining the potential for real-time inference.
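The summary refers to circle-of-confusion (CoC) scaling and self-occlusion. The sketch below illustrates the standard thin-lens CoC computation from a depth map, plus a toy depth-aware weighting rule in the spirit of the described self-attention. The function names, parameter values, and the occlusion heuristic are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch only: the paper's exact CoC / self-attention formulation is
# not given here, so names, parameters, and the occlusion rule are assumptions
# based on the standard thin-lens model.
import numpy as np

def coc_radius(depth, focus_dist, focal_len=0.05, f_number=2.0):
    """Per-pixel circle-of-confusion radius (meters on the sensor) from depth.

    depth      : (H, W) array of scene depths in meters
    focus_dist : depth of the in-focus plane in meters
    focal_len  : lens focal length in meters (assumed value)
    f_number   : aperture f-number; aperture diameter = focal_len / f_number
    """
    aperture = focal_len / f_number
    # Thin-lens CoC diameter: A * f * |d - d_f| / (d * (d_f - f))
    coc = aperture * focal_len * np.abs(depth - focus_dist) / (depth * (focus_dist - focal_len))
    return coc / 2.0  # radius; scale by sensor resolution to get pixels

def occlusion_aware_weights(depth_q, depth_k, coc_k, dist_qk):
    """Toy depth-aware weighting: key pixel k contributes to query pixel q only
    if q lies inside k's blur disk, and nearer (self-occluding) surfaces are
    weighted more strongly. Purely heuristic, for illustration.
    """
    inside_disk = np.where(dist_qk <= coc_k, 1.0, 0.0)
    occlusion = np.where(depth_k <= depth_q, 1.0, 0.5)
    return inside_disk * occlusion

if __name__ == "__main__":
    depth = np.full((4, 4), 3.0)
    depth[:, 2:] = 1.0  # foreground object at 1 m, background at 3 m
    print(coc_radius(depth, focus_dist=3.0))
    print(occlusion_aware_weights(depth_q=3.0, depth_k=1.0, coc_k=4.0, dist_qk=2.0))
```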
📝 Abstract
We introduce BokehDiff, a novel lens blur rendering method that achieves physically accurate and visually appealing results with the help of a generative diffusion prior. Previous methods are bounded by the accuracy of depth estimation, generating artifacts at depth discontinuities. Our method employs a physics-inspired self-attention module that aligns with the image formation process, incorporating a depth-dependent circle-of-confusion constraint and self-occlusion effects. We adapt the diffusion model to a one-step inference scheme without introducing additional noise, and achieve results of high quality and fidelity. To address the lack of scalable paired data, we propose to synthesize photorealistic foregrounds with transparency using diffusion models, balancing authenticity and scene diversity.
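As a rough illustration of how synthesized foregrounds with transparency could be assembled into paired training data, the sketch below composites a foreground over a background using its alpha matte and assigns layered depths. The actual data pipeline, blur renderer, and depth assignment are not described in this summary, so the functions and values below are assumptions.

```python
# Minimal sketch of assembling a training sample from a diffusion-synthesized
# foreground with an alpha matte; the real pipeline (renderer, depth layout,
# resolution) is an assumption, not the paper's code.
import numpy as np

def composite(fg_rgb, alpha, bg_rgb):
    """Standard alpha compositing: I = alpha * F + (1 - alpha) * B."""
    a = alpha[..., None]  # (H, W, 1) for broadcasting over RGB channels
    return a * fg_rgb + (1.0 - a) * bg_rgb

def make_training_pair(fg_rgb, alpha, bg_rgb, fg_depth=1.0, bg_depth=5.0):
    """Build the all-in-focus input and a layered depth map.

    The sharp composite plus depth would then be fed to a physically based
    blur renderer (not shown) to produce the ground-truth bokeh target.
    """
    sharp = composite(fg_rgb, alpha, bg_rgb)
    depth = np.where(alpha > 0.5, fg_depth, bg_depth)
    return sharp, depth

if __name__ == "__main__":
    h, w = 64, 64
    fg = np.random.rand(h, w, 3)
    bg = np.random.rand(h, w, 3)
    alpha = np.zeros((h, w))
    alpha[16:48, 16:48] = 1.0  # square foreground mask for illustration
    sharp, depth = make_training_pair(fg, alpha, bg)
    print(sharp.shape, depth.shape)
```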