🤖 AI Summary
Existing text-to-image diffusion models struggle to achieve fine-grained, physically grounded depth-of-field control, such as adjustable aperture and focus distance, without altering scene content. This work proposes a physics-inspired framework: it first generates an all-in-focus image, then estimates monocular depth and predicts a plausible focus distance, and finally synthesizes photorealistic, depth-consistent defocus with a differentiable lens blur model. Its core component is the Focus Distance Transformer, which enables interactive, inference-time adjustment of both blur intensity and focal-plane position. Because the pipeline is fully differentiable, it trains end to end from EXIF camera metadata alone, without annotated focus-distance supervision. Experiments demonstrate that the method significantly outperforms prior approaches across diverse scenes, achieving high-fidelity, content-preserving, and fine-grained controllable defocus synthesis.
📝 Abstract
Current text-to-image diffusion models excel at generating diverse, high-quality images, yet they struggle to incorporate fine-grained camera metadata such as precise aperture settings. In this work, we introduce a novel text-to-image diffusion framework that leverages camera metadata (EXIF data), which is often embedded in image files, with an emphasis on generating controllable lens blur. Our method mimics the physical image formation process by first generating an all-in-focus image, estimating its monocular depth, predicting a plausible focus distance with a novel focus distance transformer, and then forming a defocused image with an existing differentiable lens blur model. Gradients flow backward through this whole process, allowing us to learn, without explicit supervision, to generate defocus effects based on content elements and the provided EXIF data. At inference time, this enables precise interactive user control over defocus effects while preserving scene contents, which is not achievable with existing diffusion models. Experimental results demonstrate that our model enables superior fine-grained control without altering the depicted scene.
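The final stage of the pipeline, rendering defocus from an all-in-focus image, a depth map, a focus distance, and an aperture, can be grounded in the classical thin-lens model: each pixel's blur diameter (circle of confusion) grows with its depth offset from the focal plane and with the aperture size. The sketch below is only an illustration of that physics, not the paper's actual lens blur layer; the function names, the 50 mm focal length, and the naive box-average blur are all assumptions (a real differentiable renderer would composite soft depth layers instead).

```python
import numpy as np

def circle_of_confusion(depth, focus_dist, focal_len=0.05, f_number=2.8):
    """Per-pixel circle-of-confusion diameter (m) from the thin-lens model.

    depth:      array of scene depths (m)
    focus_dist: distance to the focal plane (m)
    focal_len:  lens focal length (m); 50 mm assumed here
    f_number:   aperture f-number; smaller -> wider aperture -> stronger blur
    """
    aperture = focal_len / f_number  # aperture diameter (m)
    return (aperture * focal_len / (focus_dist - focal_len)
            * np.abs(depth - focus_dist) / depth)

def defocus_blur(image, depth, focus_dist, max_radius=8, px_per_m=2e4):
    """Naive depth-dependent blur: average each pixel over a square window
    sized by its circle of confusion (converted to pixels and clipped).
    Illustrative only; not differentiable and not occlusion-aware."""
    coc_px = np.clip(circle_of_confusion(depth, focus_dist) * px_per_m,
                     0, max_radius)
    h, w = image.shape[:2]
    out = np.empty_like(image, dtype=float)
    for y in range(h):
        for x in range(w):
            r = int(round(coc_px[y, x]))
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            out[y, x] = image[y0:y1, x0:x1].mean(axis=(0, 1))
    return out
```

Two properties of this model matter for the paper's setup: pixels exactly at the focus distance have zero blur (scene content at the focal plane is preserved), and the blur diameter scales with aperture, which is exactly the EXIF quantity the diffusion model is conditioned on.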