🤖 AI Summary
This work addresses controllable bokeh rendering without requiring depth maps. Method: We propose BokehFlow, an end-to-end, text-guided generative framework based on flow matching that eliminates reliance on depth estimation. It synthesizes bokeh directly from all-in-focus images, while cross-attention mechanisms map textual prompts (e.g., “focus on the flower with strong background blur”) to spatial focus regions and blur intensity. Contribution/Results: To support training and evaluation, we collect and synthesize four bokeh datasets covering distinct scene categories. Extensive experiments demonstrate that our method surpasses existing depth-based approaches and general-purpose generative models in visual fidelity, semantic controllability, and inference efficiency. It significantly lowers deployment barriers and enhances user interactivity, enabling intuitive, text-driven bokeh control.
📝 Abstract
Bokeh rendering simulates the shallow depth-of-field effect in photography, enhancing visual aesthetics and guiding viewer attention to regions of interest. Although recent approaches perform well, rendering controllable bokeh without additional depth inputs remains a significant challenge. Existing classical and neural controllable methods rely on accurate depth maps, while generative approaches often struggle with limited controllability and efficiency. In this paper, we propose BokehFlow, a depth-free framework for controllable bokeh rendering based on flow matching. BokehFlow directly synthesizes photorealistic bokeh effects from all-in-focus images, eliminating the need for depth inputs. It employs a cross-attention mechanism to enable semantic control over both focus regions and blur intensity via text prompts. To support training and evaluation, we collect and synthesize four datasets. Extensive experiments demonstrate that BokehFlow achieves visually compelling bokeh effects and offers precise control, outperforming existing depth-dependent and generative methods in both rendering quality and efficiency.
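To make the flow-matching idea concrete, below is a minimal toy sketch (not the paper's implementation) of flow-matching inference: a sample is produced by integrating a learned velocity field from noise at t=0 to data at t=1. The `velocity` function here is a hypothetical closed-form stand-in for the trained network, which in BokehFlow would additionally be conditioned on the all-in-focus image and the text prompt via cross-attention.

```python
import numpy as np

def velocity(x, t, target):
    # Hypothetical stand-in for the learned velocity field.
    # For the linear path x_t = (1 - t) * x0 + t * x1, the exact
    # velocity toward the target is (x1 - x_t) / (1 - t).
    return (target - x) / max(1.0 - t, 1e-6)

def sample(x0, target, steps=100):
    # Euler integration of dx/dt = velocity(x, t) from t=0 to t=1.
    x, dt = x0.copy(), 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * velocity(x, t, target)
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal(4)            # starting point (Gaussian noise)
bokeh = np.array([0.2, 0.5, -0.1, 0.8])   # stand-in for the target bokeh image
out = sample(noise, bokeh)                # ends at the target distribution
```

In the real model, `target` is never given at inference time; the network predicts the velocity from the noisy state, the timestep, and the conditioning inputs, which is what removes the need for an explicit depth map.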