PromptRR: Diffusion Models as Prompt Generators for Single Image Reflection Removal

Date: 2024-02-04
Venue: arXiv.org
Citations: 7
Influential citations: 0
AI Summary
Existing single-image reflection removal (SIRR) methods inadequately model low-frequency (global structure) and high-frequency (textural details) components, leading to residual reflections and structural distortions. To address this, we propose PromptRR, the first SIRR framework to integrate diffusion models as a frequency-aware prompt generator. It comprises a pretrained frequency prompt encoder and a lightweight Transformer-based prompt block, enabling stage-wise, frequency-prior-driven prompt generation and guided reconstruction. By combining frequency-domain decomposition with the PromptFormer network, PromptRR jointly models structural coherence and textural fidelity. Extensive experiments demonstrate that PromptRR achieves state-of-the-art performance across standard benchmarks: it effectively suppresses reflection artifacts while significantly improving texture preservation and global structural consistency. The source code is publicly available.

๐Ÿ“ Abstract
Existing single image reflection removal (SIRR) methods using deep learning tend to miss key low-frequency (LF) and high-frequency (HF) differences in images, which limits their effectiveness in removing reflections. To address this problem, this paper proposes a novel prompt-guided reflection removal (PromptRR) framework that uses frequency information as new visual prompts for better reflection removal performance. Specifically, the proposed framework decouples the reflection removal process into prompt generation and subsequent prompt-guided restoration. For the prompt generation, we first propose a prompt pre-training strategy to train a frequency prompt encoder that encodes the ground-truth image into LF and HF prompts. Then, we adopt diffusion models (DMs) as prompt generators to generate the LF and HF prompts estimated by the pre-trained frequency prompt encoder. For the prompt-guided restoration, we integrate the generated prompts into the PromptFormer network, employing a novel Transformer-based prompt block to effectively steer the model toward enhanced reflection removal. The results on commonly used benchmarks show that our method outperforms state-of-the-art approaches. The code and models are available at https://github.com/TaoWangzj/PromptRR.
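To make the "diffusion models as prompt generators" step more concrete, here is a minimal sketch of standard DDPM ancestral sampling applied to a small prompt vector. It is an illustration only, not the authors' implementation: the linear noise schedule, the step count, the vector size, and `toy_noise_predictor` (a stand-in for a trained, input-conditioned denoiser) are all assumptions that do not come from the abstract.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed choice) and its cumulative products."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_noise_predictor(x_t, t, cond):
    """Hypothetical stand-in for a trained denoiser eps_theta(x_t, t, cond)."""
    return 0.1 * (x_t - cond)

def sample_prompt(cond, noise_predictor, num_steps=50, seed=0):
    """Run the DDPM reverse process from Gaussian noise to a prompt vector.

    `cond` plays the role of features extracted from the reflection-contaminated
    input; here it only fixes the output shape and feeds the toy predictor.
    """
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal(cond.shape)  # start from pure noise
    for t in range(num_steps - 1, -1, -1):
        eps = noise_predictor(x, t, cond)
        # Posterior mean of the DDPM reverse step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added at the final step (t == 0).
        noise = rng.standard_normal(cond.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

prompt = sample_prompt(np.zeros(8), toy_noise_predictor)
print(prompt.shape)
```

In the framework described above, one such generator would be needed for the LF prompt and one for the HF prompt, each trained to match the output of the pre-trained frequency prompt encoder; the training loop is omitted here.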
Problem

Research questions and friction points this paper is trying to address.

Existing deep-learning SIRR methods overlook the differences between low-frequency and high-frequency image components
Proposes prompt-guided framework using frequency information as visual prompts
Decouples process into prompt generation and prompt-guided restoration steps
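As a rough illustration of what "low-frequency" and "high-frequency" components mean, the sketch below splits a grayscale image with a hard radial low-pass mask in the Fourier domain. This is a generic decomposition chosen for clarity; the summary does not say how PromptRR separates frequencies, and the function name `split_frequencies` and the `cutoff` value are assumptions.

```python
import numpy as np

def split_frequencies(image, cutoff=0.1):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    `cutoff` is a radius in cycles per pixel (0 to about 0.707); frequencies
    inside the radius go to the low-frequency part, the rest to the
    high-frequency part.
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, shape (h, 1)
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, shape (1, w)
    low_pass = (np.sqrt(fx**2 + fy**2) <= cutoff).astype(float)
    spectrum = np.fft.fft2(image)
    # The mask is symmetric, so the inverse transform is real up to rounding.
    low = np.real(np.fft.ifft2(spectrum * low_pass))
    high = image - low  # residual holds edges and fine texture
    return low, high

image = np.random.default_rng(0).random((32, 32))
low, high = split_frequencies(image)
print(np.allclose(low + high, image))
```

The low-frequency part carries the global structure and the high-frequency residual carries texture, which matches the split the summary refers to.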
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses diffusion models as prompt generators
Encodes frequency information into visual prompts
Integrates prompts via Transformer-based restoration network
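One common way for a Transformer block to consume prompts is cross-attention, where image feature tokens query a small set of prompt tokens. The sketch below shows that generic mechanism in NumPy. It is not the PromptFormer prompt block, whose internals this summary does not describe; the function name, the weight matrices, and all dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prompt_cross_attention(features, prompts, w_q, w_k, w_v):
    """Let feature tokens attend to prompt tokens, with a residual connection.

    features: (n_tokens, d_feat)    image feature tokens
    prompts:  (n_prompts, d_prompt) e.g. one LF and one HF prompt
    w_q: (d_feat, d_k), w_k: (d_prompt, d_k), w_v: (d_prompt, d_feat)
    """
    q = features @ w_q
    k = prompts @ w_k
    v = prompts @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_tokens, n_prompts)
    return features + attn @ v, attn

rng = np.random.default_rng(0)
features = rng.standard_normal((6, 16))  # 6 tokens, 16 channels (assumed sizes)
prompts = rng.standard_normal((2, 8))    # 2 prompt tokens, 8 channels
w_q = rng.standard_normal((16, 4))
w_k = rng.standard_normal((8, 4))
w_v = rng.standard_normal((8, 16))
guided, attn = prompt_cross_attention(features, prompts, w_q, w_k, w_v)
print(guided.shape, attn.shape)
```

Each row of `attn` shows how strongly one feature token draws on each prompt, which is one plausible reading of prompts "steering" the restoration network.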
Authors

Tao Wang
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

Wanglong Lu
Memorial University of Newfoundland, St. John's, Canada

Kaihao Zhang
Australian National University
Deep learning, Computer vision

Wenhan Luo
Associate Professor, HKUST
Creative AI, Generative Model, Computer Vision, Machine Learning

Tae-Kyun Kim
Imperial College London, London, UK & KAIST, Daejeon, South Korea

Tong Lu
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

Hongdong Li
Australian National University, Canberra, Australia

Ming-Hsuan Yang
University of California at Merced; Google DeepMind
Computer Vision, Machine Learning, Artificial Intelligence