FIRM: Flexible Interactive Reflection ReMoval

📅 2024-06-03
🏛️ AAAI Conference on Artificial Intelligence
📈 Citations: 0
Influential: 0
🤖 AI Summary
Single-image reflection removal remains challenging because general reflection priors are lacking and existing interactive methods offer little flexibility in how users provide guidance. FIRM addresses this with a framework that accepts sparse, multimodal user guidance, including points, bounding boxes, strokes, and text. A user guidance conversion (UGC) module encodes these heterogeneous inputs into unified contrastive masks, and a contrastive guidance interaction block (CGIB) uses cross-attention to merge the masks with image features for precise separation of the reflection and transmission layers. The framework requires only 10% of the guidance time needed by previous interactive methods. On public real-world reflection removal datasets, it achieves state-of-the-art performance.

📝 Abstract
Removing reflection from a single image is challenging due to the absence of general reflection priors. Although existing methods incorporate extensive user guidance for satisfactory performance, they often lack the flexibility to adapt user guidance in different modalities, and dense user interactions further limit their practicality. To alleviate these problems, this paper presents FIRM, a novel framework for Flexible Interactive image Reflection reMoval with various forms of guidance, where users can provide sparse visual guidance (e.g., points, boxes, or strokes) or text descriptions for better reflection removal. Firstly, we design a novel user guidance conversion module (UGC) to transform different forms of guidance into unified contrastive masks. The contrastive masks provide explicit cues for identifying reflection and transmission layers in blended images. Secondly, we devise a contrastive mask-guided reflection removal network that comprises a newly proposed contrastive guidance interaction block (CGIB). This block leverages a unique cross-attention mechanism that merges contrastive masks with image features, allowing for precise layer separation. The proposed framework requires only 10% of the guidance time needed by previous interactive methods, which makes a step-change in flexibility. Extensive results on public real-world reflection removal datasets validate that our method demonstrates state-of-the-art reflection removal performance.
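
The abstract does not specify how UGC builds its contrastive masks, so the following is only a minimal sketch of the general idea of turning sparse visual guidance into a unified mask pair. It assumes that "contrastive masks" can be read as one binary mask for user-marked reflection regions and one for user-marked transmission regions; the function names `rasterize` and `contrastive_masks`, the `radius` parameter, and the two-channel layout are hypothetical and not taken from the paper. Strokes could be handled as a sequence of points; text guidance is not covered here.

```python
import numpy as np

def rasterize(height, width, points=(), boxes=(), radius=2):
    """Rasterize sparse guidance into one binary mask of shape (height, width).

    points: iterable of (x, y) pixel coordinates, drawn as filled disks.
    boxes:  iterable of (x0, y0, x1, y1), filled with x1 and y1 exclusive.
    """
    mask = np.zeros((height, width), dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    for x, y in points:
        mask[(xs - x) ** 2 + (ys - y) ** 2 <= radius ** 2] = 1.0
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 1.0
    return mask

def contrastive_masks(height, width, reflection, transmission):
    """Stack a reflection mask and a transmission mask into a (2, H, W) array.

    Assumption: channel 0 marks reflection cues, channel 1 transmission cues.
    """
    r = rasterize(height, width, **reflection)
    t = rasterize(height, width, **transmission)
    return np.stack([r, t], axis=0)

# One click on a reflection, one box around a transmission region.
masks = contrastive_masks(
    8, 8,
    reflection={"points": [(1, 1)], "radius": 1},
    transmission={"boxes": [(4, 4, 8, 8)]},
)
```

Whatever the input modality, the output has the same shape, which is what lets a downstream network consume points, boxes, and strokes through a single interface.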
Problem

Research questions and friction points this paper is trying to address.

Single-image reflection removal lacks general reflection priors
Existing interactive methods cannot flexibly adapt to guidance in different modalities
Dense user interaction limits practicality, so an efficient interactive framework is needed
Innovation

Methods, ideas, or system contributions that make the work stand out.

Converts diverse user guidance into unified masks
Uses contrastive masks for precise layer separation
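Merges contrastive masks with image features through cross-attention in the CGIB
Requires only 10% of the guidance time of previous interactive methods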
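
As a rough illustration of how cross-attention can merge mask information with image features, here is a minimal single-head sketch in NumPy. It is generic scaled dot-product cross-attention with a residual connection, not the CGIB itself, whose internals the abstract does not describe. The choice of image tokens as queries and mask tokens as keys and values, the function name `cross_attention`, and all shapes and weights are assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(image_tokens, mask_tokens, w_q, w_k, w_v):
    """Fuse mask tokens into image tokens with single-head cross-attention.

    image_tokens: (N, c_img) flattened image features, used as queries.
    mask_tokens:  (M, c_mask) flattened mask features, used as keys and values.
    w_q: (c_img, d), w_k: (c_mask, d), w_v: (c_mask, c_img).
    Returns the fused tokens (N, c_img) and the attention weights (N, M).
    """
    q = image_tokens @ w_q
    k = mask_tokens @ w_k
    v = mask_tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    fused = image_tokens + attn @ v  # residual connection keeps image content
    return fused, attn

# Random stand-ins for learned features and projection weights.
rng = np.random.default_rng(0)
image_tokens = rng.normal(size=(16, 8))
mask_tokens = rng.normal(size=(16, 4))
w_q = rng.normal(size=(8, 6))
w_k = rng.normal(size=(4, 6))
w_v = rng.normal(size=(4, 8))
fused, attn = cross_attention(image_tokens, mask_tokens, w_q, w_k, w_v)
```

Each row of `attn` is a distribution over mask tokens, so every image location draws a weighted mix of guidance features before the residual add.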