🤖 AI Summary
Single-image reflection removal remains highly challenging because reflected and transmitted light interact in complex ways at glass surfaces, and existing datasets are limited either in physical realism (synthetic data) or in scale (real captures). This work proposes a high-fidelity synthetic data generation framework that path-traces 3D glass models over real background images. The resulting image layers are concatenated into a single composite input and paired with a joint caption. For the first time, a large multimodal model (LMM) is applied to this task, fine-tuned with task-specific LoRA, and it reports improved performance over state-of-the-art methods in both reflection removal and separation.
📝 Abstract
Glass surfaces create complex interactions of reflected and transmitted light, making single-image reflection removal (SIRR) challenging. Existing datasets suffer from limited physical realism in synthetic data or insufficient scale in real captures. We introduce a synthetic dataset generation framework that path-traces 3D glass models over real background imagery to create physically accurate reflection scenarios with varied glass properties, camera settings, and post-processing effects. To leverage the capabilities of a Large Multimodal Model (LMM), we concatenate the image layers into a single composite input, apply joint captioning, and fine-tune the model using task-specific LoRA rather than full-parameter training. This enables our approach to achieve improved reflection removal and separation performance compared to state-of-the-art methods.
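The abstract does not specify how the image layers are concatenated, so the following is only a minimal sketch of one plausible layout. It assumes three equally sized layers (the mixed input, the transmission layer, and the reflection layer) tiled side by side along the width axis. The function names `make_composite` and `split_composite`, the panel order, and the horizontal layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def make_composite(mixed, transmission, reflection):
    """Tile three H x W x C layers side by side into one H x 3W x C image.

    The panel order and horizontal layout are assumptions made for
    illustration; the abstract only states that layers are concatenated.
    """
    layers = [mixed, transmission, reflection]
    if len({layer.shape for layer in layers}) != 1:
        raise ValueError("all layers must share the same shape")
    # axis=1 is the width axis for H x W x C arrays.
    return np.concatenate(layers, axis=1)


def split_composite(composite, n_layers=3):
    """Cut a side-by-side composite back into its equally wide panels."""
    if composite.shape[1] % n_layers != 0:
        raise ValueError("composite width is not divisible by n_layers")
    return np.split(composite, n_layers, axis=1)
```

Under these assumptions, a single caption can describe all panels of the composite jointly, and a model output in the same layout can be cut back into separate layers with `split_composite`.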