🤖 AI Summary
This work addresses the performance degradation of foundation models such as SAM on instance segmentation under complex lighting conditions. The authors propose Lighting Convolutional-Attention (LCA), an adapter module that improves illumination robustness without fine-tuning the backbone network. LCA uses a dual-branch architecture that processes RGB features alongside physically motivated contrast maps, so that the model responds to structural changes rather than illumination artifacts. Training follows a pairwise strategy with a loss term that penalizes discrepancies between clean images and their illumination variants. To replicate real-world lighting challenges, the authors also construct a synthetic dataset using Unity. Experiments on multiple standard benchmarks and a newly curated illumination-sensitive dataset show improved segmentation accuracy and robustness to illumination changes.
📝 Abstract
Foundation models like the Segment Anything Model (SAM) demonstrate impressive zero-shot generalization but frequently degrade under diverse real-world illumination, particularly for instance segmentation. In this work, we address this limitation by developing Lighting Convolutional-Attention (LCA), an adapter module that enhances segmentation robustness without fine-tuning the heavy backbone. LCA employs a dual-branch architecture to process RGB features alongside contrast maps, enabling physically motivated sensitivity to structural changes rather than illumination artifacts. We optimize LCA through a pairwise training strategy, introducing a targeted loss term that explicitly penalizes discrepancies between clean images and their corresponding illumination variants. To evaluate and support this architecture, we conduct a comprehensive empirical study across multiple existing benchmarks and present a novel Unity-based synthetic dataset specifically designed to accurately replicate complex real-world lighting conditions. Extensive experimental results demonstrate that our approach successfully bridges the domain gap, delivering superior lighting-robust segmentation.
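The abstract does not define how the contrast maps or the pairwise loss term are computed, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a Weber-style local contrast, (I - local mean) / local mean, which is one common choice that is nearly unchanged under a global brightness scaling, and it uses mean squared error as a stand-in for both the segmentation loss and the discrepancy penalty. The names `local_contrast`, `pairwise_loss`, `radius`, `eps`, and `lam` are hypothetical.

```python
import numpy as np


def local_contrast(gray: np.ndarray, radius: int = 1, eps: float = 1e-6) -> np.ndarray:
    """Weber-style local contrast: (I - local_mean) / (local_mean + eps).

    Assumption: the source does not specify its contrast definition.
    """
    gray = gray.astype(np.float64)
    k = 2 * radius + 1
    # Edge-pad so the output keeps the input's height and width.
    padded = np.pad(gray, radius, mode="edge")
    # Box filter: mean over every k-by-k neighbourhood.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    local_mean = windows.mean(axis=(-2, -1))
    return (gray - local_mean) / (local_mean + eps)


def pairwise_loss(pred_clean: np.ndarray,
                  pred_variant: np.ndarray,
                  target: np.ndarray,
                  lam: float = 1.0) -> float:
    """Task loss on both views plus a penalty on their disagreement.

    Assumption: MSE is a placeholder for the real segmentation loss and
    for the discrepancy measure, neither of which the source specifies.
    """
    task = np.mean((pred_clean - target) ** 2) + np.mean((pred_variant - target) ** 2)
    consistency = np.mean((pred_clean - pred_variant) ** 2)
    return float(task + lam * consistency)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.uniform(0.1, 1.0, size=(8, 8))
    darker = 0.5 * image  # crude global illumination change
    # Largest change in the contrast map caused by halving the brightness.
    print(np.abs(local_contrast(image) - local_contrast(darker)).max())
```

In this sketch, halving the image brightness leaves the contrast map almost unchanged, which is the kind of property that could make a contrast branch less sensitive to lighting than raw RGB values. The weight `lam` trades off per-view accuracy against agreement between the clean and relit predictions.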