CroBIM-U: Uncertainty-Driven Referring Remote Sensing Image Segmentation

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the challenges in referring expression segmentation for remote sensing imagery, where large scale variations, dense distractors, and complex boundaries lead to spatially inconsistent reliability in cross-modal alignment. To tackle this issue, the authors propose an uncertainty-guided adaptive inference framework. It first introduces a Referring Uncertainty Scorer (RUS) that learns pixel-wise uncertainty maps through online error-consistency supervision. Building upon this, two plug-and-play modules—Uncertainty-Gated Fusion (UGF) and Uncertainty-Driven Local Refinement (UDLR)—are developed to dynamically modulate language-guided feature fusion and local refinement processes. Notably, the approach requires no modification to the backbone network and consistently enhances segmentation robustness and geometric accuracy across multiple remote sensing benchmarks, demonstrating its effectiveness as a general-purpose enhancement strategy.
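The summary's "online error-consistency supervision" can be sketched as follows: the uncertainty map is supervised to be high exactly where the current segmentation prediction disagrees with the ground truth. The target form `|p - y|` and the MSE loss below are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of error-consistency supervision for a Referring
# Uncertainty Scorer (RUS). Per-pixel scalar probabilities stand in for
# real network outputs; all names here are illustrative.

def error_consistency_targets(pred_probs, gt_mask):
    """Per-pixel |p - y|: near 0 where the prediction is confidently
    correct, near 1 where it is confidently wrong."""
    return [[abs(p - y) for p, y in zip(p_row, y_row)]
            for p_row, y_row in zip(pred_probs, gt_mask)]

def uncertainty_loss(pred_uncertainty, targets):
    """Mean squared error between the scorer's map and the error targets."""
    flat = [(u - t) ** 2
            for u_row, t_row in zip(pred_uncertainty, targets)
            for u, t in zip(u_row, t_row)]
    return sum(flat) / len(flat)

# Toy 2x2 segmentation: the scorer is pushed to flag the two pixels
# where the segmenter currently disagrees with the ground truth.
probs = [[0.9, 0.2], [0.6, 0.1]]
gt    = [[1,   0  ], [0,   1  ]]
targets = error_consistency_targets(probs, gt)  # peaks at the hard pixels
```

Because the targets are recomputed from the live predictions each step, the supervision is "online": as the segmenter improves, the uncertainty map tracks the shrinking error regions.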

📝 Abstract
Referring remote sensing image segmentation aims to localize specific targets described by natural language within complex overhead imagery. However, due to extreme scale variations, dense similar distractors, and intricate boundary structures, the reliability of cross-modal alignment exhibits significant **spatial non-uniformity**. Existing methods typically employ uniform fusion and refinement strategies across the entire image, which often introduces unnecessary linguistic perturbations in visually clear regions while failing to provide sufficient disambiguation in confused areas. To address this, we propose an **uncertainty-guided framework** that explicitly leverages a pixel-wise **referring uncertainty map** as a spatial prior to orchestrate adaptive inference. Specifically, we introduce a plug-and-play **Referring Uncertainty Scorer (RUS)**, which is trained via an online error-consistency supervision strategy to interpretably predict the spatial distribution of referential ambiguity. Building on this prior, we design two plug-and-play modules: 1) **Uncertainty-Gated Fusion (UGF)**, which dynamically modulates language injection strength to enhance constraints in high-uncertainty regions while suppressing noise in low-uncertainty ones; and 2) **Uncertainty-Driven Local Refinement (UDLR)**, which utilizes uncertainty-derived soft masks to focus refinement on error-prone boundaries and fine details. Extensive experiments demonstrate that our method functions as a unified, plug-and-play solution that significantly improves robustness and geometric fidelity in complex remote sensing scenes without altering the backbone architecture.
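The two modules described above can be sketched with per-pixel scalar features standing in for real feature channels. The exact gating and blending forms are illustrative assumptions: here UGF injects the language feature with strength proportional to the uncertainty u(x), and UDLR blends a local refinement back in only where u(x) is high.

```python
# Hypothetical sketches of Uncertainty-Gated Fusion (UGF) and
# Uncertainty-Driven Local Refinement (UDLR); names and formulas are
# illustrative, not the paper's exact implementation.

def uncertainty_gated_fusion(visual, language_feat, uncertainty):
    """UGF: fused(x) = visual(x) + u(x) * language_feat,
    so ambiguous pixels receive stronger linguistic constraints."""
    return [[v + u * language_feat for v, u in zip(v_row, u_row)]
            for v_row, u_row in zip(visual, uncertainty)]

def uncertainty_driven_refinement(coarse, refine_fn, uncertainty):
    """UDLR: out(x) = (1 - u(x)) * coarse(x) + u(x) * refine(coarse)(x),
    so confident pixels are left untouched by the refiner."""
    refined = refine_fn(coarse)
    return [[(1 - u) * c + u * r for c, u, r in zip(c_row, u_row, r_row)]
            for c_row, u_row, r_row in zip(coarse, uncertainty, refined)]

# Toy 1x2 example: only the ambiguous pixel (u = 1.0) is touched.
visual      = [[1.0, 2.0]]
uncertainty = [[0.0, 1.0]]
fused = uncertainty_gated_fusion(visual, 0.5, uncertainty)  # [[1.0, 2.5]]
out = uncertainty_driven_refinement(
    visual, lambda x: [[c + 10 for c in row] for row in x], uncertainty)
# out == [[1.0, 12.0]]: the clear pixel keeps its coarse value.
```

The soft-mask blend is what makes both modules plug-and-play: at u(x) = 0 they reduce to the identity, so inserting them leaves a well-calibrated backbone's behavior unchanged in confident regions.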
Problem

Research questions and friction points this paper is trying to address.

referring remote sensing image segmentation
spatial non-uniformity
cross-modal alignment
referential ambiguity
uncertainty
Innovation

Methods, ideas, or system contributions that make the work stand out.

uncertainty-guided segmentation
referring uncertainty map
uncertainty-gated fusion
local refinement
remote sensing image segmentation
👥 Authors
Yuzhe Sun
School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, China
Zhe Dong
Microsoft AI
Haochen Jiang
School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, China
Tianzhu Liu
School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, China
Yanfeng Gu
Professor of Electronics Engineering, Harbin Institute of Technology
image processing, pattern recognition, machine learning