Seeing Clearly without Training: Mitigating Hallucinations in Multimodal LLMs for Remote Sensing

πŸ“… 2026-03-03
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses severe hallucination issues in remote sensing visual question answering, where multimodal large language models often fail due to inaccurate visual grounding or misinterpretation of fine-grained small objects. To mitigate this, the authors propose RADAR, a training-free inference framework whose relative attention-driven active reasoning strategy leverages the model's intrinsic attention to perform progressive localization and fine-grained local reasoning at test time. They also introduce RSHBench, the first benchmark designed specifically for fine-grained diagnosis of hallucinations in remote sensing. Experiments show that RADAR substantially improves question-answering performance across multiple multimodal large language models and suppresses both factual and logical hallucinations.

πŸ“ Abstract
Multimodal large language models (MLLMs) suffer from pronounced hallucinations in remote sensing visual question-answering (RS-VQA), primarily caused by visual grounding failures in large-scale scenes or misinterpretation of fine-grained small targets. To systematically analyze these issues, we introduce RSHBench, a protocol-based benchmark for fine-grained diagnosis of factual and logical hallucinations. To mitigate grounding-induced factual hallucinations, we further propose Relative Attention-Driven Actively Reasoning (RADAR), a training-free inference method that leverages intrinsic attention in MLLMs to guide progressive localization and fine-grained local reasoning at test time. Extensive experiments across diverse MLLMs demonstrate that RADAR consistently improves RS-VQA performance and reduces both factual and logical hallucinations. Code and data will be publicly available at: https://github.com/MiliLab/RADAR
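The test-time idea the abstract describes, using the model's own attention over image patches to pick a sub-region for a zoomed-in second pass, can be sketched roughly as follows. This is an illustrative sketch only: the function names, the patch-grid size, and the particular relative-attention formula (question-conditioned attention divided by a content-free baseline) are assumptions for the example, not RADAR's actual implementation.

```python
import numpy as np

def relative_attention(question_attn, generic_attn, eps=1e-8):
    """Normalize question-conditioned patch attention by a content-free
    baseline, down-weighting patches that are salient for any prompt.
    (Assumed formulation; the paper's definition may differ.)"""
    rel = question_attn / (generic_attn + eps)
    return rel / rel.sum()

def focus_window(rel, grid=8, win=4):
    """Return the top-left (row, col) of the win x win patch window with
    the highest total relative attention on a grid x grid patch map."""
    m = rel.reshape(grid, grid)
    best_score, best_rc = -1.0, (0, 0)
    for r in range(grid - win + 1):
        for c in range(grid - win + 1):
            score = m[r:r + win, c:c + win].sum()
            if score > best_score:
                best_score, best_rc = score, (r, c)
    return best_rc
```

In a progressive-localization loop, the image crop corresponding to the selected window would then be re-encoded and the model queried again on that local view, repeating until the attended region is small enough for fine-grained reasoning.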
Problem

Research questions and friction points this paper is trying to address.

hallucinations
multimodal LLMs
remote sensing
visual question answering
visual grounding
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free inference
visual grounding
hallucination mitigation
attention-driven reasoning
remote sensing VQA
πŸ”Ž Similar Papers
No similar papers found.