Efficient Adaptation For Remote Sensing Visual Grounding

Date: 2025-03-29
AI Summary
Problem: Direct transfer of foundational vision-language models (e.g., Grounding DINO, OFA) to remote sensing visual grounding tasks suffers from significant performance degradation due to domain shift. Method: We propose a lightweight cross-domain adaptation framework. For the first time, we systematically investigate LoRA's effectiveness across all modules of Grounding DINO; for OFA, we combine BitFit and Adapter for parameter-efficient fine-tuning. Contribution/Results: Our method fine-tunes fewer than 10% of model parameters, reducing training cost by over 90% and substantially accelerating inference. It achieves state-of-the-art or competitive performance on multiple remote sensing visual grounding benchmarks. This work delivers a practical, low-overhead, deployment-friendly solution for multimodal remote sensing understanding.

Abstract
Foundation models have revolutionized artificial intelligence (AI), offering remarkable capabilities across multi-modal domains. Their ability to precisely locate objects in complex aerial and satellite images, using rich contextual information and detailed object descriptions, is essential for remote sensing (RS). These models can associate textual descriptions with object positions through the Visual Grounding (VG) task, but due to domain-specific challenges, their direct application to RS produces sub-optimal results. To address this, we applied Parameter Efficient Fine Tuning (PEFT) techniques to adapt these models for RS-specific VG tasks. Specifically, we evaluated LoRA placement across different modules in Grounding DINO and used BitFit and adapters to fine-tune the OFA foundation model pre-trained on general-purpose VG datasets. This approach achieved performance comparable to or surpassing current State Of The Art (SOTA) models while significantly reducing computational costs. This study highlights the potential of PEFT techniques to advance efficient and precise multi-modal analysis in RS, offering a practical and cost-effective alternative to full model training.
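To make the LoRA idea in the abstract concrete, here is a minimal sketch of a low-rank update on a single linear layer, written in plain Python. It is an illustration under assumptions, not the authors' implementation: the class name `LoRALinear`, the `rank` and `alpha` values, and the toy layer size are all hypothetical, and the paper's actual choice of which Grounding DINO modules receive LoRA is not reproduced here.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen: never updated during adaptation
        self.scale = alpha / rank
        # A gets small random values and B starts at zero, so the update is
        # exactly zero before training and the layer matches the base model.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_fraction(self):
        """Share of this layer's parameters that LoRA would train."""
        d_out, d_in = len(self.W), len(self.W[0])
        lora = len(self.A) * (d_in + d_out)
        return lora / (d_out * d_in + lora)


# Hypothetical 256 x 256 layer adapted with rank 4.
W = [[0.5] * 256 for _ in range(256)]
layer = LoRALinear(W, rank=4, alpha=8.0)
print(layer.trainable_fraction())  # 2048 / 67584, about 0.03
```

Only `A` and `B` would receive gradients in a real training loop, which is why the trainable share stays small even when the frozen weight is large.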
Problem

Research questions and friction points this paper is trying to address.

Adapt foundation models for remote sensing visual grounding
Improve object location accuracy in aerial images
Reduce computational costs with efficient fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Applied Parameter Efficient Fine Tuning (PEFT)
Evaluated LoRA placement in Grounding DINO
Used BitFit and adapters for OFA fine-tuning
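The BitFit and adapter combination listed above can be sketched as a rule for choosing which parameters stay trainable: bias terms (BitFit) plus the weights of small inserted adapter modules, with everything else frozen. The parameter names, sizes, and the `adapter` naming tag below are hypothetical and are not taken from OFA; the sketch only shows the selection logic and the resulting trainable share.

```python
def select_trainable(param_sizes, adapter_tag="adapter"):
    """Return the names left trainable: bias terms (BitFit) plus adapter weights."""
    return [
        name for name in param_sizes
        if name.endswith(".bias") or adapter_tag in name
    ]


def trainable_fraction(param_sizes, trainable):
    """Share of all parameters that the selected names account for."""
    total = sum(param_sizes.values())
    return sum(param_sizes[name] for name in trainable) / total


# Hypothetical parameter table for one transformer block (names and sizes are
# illustrative assumptions, not OFA's real layout).
block_params = {
    "attn.qkv.weight": 768 * 2304,
    "attn.qkv.bias": 2304,
    "mlp.fc1.weight": 768 * 3072,
    "mlp.fc1.bias": 3072,
    "adapter.down.weight": 768 * 64,
    "adapter.down.bias": 64,
    "adapter.up.weight": 64 * 768,
    "adapter.up.bias": 768,
}

chosen = select_trainable(block_params)
print(chosen)
print(f"{trainable_fraction(block_params, chosen):.2%}")
```

In a framework such as PyTorch, the same rule would typically be applied by toggling each parameter's gradient flag according to its name; that step is left out here to keep the sketch dependency-free.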
Authors

Hasan Moughnieh
National Center for Remote Sensing, CNRS, Beirut, Lebanon

Mohamad Chalhoub
Lebanese University, Beirut, Lebanon

Hasan Nasrallah
National Center for Remote Sensing, CNRS, Beirut, Lebanon

Cristiano Nattero
WASDI sàrl
Earth Observation, Remote Sensing, Satellite Imagery, Artificial Intelligence, Cloud computing

Paolo Campanella
WASDI, Dudelange, Luxembourg

Ali J. Ghandour
National Council for Scientific Research (CNRS)
Earth Observation, Geomatics, Geospatial Smart City, Transportation