DVGBench: Implicit-to-Explicit Visual Grounding Benchmark in UAV Imagery with Large Vision-Language Models

📅 2026-01-02
🏛️ ISPRS Journal of Photogrammetry and Remote Sensing
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing remote sensing visual grounding datasets predominantly rely on explicit referring expressions, which are ill-suited for implicit localization tasks requiring domain-specific knowledge. To address this gap, this work presents DVGBench—the first benchmark for implicit visual grounding in drone imagery—spanning six real-world application scenarios and introducing paired explicit–implicit queries. The authors propose DroneVG-R1, a model that integrates an Implicit-to-Explicit Chain-of-Thought (I2E-CoT) mechanism within a reinforcement learning framework to translate implicit references into actionable explicit expressions, thereby significantly enhancing localization performance. Experiments on DVGBench reveal that prevailing models struggle with implicit reasoning, whereas DroneVG-R1 achieves notably higher grounding accuracy.
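The summary reports "grounding accuracy" without saying how it is computed. A common convention in visual grounding counts a prediction as correct when its intersection-over-union (IoU) with the ground-truth box reaches a threshold, often 0.5. The sketch below illustrates that convention only. It is an assumption that DVGBench scores models this way, and the names `Box`, `iou`, and `grounding_accuracy` are hypothetical, not taken from the paper.

```python
from typing import Sequence, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); an assumed format for illustration.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(
    preds: Sequence[Box], gts: Sequence[Box], threshold: float = 0.5
) -> float:
    """Fraction of predictions whose IoU with the ground truth is >= threshold."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    hits = sum(1 for p, g in zip(preds, gts) if iou(p, g) >= threshold)
    return hits / len(preds)


if __name__ == "__main__":
    predictions = [(0, 0, 2, 2), (0, 0, 2, 2)]
    ground_truth = [(0, 0, 2, 2), (1, 1, 3, 3)]
    print(grounding_accuracy(predictions, ground_truth))
```

Under this convention, the same metric could be computed separately for the explicit and the implicit query of each pair, so the gap between the two scores would indicate how much implicit phrasing hurts a model.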

Problem

Research questions and friction points this paper is trying to address.

visual grounding
implicit referring expressions
UAV imagery
large vision-language models
remote sensing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicit-to-Explicit Visual Grounding
Large Vision-Language Models
Chain-of-Thought Reasoning
UAV Imagery
Reinforcement Learning
Yue Zhou
Associate Professor, East China Normal University
Remote Sensing Vision-Language Model, Oriented Object Detection
Jue Chen
School of Geospatial Artificial Intelligence, East China Normal University, Shanghai, 200241, China
Zilun Zhang
Zhejiang University, Hangzhou, 310058, China
Penghui Huang
Shanghai Jiao Tong University, Shanghai, 200240, China
Ran Ding
Shanghai Jiao Tong University, Shanghai, 200240, China
Zhentao Zou
Shanghai Jiao Tong University, Shanghai, 200240, China
Pengfei Gao
Information Engineering University, Zhengzhou, 450001, China
Yuchen Wei
Information Engineering University, Zhengzhou, 450001, China
Ke Li
Information Engineering University, Zhengzhou, 450001, China
Xue Yang
Shanghai Jiao Tong University, Shanghai, 200240, China
Xue Jiang
Shanghai Jiao Tong University, Shanghai, 200240, China
Hongxin Yang
Hinton STAI Institute and Key Laboratory of Geographic Information Science (Ministry of Education), East China Normal University, Shanghai, 200241, China
Jonathan Li
Hinton STAI Institute and Key Laboratory of Geographic Information Science (Ministry of Education), East China Normal University, Shanghai, 200241, China