GraphGeo: Multi-Agent Debate Framework for Visual Geo-localization with Heterogeneous Graph Neural Networks

📅 2025-11-02
🤖 AI Summary
Visual geo-localization faces three challenges: low localization accuracy for images without GPS metadata, weak generalization of single-model approaches, and the absence of conflict-resolution mechanisms in multi-agent collaboration. To address these, we propose GraphGeo, a multi-agent debate framework based on heterogeneous graph neural networks, with a dual-level debate mechanism and a cross-level topology refinement strategy. Our method explicitly models collaboration, competition, and knowledge transfer via typed edges, enabling co-evolution of graph structure and node representations. It draws on the geographic reasoning capabilities of large vision-language models and combines node-level refinement, edge-level argument modeling, and cross-attention mechanisms. Experiments on multiple benchmarks show significant improvements over state-of-the-art methods, with particularly pronounced gains in complex geographic scenes. These results support both the idea that cognitive conflicts between agents can be turned into better localization performance and the framework's ability to generalize.

📝 Abstract
Visual geo-localization requires extensive geographic knowledge and sophisticated reasoning to determine image locations without GPS metadata. Traditional retrieval methods are constrained by database coverage and quality. Recent Large Vision-Language Models (LVLMs) enable direct location reasoning from image content, yet individual models struggle with diverse geographic regions and complex scenes. Existing multi-agent systems improve performance through model collaboration but treat all agent interactions uniformly, and they lack mechanisms to handle conflicting predictions effectively. We propose GraphGeo, a multi-agent debate framework using heterogeneous graph neural networks for visual geo-localization. Our approach models diverse debate relationships through typed edges, distinguishing supportive collaboration, competitive argumentation, and knowledge transfer. We introduce a dual-level debate mechanism combining node-level refinement and edge-level argumentation modeling. A cross-level topology refinement strategy enables co-evolution between graph structure and agent representations. Experiments on multiple benchmarks demonstrate that GraphGeo significantly outperforms state-of-the-art methods. Our framework transforms cognitive conflicts between agents into enhanced geo-localization accuracy through structured debate.
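The abstract does not give update rules, so the following is only a toy sketch of what typed edges and one round of node-level refinement could look like. Everything in it is an assumption made for illustration: the `Edge` class, the `TYPE_WEIGHT` values, the `refine_nodes` function, and the use of small state vectors per agent. It is not the authors' GraphGeo implementation, which uses learned graph neural network layers.

```python
from dataclasses import dataclass

# Hypothetical edge types, named after the three relationships in the abstract.
SUPPORT, COMPETE, TRANSFER = "support", "compete", "transfer"

# Illustrative fixed weights per edge type (assumed; a real model would learn these).
TYPE_WEIGHT = {SUPPORT: 1.0, COMPETE: -0.5, TRANSFER: 0.5}


@dataclass(frozen=True)
class Edge:
    src: str   # agent sending the message
    dst: str   # agent receiving the message
    kind: str  # one of SUPPORT, COMPETE, TRANSFER


def refine_nodes(states, edges, step=0.5):
    """One round of node-level refinement.

    Each agent moves toward the states of its supporting or
    transferring in-neighbours and away from its competing ones.
    """
    updated = {}
    for agent, vec in states.items():
        incoming = [e for e in edges if e.dst == agent]
        delta = [0.0] * len(vec)
        for e in incoming:
            w = TYPE_WEIGHT[e.kind]
            for i, (mine, theirs) in enumerate(zip(vec, states[e.src])):
                delta[i] += w * (theirs - mine)
        n = max(len(incoming), 1)  # average over in-neighbours, avoid division by zero
        updated[agent] = [v + step * d / n for v, d in zip(vec, delta)]
    return updated


if __name__ == "__main__":
    states = {"a": [0.0, 0.0], "b": [2.0, 2.0], "c": [4.0, 0.0]}
    edges = [Edge("b", "a", SUPPORT), Edge("c", "a", COMPETE), Edge("a", "b", TRANSFER)]
    print(refine_nodes(states, edges))
```

The point of the sketch is only that the edge type changes the sign and strength of a message, which is what distinguishes a heterogeneous debate graph from one that treats all agent interactions uniformly.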
Problem

Research questions and friction points this paper is trying to address.

Traditional retrieval-based geo-localization methods are constrained by database coverage and quality
Individual large vision-language models struggle with diverse geographic regions and scenes
Existing multi-agent systems lack mechanisms to handle conflicting predictions effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneous graph neural networks model diverse debate relationships
Dual-level mechanism combines node refinement and edge argumentation
Cross-level topology refinement lets graph structure and agent representations co-evolve
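As a rough illustration of the last point, the sketch below re-types and prunes edges based on how much two agents currently agree. It covers only the direction from representations to structure, and every detail is assumed for illustration: the `refine_topology` function, the cosine-similarity score, the thresholds, and the rule that knowledge-transfer edges are left untouched. The paper's actual refinement strategy is not specified in this summary.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def refine_topology(states, edges, agree=0.5, disagree=-0.5):
    """Re-type each (src, dst, kind) edge from the agents' current agreement.

    Clearly agreeing pairs become "support" edges, clearly disagreeing
    pairs become "compete" edges, and pairs in between are pruned.
    """
    refined = []
    for src, dst, kind in edges:
        if kind == "transfer":
            # Assumption: knowledge-transfer edges are kept unchanged.
            refined.append((src, dst, kind))
            continue
        score = cosine(states[src], states[dst])
        if score >= agree:
            refined.append((src, dst, "support"))
        elif score <= disagree:
            refined.append((src, dst, "compete"))
        # Otherwise the edge is dropped as uninformative.
    return refined


if __name__ == "__main__":
    states = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [-1.0, 0.0], "d": [0.0, 1.0]}
    edges = [("a", "b", "compete"), ("a", "c", "support"),
             ("a", "d", "support"), ("d", "a", "transfer")]
    print(refine_topology(states, edges))
```

Alternating a structure update like this with a round of node updates is one simple way to picture the co-evolution of graph structure and agent representations.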
Heng Zheng, South China Normal University
Yuling Shi, Shanghai Jiao Tong University
Xiaodong Gu, Associate Professor, Shanghai Jiao Tong University (Software Engineering, Large Language Models)
Haochen You, Columbia University (Generative AI, Machine Learning, Statistics)
Zijian Zhang, University of Pennsylvania
Lubin Gan, University of Science and Technology of China
Hao Zhang, University of Chinese Academy of Sciences
Wenjun Huang, Sun Yat-sen University
Jin Huang, South China Normal University