GraphGeo: Multi-Agent Debate Framework for Visual Geo-localization with Heterogeneous Graph Neural Networks

📅 2025-11-02

🤖 AI Summary

Visual geo-localization faces three challenges: low localization accuracy for images without GPS metadata, weak generalization of single-model approaches, and the absence of conflict-resolution mechanisms in multi-agent collaboration. To address these, we propose a multi-agent debate framework based on heterogeneous graph neural networks, introducing a dual-level debate mechanism and a cross-level topology refinement strategy. Our method explicitly models collaboration, competition, and knowledge transfer via typed edges, enabling co-evolution of graph structure and node representations. It integrates the geographic reasoning capabilities of large vision-language models, combining node-level refinement, edge-level argument modeling, and cross-attention mechanisms. Extensive experiments show significant improvements over state-of-the-art methods across multiple benchmarks, with particularly pronounced gains in complex geographic scenes. These results support both the claim that cognitive conflicts between agents can be turned into localization gains and the framework's strong generalization capability.

📝 Abstract

Visual geo-localization requires extensive geographic knowledge and sophisticated reasoning to determine image locations without GPS metadata. Traditional retrieval-based methods are constrained by database coverage and quality. Recent Large Vision-Language Models (LVLMs) enable direct location reasoning from image content, yet individual models struggle with diverse geographic regions and complex scenes. Existing multi-agent systems improve performance through model collaboration but treat all agent interactions uniformly and lack mechanisms to handle conflicting predictions effectively. We propose GraphGeo, a multi-agent debate framework using heterogeneous graph neural networks for visual geo-localization. Our approach models diverse debate relationships through typed edges, distinguishing supportive collaboration, competitive argumentation, and knowledge transfer. We introduce a dual-level debate mechanism combining node-level refinement and edge-level argumentation modeling. A cross-level topology refinement strategy enables co-evolution between graph structure and agent representations. Experiments on multiple benchmarks demonstrate that GraphGeo significantly outperforms state-of-the-art methods. Our framework transforms cognitive conflicts between agents into enhanced geo-localization accuracy through structured debate.
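The typed-edge idea in the abstract can be illustrated with a single relation-aware message-passing step. This is a minimal sketch under stated assumptions, not GraphGeo's implementation: the EdgeType enum, the DebateGraph container, message_pass, and the fixed per-relation scalar weights in RELATION_WEIGHT (including the negative weight for competitive edges) are all hypothetical. A learned heterogeneous GNN would typically use a trainable weight matrix per edge type instead of a fixed scalar.

```python
from dataclasses import dataclass, field
from enum import Enum


class EdgeType(Enum):
    """Hypothetical encoding of the three debate relations named in the abstract."""
    SUPPORT = "support"    # supportive collaboration
    COMPETE = "compete"    # competitive argumentation
    TRANSFER = "transfer"  # knowledge transfer


@dataclass
class DebateGraph:
    features: dict[str, list[float]]  # agent id -> embedding vector
    edges: list[tuple[str, str, EdgeType]] = field(default_factory=list)  # (src, dst, type)


# Assumption: a fixed scalar weight per relation, chosen for illustration only.
RELATION_WEIGHT = {
    EdgeType.SUPPORT: 1.0,   # pull toward supporting agents
    EdgeType.COMPETE: -0.5,  # push away from competing agents
    EdgeType.TRANSFER: 0.5,  # partial copy of transferred knowledge
}


def message_pass(graph: DebateGraph, self_weight: float = 1.0) -> dict[str, list[float]]:
    """One round of relation-typed message passing with per-relation mean aggregation."""
    if not graph.features:
        return {}
    # Group incoming neighbours by (target, relation) so each relation is averaged separately.
    incoming: dict[tuple[str, EdgeType], list[str]] = {}
    for src, dst, rel in graph.edges:
        incoming.setdefault((dst, rel), []).append(src)

    updated: dict[str, list[float]] = {}
    for node, h in graph.features.items():
        out = [self_weight * x for x in h]  # self-loop term
        for rel, weight in RELATION_WEIGHT.items():
            neighbours = incoming.get((node, rel), [])
            if not neighbours:
                continue
            for i in range(len(h)):
                mean = sum(graph.features[n][i] for n in neighbours) / len(neighbours)
                out[i] += weight * mean
        updated[node] = out
    return updated


if __name__ == "__main__":
    g = DebateGraph(
        features={"A": [1.0, 0.0], "B": [0.8, 0.2], "C": [0.0, 1.0]},
        edges=[("B", "A", EdgeType.SUPPORT), ("C", "A", EdgeType.COMPETE)],
    )
    print(message_pass(g)["A"])
```

In this toy graph agent A is pulled toward its supporter B and pushed away from its competitor C, ending at approximately [1.8, -0.3]; B and C have no incoming edges and keep their features. Aggregating each relation separately is what distinguishes this from a homogeneous GNN, where all neighbours would be averaged together regardless of relation type.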

Problem

Research questions and friction points this paper is trying to address.

Traditional retrieval-based visual geo-localization methods are limited by the coverage and quality of their reference databases

Individual large vision-language models struggle with diverse geographic regions and scenes

Existing multi-agent systems lack mechanisms to handle conflicting predictions effectively

Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneous graph neural networks model diverse debate relationships through typed edges

Dual-level mechanism combines node refinement and edge argumentation

Cross-level topology refinement enables co-evolution of graph structure and agent representations
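The co-evolution of graph structure and agent representations can be pictured as an alternating loop: update embeddings along the current edges, then re-derive the edges from the updated embeddings. The sketch below is a hypothetical illustration, not the paper's cross-level topology refinement. The cosine-threshold rule for typing edges as "support" or "compete", the pruning of weakly related pairs, the fixed step size, and the helper names refine_topology, update_features, and co_evolve are all assumptions.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def refine_topology(features: dict[str, list[float]], threshold: float = 0.5) -> list[tuple[str, str, str]]:
    """Structure step (hypothetical): re-derive typed edges from current agent embeddings."""
    ids = sorted(features)
    edges = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            s = cosine(features[a], features[b])
            if s >= threshold:
                edges.append((a, b, "support"))  # agents currently agree
            elif s <= -threshold:
                edges.append((a, b, "compete"))  # agents currently conflict
            # otherwise the pair is weakly related and the edge is pruned
    return edges


def update_features(
    features: dict[str, list[float]], edges: list[tuple[str, str, str]], step: float = 0.5
) -> dict[str, list[float]]:
    """Representation step (hypothetical): move toward supporters, away from competitors."""
    new = {k: list(v) for k, v in features.items()}
    for a, b, rel in edges:
        sign = 1.0 if rel == "support" else -1.0
        for x, y in ((a, b), (b, a)):  # edges are treated as undirected
            for i in range(len(new[x])):
                new[x][i] += step * sign * features[y][i]
    # L2-normalise so repeated rounds stay bounded (cosine is scale-invariant anyway)
    for k, v in new.items():
        n = math.sqrt(sum(x * x for x in v))
        if n:
            new[k] = [x / n for x in v]
    return new


def co_evolve(features: dict[str, list[float]], rounds: int = 3, threshold: float = 0.5):
    """Alternate representation updates and topology refinement for a fixed number of rounds."""
    edges = refine_topology(features, threshold)
    for _ in range(rounds):
        features = update_features(features, edges)
        edges = refine_topology(features, threshold)
    return features, edges
```

With three agents where A and B roughly agree and C disagrees with both (for example A = [1.0, 0.0], B = [0.9, 0.1], C = [-1.0, 0.0]), the loop keeps one support edge between A and B and two compete edges involving C, and every final embedding has unit norm. Keeping the structure step and the representation step as separate functions makes it straightforward to swap the fixed threshold for a learned edge scorer.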

🔎 Similar Papers

GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model