🤖 AI Summary
Visual geo-localization faces three challenges: low localization accuracy for images without GPS metadata, weak generalization of single-model approaches, and the absence of conflict-resolution mechanisms in multi-agent collaboration. To address these, the authors propose a multi-agent debate framework based on heterogeneous graph neural networks, introducing a dual-level debate mechanism and a cross-level topology refinement strategy. The method explicitly models collaboration, competition, and knowledge transfer via typed edges, enabling co-evolution of graph structure and node representations. It draws on the geographic reasoning capabilities of large vision-language models and combines node-level refinement, edge-level argumentation modeling, and cross-attention mechanisms. Experiments show significant improvements over state-of-the-art methods across multiple benchmarks, with particularly pronounced gains in complex geographic scenes. These results support both the idea of turning cognitive conflicts between agents into better localization performance and the framework's generalization capability.
📝 Abstract
Visual geo-localization requires extensive geographic knowledge and sophisticated reasoning to determine image locations without GPS metadata. Traditional retrieval methods are constrained by database coverage and quality. Recent Large Vision-Language Models (LVLMs) enable direct location reasoning from image content, yet individual models struggle with diverse geographic regions and complex scenes. Existing multi-agent systems improve performance through model collaboration but treat all agent interactions uniformly and lack mechanisms to handle conflicting predictions effectively. We propose GraphGeo, a multi-agent debate framework using heterogeneous graph neural networks for visual geo-localization. Our approach models diverse debate relationships through typed edges, distinguishing supportive collaboration, competitive argumentation, and knowledge transfer. We introduce a dual-level debate mechanism combining node-level refinement and edge-level argumentation modeling. A cross-level topology refinement strategy enables co-evolution between graph structure and agent representations. Experiments on multiple benchmarks demonstrate that GraphGeo significantly outperforms state-of-the-art methods. Our framework transforms cognitive conflicts between agents into enhanced geo-localization accuracy through structured debate.
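To make the idea of typed debate edges concrete, the following is a minimal sketch of one node-level refinement round over a typed agent graph. It is not the GraphGeo implementation: the abstract does not specify the update rule, so the function name `debate_round`, the fixed scalar weights in `TYPE_WEIGHT`, and the representation of each agent's belief as a plain `(lat, lon)` pair are all assumptions made for illustration. A real heterogeneous GNN would learn per-type transformations over feature vectors instead.

```python
from collections import defaultdict

# Hypothetical per-type weights (assumed, not from the paper): supportive and
# knowledge-transfer edges pull an agent toward its neighbor's estimate,
# competitive edges push it away.
TYPE_WEIGHT = {"support": 0.5, "compete": -0.25, "transfer": 0.25}


def debate_round(estimates, edges):
    """Run one illustrative refinement step over a typed, directed agent graph.

    estimates: dict mapping agent name -> (lat, lon) estimate
    edges: list of (src, dst, edge_type); the message flows from src to dst
    Returns a new dict of updated estimates; the input is not modified.
    Note: longitude wrap-around at +/-180 degrees is ignored in this toy.
    """
    incoming = defaultdict(list)
    for src, dst, edge_type in edges:
        if edge_type not in TYPE_WEIGHT:
            raise ValueError(f"unknown edge type: {edge_type}")
        incoming[dst].append((src, edge_type))

    updated = {}
    for agent, (lat, lon) in estimates.items():
        messages = incoming.get(agent, [])
        d_lat = d_lon = 0.0
        for src, edge_type in messages:
            weight = TYPE_WEIGHT[edge_type]
            src_lat, src_lon = estimates[src]
            # Each message is the weighted offset toward the neighbor.
            d_lat += weight * (src_lat - lat)
            d_lon += weight * (src_lon - lon)
        # Average over incoming messages so many neighbors do not overshoot.
        count = max(len(messages), 1)
        updated[agent] = (lat + d_lat / count, lon + d_lon / count)
    return updated


if __name__ == "__main__":
    beliefs = {"agent_a": (0.0, 0.0), "agent_b": (10.0, 20.0)}
    graph = [("agent_b", "agent_a", "support")]
    print(debate_round(beliefs, graph))
```

The point of the sketch is only that the edge type changes how a neighbor's prediction is used: the same disagreement moves an agent toward the neighbor on a supportive edge and away from it on a competitive one, which a uniform treatment of interactions cannot express.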