🤖 AI Summary
To address the challenge of matching ground-level images that lack GPS labels against satellite imagery, particularly under varying field-of-view (FoV) conditions where localization accuracy degrades, this paper proposes SAN-QUAD (Quadruple Semantic Align Net), a four-stream Siamese-like network. SAN-QUAD extends prior state-of-the-art approaches by applying semantic segmentation to both ground and satellite views, enabling fine-grained semantic alignment rather than reliance solely on appearance similarity or geometric cues. By jointly modeling scene semantics and view-invariant appearance features, it improves the robustness of cross-view matching. Experiments on a subset of the CVUSA dataset show gains of up to 9.8% over prior methods across various FoV settings. Because the method geolocates ground-view images without any GPS metadata, it offers a practical tool for verifying the origin of imagery in misinformation-sensitive domains such as journalism and forensic analysis.
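As a rough illustration of the four-stream design, the sketch below shows one plausible PyTorch layout: two streams encode the ground RGB image and its segmentation map, two encode the satellite tile and its segmentation map, and the per-view features are fused and projected into a shared embedding space. Everything here (the tiny `small_cnn` backbone, concatenation fusion, the 128-dimensional embedding, and the names `QuadStreamNet`, `embed_ground`, `embed_satellite`) is an assumption for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def small_cnn(in_ch: int) -> nn.Sequential:
    """Tiny stand-in encoder; the real backbone is likely a much deeper CNN."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, 64)
    )


class QuadStreamNet(nn.Module):
    """Four streams: ground RGB, ground segmentation, satellite RGB,
    satellite segmentation. The two streams of each view are fused by
    concatenation and projected into a shared embedding space."""

    def __init__(self, seg_ch: int = 1, dim: int = 128):
        super().__init__()
        self.g_rgb, self.g_seg = small_cnn(3), small_cnn(seg_ch)
        self.s_rgb, self.s_seg = small_cnn(3), small_cnn(seg_ch)
        self.g_head = nn.Linear(64 + 64, dim)  # fuse RGB + semantic features
        self.s_head = nn.Linear(64 + 64, dim)

    def embed_ground(self, img: torch.Tensor, seg: torch.Tensor) -> torch.Tensor:
        f = torch.cat([self.g_rgb(img), self.g_seg(seg)], dim=1)
        return F.normalize(self.g_head(f), dim=1)  # unit-norm embedding

    def embed_satellite(self, img: torch.Tensor, seg: torch.Tensor) -> torch.Tensor:
        f = torch.cat([self.s_rgb(img), self.s_seg(seg)], dim=1)
        return F.normalize(self.s_head(f), dim=1)

    def forward(self, g_img, g_seg, s_img, s_seg):
        return self.embed_ground(g_img, g_seg), self.embed_satellite(s_img, s_seg)
```

A triplet or contrastive loss over matched ground-satellite pairs would be a typical training objective for such an embedding, though the summary does not specify the loss actually used.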
📝 Abstract
Recent advances in generative AI have significantly increased the online dissemination of altered images and videos, raising serious concerns about the credibility of digital media distributed through information channels and social networks. This issue particularly affects domains that rely on trustworthy data, such as journalism, forensic analysis, and Earth observation. To address these concerns, the ability to geolocate a non-geo-tagged ground-view image without external information such as GPS coordinates has become increasingly important. This study tackles the challenge of linking a ground-view image, potentially captured with varying fields of view (FoV), to its corresponding satellite image without the aid of GPS data. To this end, we propose a novel four-stream Siamese-like architecture, the Quadruple Semantic Align Net (SAN-QUAD), which extends previous state-of-the-art (SOTA) approaches by leveraging semantic segmentation applied to both ground and satellite imagery. Experimental results on a subset of the CVUSA dataset demonstrate improvements of up to 9.8% over prior methods across various FoV settings.
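To make the GPS-free retrieval setting concrete, the hypothetical snippet below builds on the `QuadStreamNet` sketch above: one limited-FoV ground query and its segmentation map are embedded, a small gallery of satellite tiles is embedded likewise, and candidates are ranked by cosine similarity. All tensor shapes, the gallery size, and the ranking procedure are illustrative assumptions, not details taken from the paper.

```python
import torch

net = QuadStreamNet().eval()
g_img = torch.randn(1, 3, 128, 360)   # one ground query (e.g. a limited-FoV crop)
g_seg = torch.randn(1, 1, 128, 360)   # its semantic segmentation map
s_img = torch.randn(8, 3, 256, 256)   # gallery of 8 candidate satellite tiles
s_seg = torch.randn(8, 1, 256, 256)   # their segmentation maps

with torch.no_grad():
    q = net.embed_ground(g_img, g_seg)           # (1, dim)
    gallery = net.embed_satellite(s_img, s_seg)  # (8, dim)

scores = (q @ gallery.T).squeeze(0)  # cosine similarity (embeddings are unit-norm)
best = scores.argmax().item()        # index of the top-ranked satellite tile
print(f"best-matching tile: {best}")
```

In a recall@k evaluation of the kind typically reported on CVUSA, localization succeeds when the satellite tile covering the query's true location appears among the top-k ranked candidates.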