Geo2Sound: A Scalable Geo-Aligned Framework for Soundscape Generation from Satellite Imagery

📅 2026-04-16
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the challenge of generating spatially coherent and geographically realistic soundscapes from satellite imagery, which suffers from semantic ambiguity due to its top-down perspective and large coverage area. We introduce a novel task of satellite-to-soundscape generation and propose Geo2Sound, a unified framework that leverages a lightweight geographic attribute classifier to model spatial structure, produces multiple semantic soundscape hypotheses, and employs a geo-acoustic embedding alignment module to select the optimal output. To facilitate research in this domain, we also release SatSound-Bench, the first large-scale benchmark of paired satellite images and audio recordings. Experiments demonstrate that our method achieves a state-of-the-art FAD score of 1.765 on SatSound-Bench, representing a 50.0% improvement over the strongest baseline; human evaluations further confirm a 26.5% gain in realism and significantly enhanced semantic and geographic alignment.
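The FAD score quoted above is a Fréchet Audio Distance: a Fréchet distance between Gaussians fitted to embeddings of reference and generated audio, where lower is better. The following is a minimal sketch of that distance, assuming the embeddings have already been extracted into NumPy arrays with one row per clip. The embedding model is not stated in this summary, and the function name `frechet_distance` is illustrative rather than taken from the Geo2Sound code.

```python
import numpy as np


def frechet_distance(ref_emb, gen_emb):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    ref_emb, gen_emb: arrays of shape (n_clips, dim), one embedding per row.
    """
    mu_r = ref_emb.mean(axis=0)
    mu_g = gen_emb.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(ref_emb, rowvar=False))
    cov_g = np.atleast_2d(np.cov(gen_emb, rowvar=False))

    # tr(sqrt(cov_r @ cov_g)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative in theory;
    # clip small negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)
```

Two identical embedding sets give a distance of approximately zero, and shifting every embedding by a constant vector adds the squared length of that vector.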

📝 Abstract
Recent image-to-audio models have shown impressive performance on object-centric visual scenes. However, their application to satellite imagery remains limited by the complex, wide-area semantic ambiguity of top-down views. While satellite imagery provides a uniquely scalable source for global soundscape generation, matching these views to real acoustic environments with unique spatial structures is inherently difficult. To address this challenge, we introduce Geo2Sound, a novel task and framework for generating geographically realistic soundscapes from satellite imagery. Specifically, Geo2Sound combines structural geospatial attribute modeling, semantic hypothesis expansion, and geo-acoustic alignment in a unified framework: a lightweight classifier summarizes overhead scenes into compact geographic attributes; multiple sound-oriented semantic hypotheses are used to generate diverse, acoustically plausible candidates; and a geo-acoustic alignment module projects geographic attributes into the acoustic embedding space and identifies the candidate most consistent with the candidate sets. Moreover, we establish SatSound-Bench, the first benchmark comprising over 20k high-quality paired satellite images, text descriptions, and real-world audio recordings, collected from the field across more than 10 countries and complemented by three public datasets. Experiments show that Geo2Sound achieves a state-of-the-art FAD of 1.765, outperforming the strongest baseline by 50.0%. Human evaluations further confirm substantial gains in both realism (26.5%) and semantic alignment, validating high-fidelity synthesis at scale. Project page and source code: https://github.com/Blanketzzz/Geo2Sound
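
The selection step described in the abstract can be illustrated with a small sketch. It assumes one plausible reading: the geographic attribute vector is mapped into the acoustic embedding space by a linear projection, and each generated candidate is scored by cosine similarity against that projected vector. The names `project`, `cosine`, and `select_candidate`, the linear form of the projection, and the cosine scoring rule are all assumptions made for illustration; the abstract does not specify how the actual module is built or trained.

```python
import math


def project(matrix, vec):
    """Apply a (hypothetical, learned) linear projection: matrix @ vec."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_candidate(geo_attrs, projection, candidate_embs):
    """Return (best_index, scores) for candidates scored against the
    geographic attributes projected into the acoustic embedding space."""
    query = project(projection, geo_attrs)
    scores = [cosine(query, emb) for emb in candidate_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy usage: 2 geographic attributes projected into a 3-d acoustic space.
attrs = [1.0, 0.0]
proj = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
candidates = [[0.0, 1.0, 0.0], [2.0, 0.1, 0.0], [-1.0, 0.0, 0.0]]
best_index, scores = select_candidate(attrs, proj, candidates)
```

In this toy case the second candidate points in nearly the same direction as the projected attributes, so it is the one selected.
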
Problem

Research questions and friction points this paper is trying to address.

soundscape generation
satellite imagery
geo-acoustic alignment
semantic ambiguity
geospatial attributes
Innovation

Methods, ideas, or system contributions that make the work stand out.

geo-acoustic alignment
soundscape generation
satellite imagery
semantic hypothesis expansion
geospatial attributes
Authors
Kunlin Wu
The Hong Kong University of Science and Technology (Guangzhou), China
Yanning Wang
The Hong Kong University of Science and Technology (Guangzhou), China
Haofeng Tan
University of South Carolina, USA
Boyi Chen
The Hong Kong University of Science and Technology (Guangzhou), China
Teng Fei
School of Resources and Environmental Science, Wuhan University
Remote Sensing, GIS, Social Sensing, Planning, Natural Resources
Xianping Ma
Southwest Jiaotong University, China
Yang Yue
The Hong Kong University of Science and Technology (Guangzhou), China
Zan Zhou
Beijing University of Posts and Telecommunications, China
Xiaofeng Liu
Associate Investigator, School of Pharmacy, East China University of Science & Technology
Computer-aided Drug Design