Geodiffussr: Generative Terrain Texturing with Elevation Fidelity

📅 2025-11-28
🤖 AI Summary
Large-scale terrain texture generation faces the challenge of simultaneously achieving high visual fidelity and geographic consistency. This paper proposes the first text-guided, DEM-constrained 2.5D terrain texture synthesis method. Its core contribution is a Multi-scale Content Aggregation (MCA) mechanism that injects DEM features into multi-resolution UNet modules, enabling strong global-to-local coupling between elevation geometry and surface appearance. Built upon a flow-matching framework, the method integrates a pre-trained encoder with a multi-scale UNet architecture and leverages a large-scale, text-annotated training dataset derived from SRTM and Sentinel-2 remote sensing imagery. Quantitative evaluation shows significant improvements over baseline models: FID decreases by 49.16%, LPIPS drops by 32.33%, and height-appearance correlation error is reduced to only 0.0016. The approach achieves high-fidelity, controllable, and geographically consistent texture generation across diverse global terrains.
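The height-appearance figure quoted above is reported in the abstract as ΔdCor, which this page does not define. A minimal sketch of how such a number could be computed, assuming "dCor" is the standard sample distance correlation and that ΔdCor is the absolute gap between the height-to-generated and height-to-reference correlations; the function names and that reading of ΔdCor are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def _centered_distances(x):
    """Pairwise Euclidean distances of n samples, double-centered."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                      # treat a 1-D input as n scalar samples
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation between paired samples x and y."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0.0:
        return 0.0                          # one side is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))

def delta_dcor(height, generated, reference):
    """ASSUMED definition: gap between generated and reference height-appearance coupling."""
    return abs(distance_correlation(height, generated)
               - distance_correlation(height, reference))
```

The pairwise distance matrices grow quadratically with the number of samples, so in practice one would evaluate this on a random subset of pixels (height values paired with their RGB colours) rather than on full tiles.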

📝 Abstract
Large-scale terrain generation remains a labor-intensive task in computer graphics. We introduce Geodiffussr, a flow-matching pipeline that synthesizes text-guided texture maps while strictly adhering to a supplied Digital Elevation Map (DEM). The core mechanism is multi-scale content aggregation (MCA): DEM features from a pretrained encoder are injected into UNet blocks at multiple resolutions to enforce global-to-local elevation consistency. Compared with a non-MCA baseline, MCA markedly improves visual fidelity and strengthens height-appearance coupling (FID $\downarrow$ 49.16%, LPIPS $\downarrow$ 32.33%, $\Delta$dCor $\downarrow$ to 0.0016). To train and evaluate Geodiffussr, we assemble a globally distributed, biome- and climate-stratified corpus of triplets pairing SRTM-derived DEMs with Sentinel-2 imagery and vision-grounded natural-language captions that describe visible land cover. We position Geodiffussr as a strong baseline and step toward controllable 2.5D landscape generation for coarse-scale ideation and previz, complementary to physically based terrain and ecosystem simulators.
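To make the MCA idea concrete, here is a rough sketch of injecting one DEM feature map into UNet features at several resolutions. The abstract does not say how the features are aggregated, so the average pooling, the 1x1 projection, and the additive fusion below are all assumptions chosen for illustration, as are the names `avg_pool_to` and `inject_dem`:

```python
import numpy as np

def avg_pool_to(feat, size):
    """Average-pool a (C, H, W) map to (C, size, size); H and W must be multiples of size."""
    c, h, w = feat.shape
    return feat.reshape(c, size, h // size, size, w // size).mean(axis=(2, 4))

def inject_dem(unet_feat, dem_feat, proj):
    """Add DEM features to one UNet block (ASSUMED additive fusion).

    unet_feat: (C_u, S, S) activations of the block
    dem_feat:  (C_d, H, W) features from the DEM encoder
    proj:      (C_u, C_d) weights acting as a 1x1 convolution
    """
    s = unet_feat.shape[-1]
    pooled = avg_pool_to(dem_feat, s)                      # match the block's resolution
    projected = np.einsum("uc,chw->uhw", proj, pooled)     # match the block's channels
    return unet_feat + projected

# Hypothetical shapes: one 64x64 DEM feature map feeding three UNet scales.
rng = np.random.default_rng(0)
dem_feat = rng.standard_normal((8, 64, 64))
for channels, size in [(16, 64), (32, 32), (64, 16)]:
    block = rng.standard_normal((channels, size, size))
    proj = rng.standard_normal((channels, 8)) * 0.1
    fused = inject_dem(block, dem_feat, proj)
    assert fused.shape == block.shape
```

The point of the sketch is only that the same elevation signal reaches both coarse blocks (global layout) and fine blocks (local detail); a trained model would learn the projection weights and may fuse the features differently.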

Problem

Research questions and friction points this paper is trying to address.

Generates terrain textures guided by text and elevation maps
Ensures elevation consistency using multi-scale content aggregation
Creates a dataset for training and evaluating terrain texturing models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow-matching pipeline for DEM-consistent texture synthesis
Multi-scale content aggregation enforces elevation fidelity
Global dataset pairs DEMs, satellite imagery, and captions
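
For readers unfamiliar with flow matching, the training objective behind such a pipeline can be sketched in a few lines. This page does not state which probability path the paper uses, so the straight-line interpolation between noise and data below is an assumption (a common choice), and `velocity_fn` is a hypothetical stand-in for the DEM- and text-conditioned UNet:

```python
import numpy as np

def flow_matching_loss(velocity_fn, x1, cond, rng):
    """Monte Carlo estimate of a conditional flow-matching loss on a batch.

    velocity_fn(xt, t, cond) -> predicted velocity, same shape as xt (hypothetical model)
    x1:   (B, ...) batch of target textures
    cond: conditioning passed through untouched (e.g. DEM features, text embedding)
    """
    x0 = rng.standard_normal(x1.shape)                        # noise endpoint
    t = rng.uniform(size=(x1.shape[0],) + (1,) * (x1.ndim - 1))  # one time per sample
    xt = (1.0 - t) * x0 + t * x1                              # ASSUMED linear path
    target = x1 - x0                                          # velocity of that path
    pred = velocity_fn(xt, t, cond)
    return float(np.mean((pred - target) ** 2))

# Usage with a trivial placeholder model that always predicts zero velocity.
rng = np.random.default_rng(0)
textures = rng.standard_normal((4, 3, 8, 8))
loss = flow_matching_loss(lambda xt, t, c: np.zeros_like(xt), textures, None, rng)
```

At sampling time one would integrate the learned velocity field from noise at t = 0 to a texture at t = 1, with the DEM and caption supplied through `cond` at every step.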
Tai Inui
Waseda University, Japan and Rikka Inc., Japan
Alexander Matsumura
Waseda University, Japan
Edgar Simo-Serra
Waseda University
Computer Graphics, Machine Learning, Computer Vision