🤖 AI Summary
Large-scale terrain texture generation faces the challenge of simultaneously achieving high visual fidelity and geographic consistency. This paper proposes the first text-guided, DEM-constrained 2.5D terrain texture synthesis method. Its core contribution is a Multi-scale Content Aggregation (MCA) mechanism that injects DEM features into UNet blocks at multiple resolutions, enabling strong global-to-local coupling between elevation geometry and surface appearance. Built upon a flow-matching framework, the method integrates a pre-trained encoder with a multi-scale UNet architecture and is trained on a large-scale, text-annotated dataset derived from SRTM elevation data and Sentinel-2 remote sensing imagery. Quantitative evaluation against a non-MCA baseline shows that FID decreases by 49.16%, LPIPS drops by 32.33%, and the height-appearance correlation error (ΔdCor) is reduced to 0.0016. The approach achieves high-fidelity, controllable, and geographically consistent texture generation across diverse global terrains.
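To make the multi-scale injection idea concrete, here is a minimal NumPy sketch. It is not the paper's implementation: it assumes additive fusion through a 1x1 projection, with the DEM feature map average-pooled to each UNet resolution. The function names (`avg_pool_to`, `inject_dem`), the channel counts, and the resolutions are all hypothetical; the actual MCA module may aggregate features differently.

```python
import numpy as np

def avg_pool_to(x, size):
    """Average-pool a (C, H, W) array to (C, size, size).
    Assumes H and W are integer multiples of `size`."""
    c, h, w = x.shape
    fh, fw = h // size, w // size
    return x.reshape(c, size, fh, size, fw).mean(axis=(2, 4))

def inject_dem(unet_feat, dem_feat, proj):
    """Add projected DEM features to one UNet feature map (assumed additive fusion).
    unet_feat: (C_u, S, S), dem_feat: (C_d, H, W), proj: (C_u, C_d) 1x1-conv weights."""
    size = unet_feat.shape[-1]
    pooled = avg_pool_to(dem_feat, size)                 # match spatial resolution
    projected = np.einsum("uc,chw->uhw", proj, pooled)   # match channel count
    return unet_feat + projected

# Hypothetical setup: one DEM feature map shared by three UNet resolutions.
rng = np.random.default_rng(0)
dem_feat = rng.normal(size=(4, 64, 64))
channels = {64: 8, 32: 16, 16: 32}  # resolution -> UNet channels (illustrative)
feats = {s: rng.normal(size=(c, s, s)) for s, c in channels.items()}
projs = {s: 0.1 * rng.normal(size=(c, 4)) for s, c in channels.items()}
fused = {s: inject_dem(feats[s], dem_feat, projs[s]) for s in feats}
```

Because the same DEM features reach both coarse and fine blocks, coarse levels see large-scale relief while fine levels see local slope detail, which is the global-to-local coupling the summary describes.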
📝 Abstract
Large-scale terrain generation remains a labor-intensive task in computer graphics. We introduce Geodiffussr, a flow-matching pipeline that synthesizes text-guided texture maps while strictly adhering to a supplied Digital Elevation Map (DEM). The core mechanism is multi-scale content aggregation (MCA): DEM features from a pretrained encoder are injected into UNet blocks at multiple resolutions to enforce global-to-local elevation consistency. Compared with a non-MCA baseline, MCA markedly improves visual fidelity and strengthens height-appearance coupling (FID $\downarrow$ 49.16%, LPIPS $\downarrow$ 32.33%, $\Delta$dCor $\downarrow$ to 0.0016). To train and evaluate Geodiffussr, we assemble a globally distributed, biome- and climate-stratified corpus of triplets pairing SRTM-derived DEMs with Sentinel-2 imagery and vision-grounded natural-language captions that describe visible land cover. We position Geodiffussr as a strong baseline and step toward controllable 2.5D landscape generation for coarse-scale ideation and previz, complementary to physically based terrain and ecosystem simulators.
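The abstract does not define ΔdCor. One plausible reading, used in the sketch below, is the absolute gap between the distance correlation of (height, real appearance) and that of (height, generated appearance). The code computes the standard sample distance correlation on a set of sampled pixels; the `delta_dcor` helper and its argument names are assumptions for illustration, not the paper's evaluation code.

```python
import numpy as np

def _centered_dist(x):
    """Double-centered pairwise Euclidean distance matrix of n samples."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(len(x), -1)  # treat each sample as a feature vector
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation in [0, 1]; returns 0.0 if either input is constant."""
    a, b = _centered_dist(x), _centered_dist(y)
    dcov2 = (a * b).mean()                       # squared distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))

def delta_dcor(height, real_app, gen_app):
    """Assumed metric: gap in height-appearance coupling between real and generated."""
    return abs(distance_correlation(height, real_app)
               - distance_correlation(height, gen_app))

# Illustrative data: n sampled pixels with heights and RGB values.
rng = np.random.default_rng(0)
height = rng.uniform(0.0, 1.0, size=200)
real_rgb = np.stack([height, 1.0 - height, height ** 2], axis=1)
gen_rgb = real_rgb + 0.05 * rng.normal(size=real_rgb.shape)
gap = delta_dcor(height, real_rgb, gen_rgb)
```

The pairwise distance matrices cost O(n²) memory, so in practice this would be run on a pixel subsample rather than on full tiles.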