AI Summary
Existing generative AI struggles to maintain spatial consistency at scales of thousands of kilometers, limiting its ability to model the structure and evolution of large-scale geographic environments. This work proposes a spatially scalable generative modeling paradigm that, for the first time, treats spatial scale as a core scaling dimension of foundation models, rather than relying on parameter count and data volume alone. Trained on 10 million globally distributed remote sensing images, the resulting model, MetaEarth3D, is presented as the first planetary-scale 3D generative foundation model capable of unbounded, multi-resolution, and diverse synthesis, from continental landforms down to street-level scenes. The generated outputs exhibit both visual realism and geostatistical fidelity, offering a virtual environmental data engine for ultra-large-scale spatial intelligence and enabling next-generation Earth observation applications.
Abstract
Recent generative AI models have achieved remarkable breakthroughs in language and visual understanding. However, although these models can generate realistic visual content, their spatial scale remains confined to bounded environments, preventing them from capturing how geographic environments evolve across thousands of kilometers or from modeling the spatial structure of the large-scale physical world. This limitation poses a critical challenge for ultra-wide-area spatial intelligence in Earth observation and simulation, revealing a deeper gap in generative AI: progress has relied primarily on scaling model parameters and training data, while overlooking spatial scale as a core dimension of intelligence. Here, motivated by this missing dimension, we investigate spatial scale as a new scaling axis in foundation models and present MetaEarth3D, the first generative foundation model capable of spatially consistent generation at the planetary scale. Taking optical Earth observation simulation as a testbed, MetaEarth3D enables the generation of multi-level, unbounded, and diverse 3D scenes spanning large-scale terrains, medium-scale cities, and fine-grained street blocks. Built upon 10 million globally distributed real-world training images, MetaEarth3D demonstrates both strong visual realism and geospatial statistical realism. Beyond generation, MetaEarth3D serves as a generative data engine for diverse virtual environments in ultra-wide spatial intelligence. We argue that this study may help empower next-generation spatial intelligence for Earth observation.