🤖 AI Summary
To address the scarcity of high-quality Martian video data and the severe domain gap between Earth and Mars imagery, this paper proposes the Multimodal Mars Synthesis (M3arsSynth) data curation pipeline and the MarsGen conditional video generation model. Methodologically, the pipeline reconstructs geometrically consistent 3D Martian environments from NASA Planetary Data System (PDS) stereo navigation images, and MarsGen, a diffusion-based generator fine-tuned on the rendered data, synthesizes videos controllable by an initial frame, camera trajectories, and text prompts. The key contributions are: (1) the first video generation framework built on authentic Mars stereo data, producing high-fidelity, replayable videos with metric-scale, geometrically consistent 3D structure; and (2) superior performance over models trained on terrestrial data, maintaining both visual realism and geometric accuracy across diverse Martian terrains and acquisition conditions, which establishes a reliable visual foundation for mission rehearsal and robotic simulation.
📝 Abstract
Synthesizing realistic Martian landscape videos is crucial for mission rehearsal and robotic simulation. However, this task poses unique challenges due to the scarcity of high-quality Martian data and the significant domain gap between Martian and terrestrial imagery. To address these challenges, we propose a holistic solution composed of two key components: 1) a data curation pipeline, Multimodal Mars Synthesis (M3arsSynth), which reconstructs 3D Martian environments from real stereo navigation images sourced from NASA's Planetary Data System (PDS) and renders high-fidelity multiview 3D video sequences; and 2) a Martian terrain video generator, MarsGen, which synthesizes novel videos that are visually realistic and geometrically consistent with the 3D structure encoded in the data. Our M3arsSynth engine spans a wide range of Martian terrains and acquisition dates, enabling the generation of physically accurate 3D surface models at metric-scale resolution. MarsGen, fine-tuned on M3arsSynth data, synthesizes videos conditioned on an initial image frame and, optionally, camera trajectories or textual prompts, allowing video generation in novel environments. Experimental results show that our approach outperforms video synthesis models trained on terrestrial datasets, achieving superior visual fidelity and 3D structural consistency.
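The geometric core of a stereo-based reconstruction pipeline like M3arsSynth is recovering metric depth from the disparity between left and right navigation images. The paper does not specify its reconstruction method, so the following is only a minimal illustrative sketch of the standard depth-from-disparity relation; the focal length and stereo baseline used below are hypothetical camera parameters, not values from the paper.

```python
def disparity_to_depth(disparities_px, focal_px, baseline_m):
    """Convert per-pixel stereo disparities (pixels) to metric depth (meters).

    Uses the standard pinhole-stereo relation depth = focal * baseline / disparity.
    Zero or negative disparities are invalid and map to infinity.
    """
    return [
        focal_px * baseline_m / d if d > 0 else float("inf")
        for d in disparities_px
    ]


# Hypothetical rover stereo camera: 1000 px focal length, 0.42 m baseline.
# A 10 px disparity then corresponds to 1000 * 0.42 / 10 = 42 m depth.
depths = disparity_to_depth([10.0, 20.0, 0.0, 5.0],
                            focal_px=1000.0, baseline_m=0.42)
# -> [42.0, 21.0, inf, 84.0]
```

Because depth scales with the physical baseline, calibrated stereo rigs yield the metric-scale surface models the abstract describes, rather than the scale-ambiguous geometry of monocular reconstruction.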