🤖 AI Summary
Terrestrial monocular depth estimation algorithms face a severe domain shift when deployed in lunar environments, which are characterized by extreme shadows, textureless regolith, and the absence of atmospheric scattering. The problem is compounded by a longstanding scarcity of metric-scale ground truth for evaluation. To bridge this gap, we propose LuMon, a comprehensive benchmarking framework tailored for lunar surface perception. LuMon uses high-fidelity depth maps derived from stereo imagery of the Chang'e-3 mission as real-world ground truth and combines them with the CHERI low-illumination analog dataset and synthetic data to establish a unified multi-source benchmark. Through systematic zero-shot evaluation of state-of-the-art models and the establishment of simulation-to-reality domain adaptation baselines, our study exposes fundamental limitations in cross-domain generalization and lays a standardized foundation for extraterrestrial visual perception research.
📝 Abstract
Monocular Depth Estimation (MDE) is crucial for autonomous lunar rover navigation using electro-optical cameras. However, deploying terrestrial MDE networks on the Moon introduces a severe domain gap due to harsh shadows, textureless regolith, and the absence of atmospheric scattering. Existing evaluations rely on analogs that fail to replicate these conditions and lack actual metric ground truth. To address this, we present LuMon, a comprehensive benchmarking framework for evaluating MDE methods for lunar exploration. We introduce novel datasets featuring high-quality stereo ground-truth depth from the real Chang'e-3 mission and the CHERI dark analog dataset. Using this framework, we conduct a systematic zero-shot evaluation of state-of-the-art architectures across synthetic, analog, and real datasets. We rigorously assess performance against mission-critical challenges such as craters, rocks, extreme shading, and varying depth ranges. Furthermore, we establish a sim-to-real domain adaptation baseline by fine-tuning a foundation model on synthetic data. While this adaptation yields large in-domain performance gains, it generalizes only minimally to authentic lunar imagery, highlighting a persistent cross-domain transfer gap. Our extensive analysis reveals the inherent limitations of current networks and establishes a standardized foundation to guide future advancements in extraterrestrial perception and domain adaptation.
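The abstract does not name the metrics used in the zero-shot evaluation. As a rough illustration only, the sketch below computes three measures that are common in the MDE literature (absolute relative error, RMSE, and the δ < 1.25 accuracy) against metric ground truth. The function name `depth_metrics`, its parameters, and the convention that non-positive or non-finite ground-truth values mark pixels without a stereo match are all assumptions made for this example, not details taken from LuMon.

```python
import numpy as np

def depth_metrics(pred, gt, min_depth=1e-3, max_depth=None):
    """Hypothetical helper: AbsRel, RMSE and delta<1.25 on valid pixels.

    Assumes `pred` and `gt` are metric depth maps of the same shape and that
    invalid ground-truth pixels are stored as 0, negative, NaN or inf.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Stereo-derived ground truth is usually sparse, so mask invalid pixels.
    valid = np.isfinite(gt) & (gt > min_depth)
    if max_depth is not None:
        # Optional cap, e.g. to report results for a chosen depth range.
        valid &= gt <= max_depth
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")

    p = np.clip(pred[valid], min_depth, None)  # avoid division by zero
    g = gt[valid]

    abs_rel = np.mean(np.abs(p - g) / g)
    rmse = np.sqrt(np.mean((p - g) ** 2))
    ratio = np.maximum(p / g, g / p)
    delta1 = np.mean(ratio < 1.25)

    return {"abs_rel": float(abs_rel), "rmse": float(rmse), "delta1": float(delta1)}


if __name__ == "__main__":
    gt = np.array([[2.0, 4.0], [0.0, 10.0]])    # 0.0 marks a missing pixel
    pred = np.array([[2.0, 4.4], [3.0, 15.0]])
    print(depth_metrics(pred, gt))
    print(depth_metrics(pred, gt, max_depth=5.0))  # near-range pixels only
```

Models that predict relative rather than metric depth would need a scale (or scale-and-shift) alignment step before such metrics are meaningful; that step is omitted here because the abstract does not say how it is handled.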