🤖 AI Summary
This work addresses the challenge of robust robotic navigation under severe depth sensor degradation caused by low illumination or reflective surfaces. The authors propose a reinforcement learning framework built around a cross-modal Wasserstein autoencoder that enforces consistency between depth maps and grayscale images in a shared latent space, enabling the system to infer depth-relevant features from grayscale imagery when depth data is unreliable or missing. Notably, this is the first approach to effectively integrate cross-modally aligned latent representations into navigation policies. Extensive experiments, in both simulation and real-world environments, demonstrate significant improvements in navigation robustness under depth-degraded conditions, including zero-shot sim-to-real transfer without additional fine-tuning.
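To make the architecture concrete, here is a minimal PyTorch sketch of a cross-modal Wasserstein autoencoder in the spirit described above. It is not the authors' implementation: the network sizes, the latent-alignment term, the cross-reconstruction term (decoding depth from the grayscale latent), and the MMD regularizer toward a Gaussian prior are all assumptions about how the cross-modal consistency might be enforced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Small conv encoder: 1-channel 64x64 image -> latent vector."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Mirror deconv decoder: latent vector -> 1-channel 64x64 image."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(self.fc(z).view(-1, 128, 8, 8))

def mmd_rbf(z, z_prior, sigma=1.0):
    """Biased RBF-kernel MMD estimate between latents and prior samples
    (the WAE regularizer that replaces the VAE's per-sample KL term)."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(z, z).mean() + k(z_prior, z_prior).mean() - 2 * k(z, z_prior).mean()

# One encoder/decoder pair per modality; names are illustrative only.
enc_d, enc_g = Encoder(), Encoder()   # depth / grayscale encoders
dec_d, dec_g = Decoder(), Decoder()   # depth / grayscale decoders

def cmwae_loss(depth, gray, lam_align=1.0, lam_mmd=1.0):
    z_d, z_g = enc_d(depth), enc_g(gray)
    # Per-modality reconstruction, plus cross-reconstruction of depth from
    # the grayscale latent so grayscale features carry depth information.
    recon = (F.mse_loss(dec_d(z_d), depth)
             + F.mse_loss(dec_g(z_g), gray)
             + F.mse_loss(dec_d(z_g), depth))
    align = F.mse_loss(z_d, z_g)        # cross-modal latent consistency
    z_prior = torch.randn_like(z_d)     # match latents to a N(0, I) prior
    mmd = mmd_rbf(z_d, z_prior) + mmd_rbf(z_g, z_prior)
    return recon + lam_align * align + lam_mmd * mmd

if __name__ == "__main__":
    depth = torch.rand(8, 1, 64, 64)    # dummy normalized depth maps
    gray = torch.rand(8, 1, 64, 64)     # dummy grayscale frames
    loss = cmwae_loss(depth, gray)
    loss.backward()
    print(float(loss))
```

Under this reading, only the grayscale encoder is needed at inference time once depth degrades, which is what "infer depth-relevant features from grayscale imagery" amounts to in practice.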
📝 Abstract
This paper presents a cross-modal learning framework that exploits complementary information from depth and grayscale images for robust navigation. We introduce a Cross-Modal Wasserstein Autoencoder that learns shared latent representations by enforcing cross-modal consistency, enabling the system to infer depth-relevant features from grayscale observations when depth measurements are corrupted. The learned representations are integrated with a reinforcement learning policy that achieves collision-free navigation in unstructured environments even when depth sensors degrade under adverse conditions such as poor lighting or reflective surfaces. Simulation and real-world experiments demonstrate that our approach maintains robust performance under significant depth degradation and transfers successfully to real environments.
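As an illustration of the deployment claim in the abstract, the sketch below shows a hypothetical fallback path in which the navigation policy consumes the shared latent regardless of which encoder produced it. `NavPolicy`, `select_action`, and the `depth_ok` flag are invented for this example; the abstract specifies neither the action space nor how depth degradation is detected.

```python
import torch
import torch.nn as nn

class NavPolicy(nn.Module):
    """Hypothetical actor head: shared latent -> velocity command."""
    def __init__(self, latent_dim=64, action_dim=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, action_dim), nn.Tanh(),  # e.g. normalized (v, w)
        )

    def forward(self, z):
        return self.mlp(z)

@torch.no_grad()
def select_action(policy, encode_depth, encode_gray, depth, gray, depth_ok):
    # Both encoders map into the same latent space, so the trained policy
    # is reused unchanged when the observation source switches to grayscale.
    z = encode_depth(depth) if depth_ok else encode_gray(gray)
    return policy(z)

if __name__ == "__main__":
    policy = NavPolicy(latent_dim=64)
    # Stand-in encoders for the demo; in practice these would be the trained
    # modality-specific encoders (e.g. enc_d / enc_g from the sketch above).
    fake_enc = lambda img: img.flatten(1)[:, :64]
    depth = torch.rand(1, 1, 64, 64)
    gray = torch.rand(1, 1, 64, 64)
    print(select_action(policy, fake_enc, fake_enc, depth, gray, depth_ok=False))
```

Keeping the policy agnostic to the observation source is what allows the zero-shot behavior the paper reports: the sim-to-real gap is absorbed by the encoders rather than by the policy itself.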