🤖 AI Summary
Pure RGB-based visual navigation models generalize poorly to out-of-distribution (OOD) environments and suffer high collision rates there. To address this, the paper proposes CARE, a plug-and-play safety enhancement module. CARE requires no fine-tuning of pretrained models and no depth sensors: it uses monocular depth estimation to generate physics-inspired repulsive force vectors for real-time, trajectory-level re-planning. Its core contribution is a safety enhancement for visual navigation that needs neither fine-tuning nor hardware modification. Experiments across multiple ROS-based robotic platforms show that CARE reduces collision rates by up to 100% and increases collision-free traversal distance in exploration tasks by up to 10.7×, while preserving the original goal-reaching performance.
📝 Abstract
We propose CARE (Collision Avoidance via Repulsive Estimation), a plug-and-play module that enhances the safety of vision-based navigation without requiring additional range sensors or fine-tuning of pretrained models. While recent foundation models using only RGB inputs have shown strong performance, they often fail to generalize to out-of-distribution (OOD) environments with unseen objects or variations in camera parameters (e.g., field of view, pose, or focal length). Without fine-tuning, these models may generate unsafe trajectories that lead to collisions, and fine-tuning them requires costly data collection and retraining. CARE addresses this limitation by integrating with any RGB-based navigation system that outputs local trajectories and dynamically adjusting those trajectories using repulsive force vectors derived from monocular depth maps. We evaluate CARE by combining it with state-of-the-art vision-based navigation models across multiple robot platforms. CARE consistently reduces collision rates (by up to 100%) without sacrificing goal-reaching performance and improves collision-free travel distance by up to 10.7× in exploration tasks.
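The idea of adjusting a local trajectory with repulsive forces derived from a depth map can be illustrated with a minimal Python sketch. This is not the authors' implementation: the potential-field force formula, the column binning, the function names `repulsive_force` and `adjust_trajectory`, and all parameters (`d0`, `n_bins`, `gain`, `max_turn`) are assumptions chosen for illustration. The depth map is assumed to be already metric and is treated as range along each viewing ray.

```python
import numpy as np


def repulsive_force(depth, hfov_rad, d0=2.0, n_bins=16):
    """Sum a potential-field style repulsion over image column bins.

    depth: (H, W) array of depths in meters (assumed output of a
           monocular depth estimator, already metric).
    hfov_rad: horizontal field of view of the camera in radians.
    d0: influence distance; obstacles farther than this are ignored.
    Returns a 2D vector in the robot frame (x forward, y left).
    """
    _, w = depth.shape
    force = np.zeros(2)
    for cols in np.array_split(np.arange(w), n_bins):
        d = float(depth[:, cols].min())  # nearest obstacle in this bin
        if d <= 0.0 or d >= d0:
            continue  # invalid reading or outside the influence radius
        # Bearing of the bin center; the left side of the image is +y.
        theta = (0.5 - (cols.mean() + 0.5) / w) * hfov_rad
        obstacle_dir = np.array([np.cos(theta), np.sin(theta)])
        magnitude = (1.0 / d - 1.0 / d0) / d**2
        force -= magnitude * obstacle_dir  # push away from the obstacle
    return force


def adjust_trajectory(waypoints, force, gain=0.5, max_turn=np.pi / 6):
    """Rotate (N, 2) robot-frame waypoints away from the obstacle side.

    The turn angle is proportional to the lateral force component and
    clipped to +/- max_turn; waypoint distances are preserved.
    """
    angle = float(np.clip(gain * force[1], -max_turn, max_turn))
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    return waypoints @ rot.T


# Usage: an obstacle 1 m away fills the left half of the view.
depth = np.full((48, 64), 5.0)
depth[:, :32] = 1.0
planned = np.array([[1.0, 0.0], [2.0, 0.0]])  # straight ahead
force = repulsive_force(depth, hfov_rad=np.pi / 2)
adjusted = adjust_trajectory(planned, force)
```

In this sketch the planner's output is only rotated, so any model that emits local waypoints could be wrapped without retraining, which mirrors the plug-and-play framing of the abstract. When no obstacle lies within `d0`, the force is zero and the planned trajectory passes through unchanged.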