AI Summary
This work addresses the data scarcity and generalization bottlenecks in goal-oriented visual navigation that arise from reliance on large-scale, high-quality human demonstrations. To this end, we propose LiMo, a data-efficient Transformer-based navigation policy trained on synthetic SE(2) trajectories generated by a geometric planner together with a small set of human demonstrations. LiMo predicts goal-conditioned trajectories from a single RGB image and, instead of scaling up expert demonstrations, strategically blends diverse data sources, which substantially improves generalization. Real-robot experiments show that LiMo achieves strong navigation performance in limited-data regimes, supporting the claim that data quality and diversity matter more for effective policy learning than sheer data volume.
Abstract
Imitation learning provides a powerful framework for goal-conditioned visual navigation in mobile robots, enabling obstacle avoidance while respecting human preferences and social norms. However, its effectiveness depends critically on the quality and diversity of training data. In this work, we show how classical geometric planners can be leveraged to generate synthetic trajectories that complement costly human demonstrations. We train Less is More (LiMo), a transformer-based visual navigation policy that predicts goal-conditioned SE(2) trajectories from a single RGB observation, and find that augmenting limited expert demonstrations with planner-generated supervision yields substantial performance gains. Through ablations and complementary qualitative and quantitative analyses, we characterize how dataset scale and diversity affect planning performance. We demonstrate real-robot deployment and argue that robust visual navigation is enabled not by simply collecting more demonstrations, but by strategically curating diverse, high-quality datasets. Our results suggest that scalable, embodiment-specific geometric supervision is a practical path toward data-efficient visual navigation.