🤖 AI Summary
To address the need for socially compliant navigation of service robots in crowded environments, this paper proposes a pedestrian social-group-aware navigation framework. Existing approaches lack an understanding of social relationships in scenarios such as queuing, conversing, or group photography. To overcome this, the framework introduces a vision-prompted large multimodal model (LMM) for zero-shot social relationship recognition, eliminating reliance on annotated data. A mid-level planner then jointly optimizes global path efficiency and local real-time responsiveness, yielding socially coherent behavior. Experiments in realistic social settings show that the method significantly reduces social disturbances, including group interruption and personal-space intrusion, while preserving conventional navigation performance (success rate, path length, and execution time). The core contribution is the first integration of LMMs into social navigation, enabling annotation-free social relationship perception and hierarchical, socially aware motion planning.
📝 Abstract
With the increasing presence of service robots and autonomous vehicles in human environments, navigation systems must evolve beyond simply reaching a destination to incorporate social awareness. This paper introduces GSON, a novel group-based social navigation framework that leverages Large Multimodal Models (LMMs) to enhance robots' social perception capabilities. Our approach uses visual prompting to enable zero-shot extraction of social relationships among pedestrians, and integrates these results with robust pedestrian detection and tracking pipelines to overcome the limited inference speed of LMMs. The planning system incorporates a mid-level planner that sits between global path planning and local motion planning, preserving both global context and reactive responsiveness while avoiding disruption of predicted social groups. We validate GSON through extensive real-world mobile-robot navigation experiments involving complex social scenarios such as queuing, conversations, and photo sessions. Comparative results show that our system significantly outperforms existing navigation approaches in minimizing social perturbation while maintaining comparable performance on traditional navigation metrics.
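To make the mid-level planning idea concrete, here is a minimal sketch of how waypoints from a global planner could be adjusted to avoid crossing a detected social group. This is not the paper's implementation: it assumes each group is approximated as a keep-out circle (center, radius), and all function and parameter names are hypothetical illustrations.

```python
import math

def adjust_waypoints(waypoints, groups, margin=0.5):
    """Toy stand-in for a group-aware mid-level planner (hypothetical API).

    waypoints: list of (x, y) points from a global planner.
    groups:    list of (cx, cy, r) keep-out circles around social groups.
    margin:    extra clearance (meters) beyond each group's radius.

    Any waypoint inside a group circle is pushed radially outward until it
    clears the circle plus the margin; all other waypoints pass through
    unchanged, preserving the global path's context. (A real planner would
    also re-check that a pushed point does not land inside another group.)
    """
    adjusted = []
    for (x, y) in waypoints:
        for (cx, cy, r) in groups:
            dx, dy = x - cx, y - cy
            d = math.hypot(dx, dy)
            keep_out = r + margin
            if d < keep_out:
                if d == 0.0:  # waypoint exactly at group center: pick +x
                    dx, dy, d = 1.0, 0.0, 1.0
                scale = keep_out / d
                x, y = cx + dx * scale, cy + dy * scale
        adjusted.append((x, y))
    return adjusted

path = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
groups = [(5.0, 0.0, 1.0)]  # one conversing group blocking the path
print(adjust_waypoints(path, groups))
```

The point of sitting this step between the global and local planners, as the abstract describes, is that group avoidance is decided once per perception update at the path level, while the local planner remains free to react to individual pedestrians at a higher rate.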