🤖 AI Summary
To address the challenge of achieving both behavioral adaptability and safety compliance in social navigation for mobile robots operating in dynamic human environments, this paper proposes an end-to-end policy learning framework that integrates positive/negative example learning with rule-guided optimization. Methodologically: (1) a density-based reward function is designed to learn crowd interaction preferences from expert demonstrations; (2) hard safety constraints, such as minimum separation distance and collision-avoidance directionality, are explicitly encoded into both the reward shaping and the objective function; (3) an uncertainty-aware teacher–student distillation mechanism, coupled with a sampling-based lookahead controller, trains a lightweight student policy. Evaluated on synthetic scenarios and elevator co-boarding simulations, the approach improves navigation success rate by 18.7% and reduces average task completion time by 23.4%. Deployment feasibility is further validated on real robotic platforms. The core contribution lies in the synergistic modeling of data-driven flexibility and rule-enforced safety, realized through efficient distillation.
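To make the reward composition concrete, here is a minimal sketch of how a learned density-based preference term, a hard minimum-separation penalty, and a goal term could be combined inside a sampling-based lookahead controller. All names, constants, and the Gaussian stand-in for the learned density model are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MIN = 0.5                    # assumed minimum separation distance (m)
GOAL = np.array([5.0, 0.0])    # assumed goal position

def density_reward(pos, humans, sigma=1.0):
    # Stand-in for the learned density-based reward: Gaussian
    # repulsion around each human (the paper learns this from
    # positive/negative demonstrations).
    d = np.linalg.norm(humans - pos, axis=1)
    return -np.exp(-(d / sigma) ** 2).sum()

def rule_penalty(pos, humans):
    # Hard safety rule: large penalty whenever the minimum
    # separation constraint is violated.
    return -100.0 if np.linalg.norm(humans - pos, axis=1).min() < D_MIN else 0.0

def goal_reward(pos):
    return -np.linalg.norm(GOAL - pos)

def score(pos, humans):
    return density_reward(pos, humans) + rule_penalty(pos, humans) + goal_reward(pos)

def lookahead_action(pos, humans, horizon=3, n_samples=64, dt=0.25):
    # Sample candidate velocity sequences, roll each one forward,
    # and keep the first action of the best-scoring rollout.
    best, best_a = -np.inf, np.zeros(2)
    for _ in range(n_samples):
        vs = rng.uniform(-1.0, 1.0, size=(horizon, 2))
        p, total = pos.copy(), 0.0
        for v in vs:
            p = p + v * dt
            total += score(p, humans)
        if total > best:
            best, best_a = total, vs[0]
    return best_a

humans = np.array([[2.0, 0.3], [3.0, -0.5]])
action = lookahead_action(np.zeros(2), humans)
```

The point of the sketch is the additive structure: the rule-based terms dominate whenever a constraint is violated, so data-driven preferences only shape behavior inside the safe region.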
📝 Abstract
Mobile robot navigation in dynamic human environments requires policies that balance adaptability to diverse behaviors with compliance with safety constraints. We hypothesize that integrating data-driven rewards with rule-based objectives enables navigation policies to achieve a more effective balance of adaptability and safety. To this end, we develop a framework that learns a density-based reward from positive and negative demonstrations and augments it with rule-based objectives for obstacle avoidance and goal reaching. A sampling-based lookahead controller produces supervisory actions that are both safe and adaptive, which are subsequently distilled into a compact student policy that runs in real time and provides uncertainty estimates. Experiments in synthetic and elevator co-boarding simulations show consistent gains in success rate and time efficiency over baselines, and real-world demonstrations with human participants confirm the practicality of deployment. A video illustrating this work can be found on our project page https://chanwookim971024.github.io/PioneeR/.
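The teacher-student distillation step can be sketched as behavior cloning with an ensemble-based uncertainty estimate: the student regresses the teacher's supervisory actions, and disagreement across a bootstrap ensemble serves as the uncertainty signal. The teacher function, the linear student, and all dataset sizes here are illustrative assumptions standing in for the paper's actual models.

```python
import numpy as np

rng = np.random.default_rng(1)

def teacher_action(states):
    # Stand-in for the sampling-based lookahead teacher: a fixed
    # nonlinear map from 3-D state to 2-D action.
    W = np.array([[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]])
    return np.tanh(states @ W)

# Distillation dataset: states visited by the teacher (random here).
states = rng.normal(size=(500, 3))
actions = teacher_action(states)

def fit_ridge(X, Y, lam=1e-2):
    # Closed-form ridge regression for one ensemble member.
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)

# Lightweight student: a bootstrap ensemble of linear regressors.
ensemble = []
for _ in range(5):
    idx = rng.integers(0, len(states), size=len(states))
    ensemble.append(fit_ridge(states[idx], actions[idx]))

def student_predict(state):
    # Mean over the ensemble is the action; the spread across
    # members is the uncertainty estimate.
    preds = np.stack([state @ W for W in ensemble])
    return preds.mean(axis=0), preds.std(axis=0)

mean, unc = student_predict(rng.normal(size=3))
```

At deployment, a high `unc` could trigger a fallback to a conservative rule-based behavior, which is one way uncertainty estimates support safe real-time operation.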