🤖 AI Summary
To address the challenge of achieving both behavioral adaptability and safety compliance in social navigation for mobile robots operating in dynamic human environments, this paper proposes an end-to-end policy learning framework that integrates positive/negative example learning with rule-guided optimization. Methodologically: (1) a density-based reward function is designed to learn crowd interaction preferences from expert demonstrations; (2) hard safety constraints, such as minimum separation distance and collision-avoidance directionality, are explicitly encoded into both the reward shaping and the objective function; (3) an uncertainty-aware teacher–student distillation mechanism, coupled with a sampling-based lookahead controller, trains a lightweight student policy. Evaluated on synthetic scenarios and elevator co-boarding simulations, the approach improves navigation success rate by 18.7% and reduces average task completion time by 23.4%. Deployment feasibility is further validated on real robotic platforms. The core contribution lies in the synergistic modeling of data-driven flexibility and rule-enforced safety, realized through efficient distillation.
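To make the reward composition concrete, here is a minimal sketch of how a learned density-based preference term, a hard minimum-separation penalty, and a goal term could be combined inside a sampling-based lookahead controller. All names, constants, and the Gaussian stand-in for the learned density model are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MIN = 0.5                    # assumed minimum separation distance (m)
GOAL = np.array([5.0, 0.0])    # assumed goal position

def density_reward(pos, humans, sigma=1.0):
    # Stand-in for the learned density-based reward: Gaussian
    # repulsion around each human (the paper learns this from
    # positive/negative demonstrations).
    d = np.linalg.norm(humans - pos, axis=1)
    return -np.exp(-(d / sigma) ** 2).sum()

def rule_penalty(pos, humans):
    # Hard safety rule: large penalty whenever the minimum
    # separation constraint is violated.
    return -100.0 if np.linalg.norm(humans - pos, axis=1).min() < D_MIN else 0.0

def goal_reward(pos):
    return -np.linalg.norm(GOAL - pos)

def score(pos, humans):
    return density_reward(pos, humans) + rule_penalty(pos, humans) + goal_reward(pos)

def lookahead_action(pos, humans, horizon=3, n_samples=64, dt=0.25):
    # Sample candidate velocity sequences, roll each one forward,
    # and keep the first action of the best-scoring rollout.
    best, best_a = -np.inf, np.zeros(2)
    for _ in range(n_samples):
        vs = rng.uniform(-1.0, 1.0, size=(horizon, 2))
        p, total = pos.copy(), 0.0
        for v in vs:
            p = p + v * dt
            total += score(p, humans)
        if total > best:
            best, best_a = total, vs[0]
    return best_a

humans = np.array([[2.0, 0.3], [3.0, -0.5]])
action = lookahead_action(np.zeros(2), humans)
```

The point of the sketch is the additive structure: the rule-based terms dominate whenever a constraint is violated, so data-driven preferences only shape behavior inside the safe region.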
📝 Abstract
Mobile robot navigation in dynamic human environments requires policies that balance adaptability to diverse behaviors with compliance with safety constraints. We hypothesize that integrating data-driven rewards with rule-based objectives enables navigation policies to achieve a more effective balance of adaptability and safety. To this end, we develop a framework that learns a density-based reward from positive and negative demonstrations and augments it with rule-based objectives for obstacle avoidance and goal reaching. A sampling-based lookahead controller produces supervisory actions that are both safe and adaptive, which are subsequently distilled into a compact student policy that runs in real time and provides uncertainty estimates. Experiments in synthetic and elevator co-boarding simulations show consistent gains in success rate and time efficiency over baselines, and real-world demonstrations with human participants confirm the practicality of deployment. A video illustrating this work can be found on our project page https://chanwookim971024.github.io/PioneeR/.
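The teacher-student distillation step can be sketched as behavior cloning with an ensemble-based uncertainty estimate: the student regresses the teacher's supervisory actions, and disagreement across a bootstrap ensemble serves as the uncertainty signal. The teacher function, the linear student, and all dataset sizes here are illustrative assumptions standing in for the paper's actual models.

```python
import numpy as np

rng = np.random.default_rng(1)

def teacher_action(states):
    # Stand-in for the sampling-based lookahead teacher: a fixed
    # nonlinear map from 3-D state to 2-D action.
    W = np.array([[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]])
    return np.tanh(states @ W)

# Distillation dataset: states visited by the teacher (random here).
states = rng.normal(size=(500, 3))
actions = teacher_action(states)

def fit_ridge(X, Y, lam=1e-2):
    # Closed-form ridge regression for one ensemble member.
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)

# Lightweight student: a bootstrap ensemble of linear regressors.
ensemble = []
for _ in range(5):
    idx = rng.integers(0, len(states), size=len(states))
    ensemble.append(fit_ridge(states[idx], actions[idx]))

def student_predict(state):
    # Mean over the ensemble is the action; the spread across
    # members is the uncertainty estimate.
    preds = np.stack([state @ W for W in ensemble])
    return preds.mean(axis=0), preds.std(axis=0)

mean, unc = student_predict(rng.normal(size=3))
```

At deployment, a high `unc` could trigger a fallback to a conservative rule-based behavior, which is one way uncertainty estimates support safe real-time operation.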