Disentangling Uncertainty for Safe Social Navigation using Deep Reinforcement Learning

📅 2024-09-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
In dense pedestrian scenarios, deep reinforcement learning (DRL)-based navigation policies exhibit unexplained uncertainty under unknown disturbances, leading to collisions or socially inappropriate behaviors. Method: This paper proposes the first framework that explicitly decouples aleatoric, epistemic, and predictive uncertainties. We introduce an enhanced Proximal Policy Optimization (PPO) architecture integrating observation-dependent variance (ODV) estimation and Monte Carlo Dropout, establishing an explicit mapping between uncertainty types and environmental disturbances. A deep ensemble enables policy-distribution-level uncertainty quantification, coupled with a conservative action selection mechanism to ensure social compliance while enhancing safety. Results: Experiments demonstrate significantly improved training stability, superior generalization to unseen disturbances, more precise and source-discriminative disturbance responses via MC-Dropout, and a substantial reduction in collision rates—all while preserving social acceptability.
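The uncertainty decomposition described above can be sketched as follows. This is a minimal illustration only: `policy_forward` is a toy stand-in for the real ODV policy network, and all constants are hypothetical, not the authors' implementation. It shows the standard MC-Dropout split of a Gaussian policy's predictive uncertainty into aleatoric (mean of predicted variances) and epistemic (variance of predicted means) parts.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_forward(obs, dropout_mask):
    """Hypothetical stochastic policy head: returns (mean, variance) of the
    action distribution for one dropout mask. A real ODV network would learn
    both heads; here they are toy closed-form stand-ins."""
    h = np.tanh(obs * dropout_mask)   # features with dropout applied
    mu = float(h.sum())               # action mean
    sigma2 = float(np.exp(-h).mean()) # observation-dependent variance (aleatoric)
    return mu, sigma2

def mc_dropout_uncertainty(obs, n_passes=50, p_keep=0.8):
    """Decompose predictive uncertainty via T stochastic forward passes:
    aleatoric = E[sigma^2], epistemic = Var[mu], predictive = their sum."""
    mus, sigma2s = [], []
    for _ in range(n_passes):
        mask = rng.binomial(1, p_keep, size=obs.shape) / p_keep  # inverted dropout
        mu, s2 = policy_forward(obs, mask)
        mus.append(mu)
        sigma2s.append(s2)
    aleatoric = float(np.mean(sigma2s))
    epistemic = float(np.var(mus))
    return aleatoric, epistemic, aleatoric + epistemic
```

Keeping dropout active at inference time (rather than disabling it as in standard evaluation) is what turns the ensemble of masked forward passes into an approximate posterior over policies, which is the source of the epistemic term.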

📝 Abstract
Autonomous mobile robots are increasingly used in pedestrian-rich environments where safe navigation and appropriate human interaction are crucial. While Deep Reinforcement Learning (DRL) enables socially integrated robot behavior, it remains difficult to indicate when and why the policy is uncertain in novel or perturbed scenarios. Unknown uncertainty in decision-making can lead to collisions or human discomfort and is one reason why safe and risk-aware navigation is still an open problem. This work introduces a novel approach that integrates aleatoric, epistemic, and predictive uncertainty estimation into a DRL navigation framework for policy-distribution uncertainty estimates. We therefore incorporate Observation-Dependent Variance (ODV) and dropout into the Proximal Policy Optimization (PPO) algorithm. For different types of perturbations, we compare the ability of deep ensembles and Monte-Carlo dropout (MC-dropout) to estimate the uncertainties of the policy. In uncertain decision-making situations, we propose to change the robot's social behavior to conservative collision avoidance. The results show improved training performance with ODV and dropout in PPO and reveal that the training scenario has an impact on generalization. In addition, MC-dropout is more sensitive to perturbations and better correlates the uncertainty type to the perturbation. With the safe action selection, the robot can navigate perturbed environments with fewer collisions.
Problem

Research questions and friction points this paper is trying to address.

Safe navigation in pedestrian-rich environments using DRL.
Estimating policy uncertainty in novel or perturbed scenarios.
Improving robot social behavior for collision avoidance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates aleatoric, epistemic, and predictive uncertainty estimation
Uses Observation-Dependent Variance (ODV) and dropout in PPO
Proposes conservative collision avoidance in uncertain situations
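The conservative action selection named above can be sketched in a few lines. This is a hedged illustration, not the paper's mechanism: the threshold value and the choice of "stop" as the fallback are assumptions for the example; the paper switches to a conservative collision-avoidance behavior when policy uncertainty is high.

```python
def select_action(policy_action, epistemic, threshold=0.05,
                  conservative_action=(0.0, 0.0)):
    """Fall back to a conservative collision-avoidance action (here: stop)
    when epistemic uncertainty exceeds a threshold; otherwise follow the
    learned policy. Threshold and fallback are illustrative assumptions."""
    if epistemic > threshold:
        return conservative_action
    return policy_action
```

Gating on the epistemic term specifically matters here: high aleatoric uncertainty reflects irreducible environment noise the policy was trained under, whereas high epistemic uncertainty signals inputs outside the training distribution, where the policy's output should not be trusted.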
Daniel Flögel
Research Scientist, Karlsruhe Institute of Technology
Reinforcement Learning · Motion Planning · Human-Machine Interaction · Control Theory
Marcos Gómez Villafane
FZI Research Center for Information Technology, Karlsruhe, Germany; Facultad de Ingeniería, Universidad de Buenos Aires, Buenos Aires, Argentina
Joshua Ransiek
Research Scientist, FZI Forschungszentrum Informatik
System Testing · Reinforcement Learning · Motion Planning
Sören Hohmann
Institute of Control Systems at Karlsruhe Institute of Technology, Karlsruhe, Germany