HALO: Human Preference Aligned Offline Reward Learning for Robot Navigation

📅 2025-08-02
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses the challenge of quantifying and modeling human intuition for robot visual navigation. The authors propose HALO, a visual reward learning framework that leverages offline human preference data. Its core innovation is to formalize human navigation intuition as a learnable visual reward function, jointly modeling action preferences (parameterized via a Boltzmann distribution) and binary user feedback through a Plackett-Luce ranking loss for effective reward shaping. HALO is compatible with both learned policies and classical planners, and requires neither online interaction nor environment resets. In real-world experiments, HALO achieves at least a 33.3% improvement in navigation success rate over baselines, reduces normalized trajectory length by 12.9%, and decreases the Fréchet distance to expert trajectories by 26.6%. These results demonstrate substantially improved generalization across diverse environments and robotic platforms.

📝 Abstract
In this paper, we introduce HALO, a novel offline reward learning algorithm that distills human navigation intuition into a vision-based reward function for robot navigation. HALO learns a reward model from offline data, leveraging expert trajectories collected from mobile robots. During training, actions are uniformly sampled around a reference action, ranked using preference scores derived from a Boltzmann distribution centered on the preferred action, and shaped by binary user feedback to intuitive navigation queries. The reward model is trained via the Plackett-Luce loss to align with these ranked preferences. To demonstrate the effectiveness of HALO, we deploy its reward model in two downstream applications: (i) an offline learned policy trained directly on the HALO-derived rewards, and (ii) a model predictive control (MPC)-based planner that incorporates the HALO reward as an additional cost term. This showcases the versatility of HALO across both learning-based and classical navigation frameworks. Our real-world deployments on a Clearpath Husky across diverse scenarios demonstrate that policies trained with HALO generalize effectively to unseen environments and hardware setups not present in the training data. HALO outperforms state-of-the-art vision-based navigation methods, achieving at least a 33.3% improvement in success rate, a 12.9% reduction in normalized trajectory length, and a 26.6% reduction in Fréchet distance compared to human expert trajectories.
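The training recipe above (uniform sampling around a reference action, Boltzmann-derived preference scores shaped by binary feedback, and a Plackett-Luce ranking objective) can be made concrete with a short sketch. The following PyTorch snippet is a minimal illustration under stated assumptions, not the authors' implementation: `RewardNet`, the sampling width `sigma`, the temperature `beta`, and the sign-flip handling of negative feedback are all assumed for exposition.

```python
# Minimal sketch of HALO-style offline reward learning (illustrative, not the
# authors' code): sample actions around the preferred (reference) action,
# rank them by a Boltzmann preference score, flip the ranking on negative
# binary feedback, and fit a reward model with the Plackett-Luce NLL.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Assumed vision-based reward model: (image features, action) -> scalar reward."""
    def __init__(self, feat_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([feats, actions], dim=-1)).squeeze(-1)

def plackett_luce_nll(scores: torch.Tensor, ranking: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of a full ranking under the Plackett-Luce model.
    scores: (K,) model outputs; ranking: (K,) indices from most to least preferred."""
    s = scores[ranking]                                  # scores in preference order
    # log P(ranking) = sum_k [ s_k - logsumexp(s_k, ..., s_K) ]
    tail_lse = torch.logcumsumexp(s.flip(0), dim=0).flip(0)
    return -(s - tail_lse).sum()

def halo_training_step(model, feats, preferred_action, feedback_sign,
                       K=16, beta=5.0, sigma=0.2):
    """One training step: K actions sampled uniformly around the reference action,
    ranked by a Boltzmann score centered on it (assumed form: exp(-beta * dist^2))."""
    act_dim = preferred_action.shape[-1]
    actions = preferred_action + (torch.rand(K, act_dim) * 2 - 1) * sigma
    pref = torch.exp(-beta * (actions - preferred_action).pow(2).sum(-1))
    if feedback_sign < 0:   # assumed shaping: negative binary feedback reverses preference
        pref = -pref
    ranking = torch.argsort(pref, descending=True)
    scores = model(feats.expand(K, -1), actions)
    return plackett_luce_nll(scores, ranking)
```

In practice this loss would be minimized over the offline dataset of expert trajectories and feedback; the paper's exact sampling ranges, temperature, and feedback shaping may differ.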
Problem

Research questions and friction points this paper is trying to address.

Quantify human intuition into robot navigation rewards
Learn reward model from offline expert trajectory data
Improve vision-based navigation success and efficiency metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Offline reward learning from human preferences
Plackett-Luce loss for preference alignment
Deployment in learning-based and MPC planning frameworks (see the sketch below)
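To make the MPC deployment path concrete, here is a minimal sketch, assuming a generic random-shooting planner: the learned HALO reward enters each candidate rollout's cost with a negative weight, so human-preferred motions become cheaper. `rollout`, `task_cost`, and the trade-off weight `lam` are hypothetical stand-ins, not the paper's actual interfaces.

```python
# Illustrative sketch of using a learned HALO reward as an extra MPC cost term
# (assumptions throughout; not the paper's planner).
import torch

def mpc_plan(reward_model, feats, state, goal, rollout, task_cost,
             horizon=10, n_samples=256, act_dim=2, lam=1.0):
    """Random-shooting MPC: sample action sequences, score each by task cost
    minus the weighted HALO reward, and return the best first action."""
    seqs = torch.rand(n_samples, horizon, act_dim) * 2 - 1   # candidates in [-1, 1]
    costs = torch.zeros(n_samples)
    for i in range(n_samples):
        s = state
        for t in range(horizon):
            a = seqs[i, t]
            costs[i] += task_cost(s, a, goal)            # e.g., goal distance + obstacle penalty
            costs[i] -= lam * reward_model(feats, a)     # HALO reward as a negative cost term
            s = rollout(s, a)                            # assumed dynamics / rollout model
    best = torch.argmin(costs)
    return seqs[best, 0]                                 # execute first action, then replan
```

The same reward model can instead label offline transitions for policy learning, which is the first deployment path listed above.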
👥 Authors
Gershom Seneviratne, University of Maryland (Robotics, Motion Planning, Autonomous Navigation, Perception)
Jianyu An, University of Maryland, College Park
Sahire Ellahy, University of Maryland, College Park
Kasun Weerakoon, Ph.D., University of Maryland, College Park, MD (Robotics, Deep Reinforcement Learning, Autonomous Navigation)
Mohamed Bashir Elnoor, University of Maryland, College Park
Jonathan Deepak Kannan, University of Maryland, College Park
Amogha Thalihalla Sunil, University of Maryland, College Park
Dinesh Manocha, Distinguished University Professor, University of Maryland at College Park (computer graphics, geometric modeling, motion planning, virtual reality, robotics)