NavThinker: Action-Conditioned World Models for Coupled Prediction and Planning in Social Navigation

📅 2026-03-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the coupled prediction-planning challenge of robot navigation in dynamic human environments with the NavThinker framework. NavThinker uses an action-conditioned world model that autoregressively predicts future scene geometry and pedestrian trajectories in the feature space of Depth Anything V2, with a multi-head decoder producing the joint predictions. Combined with online reinforcement learning via DD-PPO and shaped social rewards, the framework enables proactive social navigation. Notably, this is the first approach to synergistically combine action-conditioned world models with reinforcement learning for social navigation: it attains state-of-the-art navigation success rates on Social-HM3D, transfers zero-shot to Social-MP3D, and validates its generalization and practicality through deployment on a Unitree Go2 quadruped robot.

📝 Abstract
Social navigation requires robots to act safely in dynamic human environments. Effective behavior demands thinking ahead: reasoning about how the scene and pedestrians evolve under different robot actions rather than reacting to current observations alone. This creates a coupled prediction-planning challenge, where robot actions and human motion mutually influence each other. To address this challenge, we propose NavThinker, a future-aware framework that couples an action-conditioned world model with on-policy reinforcement learning. The world model operates in the Depth Anything V2 patch feature space and performs autoregressive prediction of future scene geometry and human motion; multi-head decoders then produce future depth maps and human trajectories, yielding a future-aware state aligned with traversability and interaction risk. Crucially, we train the policy with DD-PPO while injecting world-model think-ahead signals via: (i) action-conditioned future features fused into the current observation embedding and (ii) social reward shaping from predicted human trajectories. Experiments on single- and multi-robot Social-HM3D show state-of-the-art navigation success, with zero-shot transfer to Social-MP3D and real-world deployment on a Unitree Go2, validating generalization and practical applicability. Webpage: https://github.com/hutslib/NavThinker.
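The abstract's think-ahead loop can be illustrated with a toy sketch: roll an action-conditioned latent forward autoregressively, decode predicted pedestrian positions, and shape a social reward from predicted proximity. Everything here (the linear transition, the decoder, the dimensions, `safe_dist`) is an illustrative stand-in, not the authors' implementation.

```python
import numpy as np

def rollout_world_model(z0, actions, transition):
    """Autoregressively roll a latent state forward under candidate robot
    actions, mimicking an action-conditioned world model."""
    z, traj = z0, []
    for a in actions:
        z = transition(z, a)   # predict next latent from (state, action)
        traj.append(z)
    return traj

def decode_pedestrians(z):
    """Stand-in for a multi-head decoder: map the latent to 2D pedestrian
    positions (here just the first four dims reshaped)."""
    return z[:4].reshape(2, 2)  # two pedestrians, (x, y)

def social_reward(robot_xy, ped_xy, safe_dist=1.0):
    """Shaped social reward: penalize predicted proximity to pedestrians."""
    d = np.linalg.norm(ped_xy - robot_xy, axis=1).min()
    return 0.0 if d >= safe_dist else -(safe_dist - d)

# Toy linear transition standing in for the learned world model.
A = np.eye(8) * 0.95
B = np.ones((8, 2)) * 0.1
transition = lambda z, a: A @ z + B @ a

z0 = np.zeros(8)
actions = [np.array([1.0, 0.0])] * 3      # e.g. "move forward" for 3 steps
future = rollout_world_model(z0, actions, transition)
peds = decode_pedestrians(future[-1])
r = social_reward(np.array([0.0, 0.0]), peds)
print(round(r, 3))                        # negative: predicted pedestrians too close
```

The key property the sketch preserves is that the reward depends on *predicted* future pedestrian positions under the robot's own candidate actions, which is what couples prediction with planning.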
Problem

Research questions and friction points this paper is trying to address.

social navigation
coupled prediction-planning
action-conditioned world models
human-robot interaction
future-aware reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

action-conditioned world model
coupled prediction-planning
future-aware navigation
social reward shaping
Depth Anything V2
Authors

Tianshuai Hu
Ph.D. student at HKUST
Robotics, Autonomous Driving

Zeying Gong
The Hong Kong University of Science and Technology (Guangzhou)
Forecasting, Embodied AI

Lingdong Kong
National University of Singapore
Computer Vision, Deep Learning

XiaoDong Mei
The Hong Kong University of Science and Technology

Yiyi Ding
The Hong Kong University of Science and Technology (Guangzhou)

Qi Zeng
The Hong Kong University of Science and Technology (Guangzhou)

Ao Liang
University of Chinese Academy of Sciences

Rong Li
Ph.D. student, HKUST (GZ)
Computer Vision, Embodied AI

Yangyi Zhong
The Hong Kong University of Science and Technology (Guangzhou)

Junwei Liang
Assistant Professor, HKUST (Guangzhou) | CSE, HKUST | Ph.D. @CMU
Computer Vision, Robotics, Embodied AI, Trajectory Prediction