🤖 AI Summary
Embodied navigation (EN) lacks systematic surveys and a unified theoretical framework, which hinders the integration of perceptual, social, and motor intelligence for complex autonomous navigation. To address this, we propose TOFRA, a five-stage unified framework comprising State Transition, Observation, Information Fusion, Reward-Policy Construction, and Action Decision, and present it as the first effort to deeply embed social interaction and motor intelligence into EN. We comprehensively survey state-of-the-art methods built on first-person perception, multimodal sensor fusion, deep reinforcement learning, and human behavior imitation. We also evaluate mainstream simulation platforms and benchmark metrics, and release an open-source resource repository. Our work establishes standardized taxonomies and evaluation protocols. It also identifies key open challenges, including social-aware planning, long-horizon motor control, and cross-platform generalization, and provides a foundation and roadmap for both theoretical advancement and real-world deployment of EN systems.
📝 Abstract
Embodied navigation (EN) advances traditional navigation by enabling robots to perform complex egocentric tasks through sensing, social, and motion intelligence. In contrast to classic methodologies that rely on explicit localization and pre-defined maps, EN leverages egocentric perception and human-like interaction strategies. This survey introduces a comprehensive EN formulation structured into five stages: Transition, Observation, Fusion, Reward-policy construction, and Action (TOFRA). We use the TOFRA framework to synthesize the current state of the art, critically review relevant platforms and evaluation metrics, and identify open research challenges. A list of studies is available at https://github.com/Franky-X/Awesome-Embodied-Navigation.