AI Summary
Current navigation evaluation protocols are limited to single skills and specific robot morphologies, so they fail to capture the multi-skill coordination and cross-morphology generalization that real-world scenarios demand. To address this gap, this work proposes OmniNavBench, a modular simulation benchmark with three key features: compositional task instructions; cross-morphology compatibility covering humanoid, quadruped, and wheeled robots; and high-fidelity human demonstration trajectories collected via teleoperation. The benchmark comprises 170 environments that blend synthetic assets with real-world scans, along with 1,779 expert trajectories. Experiments show that existing methods struggle on compositional navigation tasks, exposing a gap between current capabilities and practical deployment requirements and positioning OmniNavBench as a testbed for evaluating general-purpose navigation agents.
Abstract
The pursuit of general-purpose embodied agents is hindered by fragmented evaluation protocols that isolate navigation skills and fixate on specific robot morphologies, failing to reflect real-world scenarios where agents must orchestrate diverse behaviors across varying embodiments. To bridge this gap, we introduce OmniNavBench, a benchmark for cross-skill coordination and cross-embodiment generalization. OmniNavBench introduces three paradigm shifts: (1) Compositional Complexity. We propose composite instructions that interleave sub-tasks from six categories (PointNav, VLN, ObjectNav, SocialNav, Human Following, and EQA), compelling agents to transition between exploration, interaction, and social compliance within a single episode. (2) Morphological Universality and Sensor Flexibility. We present a simulation platform that breaks the reliance on single-morphology evaluation, enabling generalization tests across humanoid, quadrupedal, and wheeled robots, with a modular sensor interface and 170 environments blending synthetic assets with real-world scans. (3) Demonstration Quality. Moving beyond shortest-path algorithms, we curate 1,779 expert trajectories via human teleoperation, capturing behavioral nuances such as exploratory glances and anticipatory avoidance. Extensive evaluations demonstrate that current methods, despite their claimed unified design, struggle with the complex, interleaved nature of general-purpose navigation. This exposes a critical disparity between existing capabilities and real-world deployment demands, underscoring OmniNavBench as a testbed for the next generation of generalist navigators. Dataset, code, and leaderboard are available at http://omninavbench.cloud-ip.cc.