IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation

📅 2025-11-21
🤖 AI Summary
Problem: Existing embodied agents show severe deficiencies in spatial reasoning and safe navigation in dynamic industrial environments, and mainstream benchmarks are confined to static domestic settings, so they cannot evaluate coordinated global-local planning or proactive interaction. Method: We introduce the first active spatial reasoning navigation benchmark tailored to dynamic industrial settings, built upon 12 high-fidelity Unity warehouse scenes featuring mobile obstacles, human activities, and realistic sensor simulation. We propose a dual safety metric (collision rate and warning rate) and adopt an end-to-end PointGoal framework that integrates egocentric vision with global odometry for evaluation. Contribution/Results: A systematic assessment of nine state-of-the-art visual large language models (VLLMs) reveals that, while closed-source models slightly outperform open-source ones, all exhibit critical weaknesses in path-planning robustness, dynamic obstacle avoidance, and active exploration, which points to fundamental bottlenecks in developing real-world embodied intelligence.

📝 Abstract
While Visual Large Language Models (VLLMs) show great promise as embodied agents, they continue to face substantial challenges in spatial reasoning. Existing embodied benchmarks largely focus on passive, static household environments and evaluate only isolated capabilities, failing to capture holistic performance under dynamic, real-world complexity. To fill this gap, we present IndustryNav, the first dynamic industrial navigation benchmark for active spatial reasoning. IndustryNav leverages 12 manually created, high-fidelity Unity warehouse scenarios featuring dynamic objects and human movement. Our evaluation employs a PointGoal navigation pipeline that combines egocentric vision with global odometry to assess holistic local-global planning. Crucially, we introduce the "collision rate" and "warning rate" metrics to measure safety-oriented behaviors and distance estimation. A comprehensive study of nine state-of-the-art VLLMs (including models such as GPT-5-mini, Claude-4.5, and Gemini-2.5) reveals that closed-source models maintain a consistent advantage; however, all agents exhibit notable deficiencies in robust path planning, collision avoidance, and active exploration. This highlights a critical need for embodied research to move beyond passive perception and toward tasks that demand stable planning, active exploration, and safe behavior in dynamic, real-world environments.
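As a rough illustration of the PointGoal setup mentioned in the abstract, the sketch below converts a global odometry pose and a goal position into the distance and bearing an agent would reason over. This is a generic formulation, not the benchmark's code: the function name `goal_in_agent_frame`, the 2D pose representation, and the yaw convention (radians, counterclockwise positive) are all assumptions made for this example.

```python
import math


def goal_in_agent_frame(agent_x, agent_y, agent_yaw, goal_x, goal_y):
    """Return (distance, bearing) of the goal relative to the agent.

    Assumed conventions (hypothetical, not taken from IndustryNav):
    positions are 2D world coordinates, agent_yaw is the heading in
    radians measured counterclockwise from the world x-axis, and the
    returned bearing is wrapped to [-pi, pi], positive to the left.
    """
    dx = goal_x - agent_x
    dy = goal_y - agent_y
    distance = math.hypot(dx, dy)
    # Angle to the goal in the world frame, minus the agent's heading.
    bearing = math.atan2(dy, dx) - agent_yaw
    # Wrap the angle so that small left/right turns stay near zero.
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    return distance, bearing


if __name__ == "__main__":
    # Agent at the origin facing along +y, goal two units straight ahead.
    print(goal_in_agent_frame(0.0, 0.0, math.pi / 2, 0.0, 2.0))
```

In a pipeline of this kind, the distance and bearing would typically be passed to the model alongside the egocentric image, so that local obstacle avoidance and global progress toward the goal can be weighed together.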
Problem

Research questions and friction points this paper is trying to address.

Evaluating embodied agents' spatial reasoning in dynamic industrial environments
Assessing holistic navigation performance using safety-oriented collision metrics
Identifying deficiencies in path planning and collision avoidance capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic warehouse scenarios with Unity simulation
PointGoal navigation combining vision and odometry
Safety metrics: collision rate and warning rate
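The summary does not give the exact definitions of the two safety metrics, so the following is only one plausible reading: both are computed as the fraction of episode steps in which the agent's minimum distance to any obstacle falls at or below a threshold. The function name `safety_rates`, the per-step distance log, and both threshold values are hypothetical and chosen for illustration.

```python
def safety_rates(min_obstacle_distances, collision_dist=0.0, warning_dist=0.5):
    """Return (collision_rate, warning_rate) over one episode.

    Assumed definitions (not confirmed by the source):
    - a step is a collision if the minimum obstacle distance is at or
      below collision_dist;
    - a step is a warning if it is not a collision but the distance is
      at or below warning_dist;
    - each rate is the count of such steps divided by the total steps.
    """
    steps = len(min_obstacle_distances)
    if steps == 0:
        # No steps taken: report zero for both rates.
        return 0.0, 0.0
    collisions = sum(1 for d in min_obstacle_distances if d <= collision_dist)
    warnings = sum(
        1 for d in min_obstacle_distances if collision_dist < d <= warning_dist
    )
    return collisions / steps, warnings / steps


if __name__ == "__main__":
    # Four steps: clear, too close, contact, clear.
    print(safety_rates([1.0, 0.4, 0.0, 2.0]))
```

Keeping the two rates separate distinguishes agents that actually hit obstacles from agents that merely pass too close, which is the kind of safety-oriented distinction the benchmark's dual metric is meant to expose.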