🤖 AI Summary
Existing Embodied Question Answering (EQA) benchmarks focus predominantly on domestic environments and lack evaluation of safety-critical industrial tasks and of agents' reasoning processes. Method: We introduce IndustryEQA, the first EQA benchmark tailored to safety-critical industrial warehouse settings. It covers six categories: equipment safety, human safety, object recognition, attribute recognition, temporal understanding, and spatial understanding. The benchmark extends EQA evaluation to high-risk industrial contexts by incorporating hazardous scenarios inspired by real-world safety guidelines, dynamic human agents, and an additional reasoning evaluation built on these categories. Built on NVIDIA Isaac Sim, it provides high-fidelity episodic memory videos featuring diverse industrial assets. Contribution/Results: We release 1,344 question-answer pairs (971 from small warehouses and 373 from large ones) that cover scenarios both with and without humans. We also provide an evaluation framework with various baseline models for assessing general perception and reasoning in industrial environments. The benchmark and code are publicly available.
📝 Abstract
Existing Embodied Question Answering (EQA) benchmarks primarily focus on household environments, often overlooking safety-critical aspects and reasoning processes pertinent to industrial settings. This drawback limits the evaluation of agent readiness for real-world industrial applications. To bridge this gap, we introduce IndustryEQA, the first benchmark dedicated to evaluating embodied agent capabilities within safety-critical warehouse scenarios. Built upon the NVIDIA Isaac Sim platform, IndustryEQA provides high-fidelity episodic memory videos featuring diverse industrial assets, dynamic human agents, and carefully designed hazardous situations inspired by real-world safety guidelines. The benchmark includes rich annotations covering six categories: equipment safety, human safety, object recognition, attribute recognition, temporal understanding, and spatial understanding. It also provides an additional reasoning evaluation built on these categories. Specifically, it comprises 971 question-answer pairs generated from small warehouses and 373 pairs from large ones, covering scenarios both with and without humans. We further propose a comprehensive evaluation framework, including various baseline models, to assess agents' general perception and reasoning abilities in industrial environments. IndustryEQA aims to steer EQA research towards developing more robust, safety-aware, and practically applicable embodied agents for complex industrial environments. The benchmark and code are available.
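To make the benchmark's composition concrete, here is a minimal Python sketch of how a single question-answer record and the split sizes might be represented. Only the six category names and the 971/373 split sizes come from the abstract. The class names, the field names (`reasoning`, `warehouse`, `has_human`), and the `count_by` helper are hypothetical illustrations, not the schema of the released dataset.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Hashable, Iterable


class Category(Enum):
    # The six annotation categories listed in the abstract.
    EQUIPMENT_SAFETY = "equipment safety"
    HUMAN_SAFETY = "human safety"
    OBJECT_RECOGNITION = "object recognition"
    ATTRIBUTE_RECOGNITION = "attribute recognition"
    TEMPORAL_UNDERSTANDING = "temporal understanding"
    SPATIAL_UNDERSTANDING = "spatial understanding"


@dataclass(frozen=True)
class QAPair:
    # Hypothetical record layout; the real dataset files may differ.
    question: str
    answer: str
    reasoning: str       # hypothetical field for the additional reasoning evaluation
    category: Category
    warehouse: str       # hypothetical: "small" or "large"
    has_human: bool      # whether the scenario includes human agents


# Split sizes reported in the abstract (small vs. large warehouses).
SPLIT_SIZES = {"small": 971, "large": 373}


def total_pairs(sizes: dict[str, int]) -> int:
    """Sum the per-split counts to get the benchmark size."""
    return sum(sizes.values())


def count_by(pairs: Iterable[QAPair], key: Callable[[QAPair], Hashable]) -> Counter:
    """Tally records by an arbitrary attribute (category, split, has_human, ...)."""
    return Counter(key(p) for p in pairs)


print(total_pairs(SPLIT_SIZES))  # 1344
```

Grouping with a key function keeps one helper usable for per-category, per-split, or with/without-human breakdowns, which mirrors the axes along which the abstract describes the benchmark.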