IndustryEQA: Pushing the Frontiers of Embodied Question Answering in Industrial Scenarios

📅 2025-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing Embodied Question Answering (EQA) benchmarks focus predominantly on domestic environments and lack evaluation of safety-critical industrial tasks and fine-grained reasoning capabilities. Method: We introduce IndustryEQA, the first EQA benchmark tailored to industrial warehouse settings, targeting six cognitive dimensions: equipment safety, human safety, object recognition, attribute recognition, temporal understanding, and spatial understanding. We extend EQA benchmarking to high-risk industrial contexts by incorporating hazardous scenarios inspired by real-world safety guidelines, videos with dynamic human agents, and an additional reasoning evaluation. Leveraging NVIDIA Isaac Sim, we build high-fidelity simulated warehouses with annotated episodic memory videos, question-answer pairs, and a reasoning assessment pipeline. Contribution/Results: We release 1,344 QA samples (971 from small warehouses and 373 from large ones), covering scenarios with and without humans, and make the benchmark, code, and baseline model analyses available to support the evaluation of safety awareness and reasoning in industrial embodied agents.

📝 Abstract

Existing Embodied Question Answering (EQA) benchmarks primarily focus on household environments, often overlooking safety-critical aspects and reasoning processes pertinent to industrial settings. This drawback limits the evaluation of agent readiness for real-world industrial applications. To bridge this gap, we introduce IndustryEQA, the first benchmark dedicated to evaluating embodied agent capabilities within safety-critical warehouse scenarios. Built upon the NVIDIA Isaac Sim platform, IndustryEQA provides high-fidelity episodic memory videos featuring diverse industrial assets, dynamic human agents, and carefully designed hazardous situations inspired by real-world safety guidelines. The benchmark includes rich annotations covering six categories: equipment safety, human safety, object recognition, attribute recognition, temporal understanding, and spatial understanding. It also provides an additional reasoning evaluation based on these categories. Specifically, it comprises 971 question-answer pairs generated from small warehouses and 373 pairs from large ones, incorporating scenarios with and without humans. We further propose a comprehensive evaluation framework, including various baseline models, to assess their general perception and reasoning abilities in industrial environments. IndustryEQA aims to steer EQA research towards developing more robust, safety-aware, and practically applicable embodied agents for complex industrial environments. The benchmark and code are available.
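
To make the benchmark composition described above concrete, the following is a minimal Python sketch of how such question-answer pairs might be represented and scored per category. Only the six category names and the 971/373 pair counts come from the abstract. The `QASample` fields, the example questions, and the exact-match scoring in `per_category_accuracy` are illustrative assumptions, not the released schema or the paper's evaluation metric.

```python
from collections import defaultdict
from dataclasses import dataclass

# The six annotation categories named in the abstract.
CATEGORIES = (
    "equipment safety",
    "human safety",
    "object recognition",
    "attribute recognition",
    "temporal understanding",
    "spatial understanding",
)

# Question-answer pair counts stated in the abstract.
PAIRS_PER_SCALE = {"small": 971, "large": 373}


@dataclass(frozen=True)
class QASample:
    """Hypothetical record layout; not the released dataset schema."""
    question: str
    answer: str
    category: str    # one of CATEGORIES
    warehouse: str   # "small" or "large"
    has_human: bool  # whether the scenario includes a human


def per_category_accuracy(samples, predictions):
    """Exact-match accuracy per category.

    Exact match is a simplistic stand-in chosen for this sketch; it is
    an assumption, not the scoring method used by the benchmark.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for sample, predicted in zip(samples, predictions):
        total[sample.category] += 1
        if predicted.strip().lower() == sample.answer.strip().lower():
            correct[sample.category] += 1
    return {category: correct[category] / total[category] for category in total}


# Made-up samples for illustration only.
samples = [
    QASample("Is the forklift path clear?", "no", "equipment safety", "small", False),
    QASample("Is the worker wearing a helmet?", "yes", "human safety", "large", True),
    QASample("Is the worker near the shelf?", "yes", "human safety", "large", True),
]
scores = per_category_accuracy(samples, ["No", "yes", "no"])
total_pairs = sum(PAIRS_PER_SCALE.values())
```

Reporting accuracy per category rather than as a single number keeps safety-related performance visible separately from general perception, which matches the benchmark's stated emphasis on safety-critical scenarios.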

Problem

Research questions and friction points this paper is trying to address.

Evaluating embodied agents in industrial safety-critical scenarios

Addressing gaps in existing EQA benchmarks for industrial settings

Providing high-fidelity warehouse simulations with safety annotations

Innovation

Methods, ideas, or system contributions that make the work stand out.

IndustryEQA benchmark for warehouse safety scenarios

High-fidelity episodic memory videos with Isaac Sim

Comprehensive evaluation framework for industrial perception

🔎 Similar Papers

S-EQA: Tackling Situational Queries in Embodied Question Answering