RobotEQ: Transitioning from Passive Intelligence to Active Intelligence in Embodied AI

📅 2026-05-07

🤖 AI Summary
This work addresses a limitation of current embodied AI systems: they rely predominantly on explicit instructions and struggle to understand and follow social norms when no instruction is given. The study introduces the concept of “active intelligence” and presents RobotEQ, the first benchmark for evaluating it. RobotEQ comprises 1,900 first-person (egocentric) images, 5,353 action judgment questions, and 1,286 spatial grounding questions, all manually annotated. Experiments show that existing models still fall short of reliable active intelligence, particularly in spatial grounding, and that using Retrieval-Augmented Generation (RAG) to incorporate external social-norm knowledge bases generally improves performance. The aim is to move embodied AI from passive instruction following toward socially compliant behavior.
📝 Abstract
Embodied AI is a prominent research topic in both academia and industry. Current research centers on completing tasks based on explicit user instructions. However, for robots to integrate into human society, they must understand which actions are permissible and which are prohibited, even without explicit commands. We refer to the user-guided AI as passive intelligence and the unguided AI as active intelligence. This paper introduces RobotEQ, the first benchmark for active intelligence, aiming to assess whether existing models can comprehend and adhere to social norms in embodied scenarios. First, we construct RobotEQ-Data, a dataset consisting of 1,900 egocentric images, spanning 10 representative embodied categories and 56 subcategories. Through extensive manual annotation, we provide 5,353 action judgment questions and 1,286 spatial grounding questions, specifying appropriate robot actions across diverse scenarios. Furthermore, we establish RobotEQ-Bench to evaluate the performance of state-of-the-art models on this task. Experimental results show that current models still fall short in achieving reliable active intelligence, particularly in spatial grounding. Meanwhile, we observe that leveraging RAG techniques to incorporate external social norm knowledge bases can generally enhance performance. This work can facilitate the transition of robotics from user-guided passive manipulation to active social compliance.
Problem

Research questions and friction points this paper is trying to address.

Embodied AI
Active Intelligence
Social Norms
Robot Benchmarking
Spatial Grounding
Innovation

Methods, ideas, or system contributions that make the work stand out.

active intelligence
embodied AI
social norms
spatial grounding
RAG
Authors

Kuofei Fang
State Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University

Xinyi Che
State Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University

Haomin Ouyang
State Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University

Shufan Zhang
State Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University

Xuehao Wang
Zhejiang University
Multi-Task Learning, Segment Anything Model, PEFT, LLM

Qi Liu
State Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University

Liyi Liu
State Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University

Chenqi Zhang
State Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University

Wenxi Cai
State Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University

Wenyu Dai
State Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University

Jinyang Wu
Tsinghua University

Fan Zhang
CSE PhD Student, The Chinese University of Hong Kong (CUHK)
Large Language Models, AI for Science, Multimodal Learning

Haoyu Chen
University of Oulu
Deep Learning, Computer Vision, Human gesture, 3D Generation, Emotion AI

Bin He
State Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University

Zheng Lian
Associate Professor, IEEE/CCF Senior Member, Institute of Automation, Chinese Academy of Sciences
Affective Computing, Sentiment Analysis, Machine Learning