FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI

📅 2025-11-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the lack of high-fidelity human-agent interaction simulation in complex social scenarios for embodied intelligence. To this end, we propose FreeAskWorld: (1) a novel closed-loop simulation paradigm that treats “interaction” as a new information modality, extending vision-language navigation (VLN) to socially grounded navigation—supporting active questioning and directional consultation; (2) a modular human-agent interaction data generation pipeline integrating large language models, intent theory, and social cognition models; and (3) a large-scale benchmark dataset comprising 17 hours of high-quality interactive trajectories. Experiments demonstrate that models fine-tuned on this data achieve significant improvements in semantic understanding, intent inference, and socially aware decision-making. FreeAskWorld establishes a scalable simulation infrastructure for socially grounded behavior planning and semantically rich interaction in embodied AI.
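
A minimal sketch of the closed-loop "direction inquiry" cycle the summary describes. FreeAskWorld's actual API is not shown on this page, so every class, method, and threshold below is an illustrative assumption, not the authors' implementation:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    rgb: object          # egocentric camera frame from the simulator
    instruction: str     # navigation instruction accumulated so far

class DirectionInquiryAgent:
    """Hypothetical agent: navigates, and asks a simulated human when unsure."""

    def __init__(self, policy, ask_threshold: float = 0.5):
        self.policy = policy              # any VLN policy exposing act()
        self.ask_threshold = ask_threshold

    def step(self, obs: Observation, human) -> str:
        action, confidence = self.policy.act(obs)
        if confidence < self.ask_threshold:
            # "Interaction as an information modality": the human's answer
            # is folded back into the instruction before re-planning.
            answer = human.answer("Which way should I go from here?", obs)
            obs.instruction += " " + answer
            action, _ = self.policy.act(obs)
        return action
```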

📝 Abstract
As embodied intelligence emerges as a core frontier in artificial intelligence research, simulation platforms must evolve beyond low-level physical interactions to capture complex, human-centered social behaviors. We introduce FreeAskWorld, an interactive simulation framework that integrates large language models (LLMs) for high-level behavior planning and semantically grounded interaction, informed by theories of intention and social cognition. Our framework supports scalable, realistic human-agent simulations and includes a modular data generation pipeline tailored for diverse embodied tasks. To validate the framework, we extend the classic Vision-and-Language Navigation (VLN) task into an interaction-enriched Direction Inquiry setting, wherein agents can actively seek and interpret navigational guidance. We present and publicly release FreeAskWorld, a large-scale benchmark dataset comprising reconstructed environments, six diverse task types, 16 core object categories, 63,429 annotated sample frames, and more than 17 hours of interaction data to support training and evaluation of embodied AI systems. We benchmark VLN models and human participants under both open-loop and closed-loop settings. Experimental results demonstrate that models fine-tuned on FreeAskWorld outperform their original counterparts, achieving enhanced semantic understanding and interaction competency. These findings underscore the efficacy of socially grounded simulation frameworks in advancing embodied AI systems toward sophisticated high-level planning and more naturalistic human-agent interaction. Importantly, our work shows that interaction itself serves as an additional information modality.
Problem

Research questions and friction points this paper is trying to address.

Simulating complex human-centered social behaviors for embodied AI systems
Developing interactive frameworks for high-level behavior planning using LLMs
Creating scalable benchmarks for training and evaluating human-agent interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates LLMs for high-level behavior planning
Uses intention and social cognition theories
Provides a modular data generation pipeline for embodied tasks (see the sketch below)
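
The sketch below illustrates how such a modular pipeline could be wired together. The stage names (propose_intents, rollout, label) are assumptions for illustration; the summary only specifies that LLMs, intent theory, and social cognition models feed a simulator that emits annotated interaction trajectories:

```python
def generate_interaction_data(llm, simulator, annotator, scenarios):
    """Hypothetical modular pipeline: intents -> rollout -> annotation."""
    dataset = []
    for scenario in scenarios:
        # Stage 1 (assumed): the LLM scripts socially grounded intents for
        # each simulated human, guided by intent/social-cognition models.
        intents = llm.propose_intents(scenario)

        # Stage 2 (assumed): closed-loop rollout in which the agent may
        # actively ask the scripted humans for directions.
        episode = simulator.rollout(scenario, intents)

        # Stage 3 (assumed): frames and dialogue are labeled into samples.
        dataset.extend(annotator.label(episode))
    return dataset
```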
👥 Authors

Yuhang Peng, Institute for AI Industry Research, Tsinghua University
Yizhou Pan, Institute for AI Industry Research, Tsinghua University
Xinning He, Tsinghua University (Human-Robot Interaction, HCI)
Jihaoyu Yang, Institute for AI Industry Research, Tsinghua University
Xinyu Yin, Institute for AI Industry Research, Tsinghua University
Han Wang, Institute for AI Industry Research, Tsinghua University
Xiaoji Zheng, Institute for AI Industry Research, Tsinghua University
Chao Gao, Institute for AI Industry Research, Tsinghua University
Jiangtao Gong, Institute for AI Industry Research (AIR), Tsinghua University (Human-Computer Interaction, Human-AI Collaboration, Robotics, Mixed Reality)