TongSIM: A General Platform for Simulating Intelligent Machines

📅 2025-12-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing embodied AI simulation platforms are highly task-specific and exhibit poor generalization, hindering unified investigation spanning low-level navigation to high-level social interaction and human-AI collaboration. To address this, we propose the first high-fidelity, general-purpose embodied AI simulation platform, encompassing over 100 multi-room indoor scenes and open outdoor urban environments. Our approach introduces three core innovations: (1) task-adaptive fidelity control, (2) dynamic environment evolution, and (3) a cross-level unified evaluation framework—integrated with multimodal perception modeling, physics-based interaction, programmable scene generation, heterogeneous agent co-simulation, and a multidimensional capability benchmark. Experiments demonstrate significant improvements in training efficiency and generalization across key competencies—including spatial reasoning, social cognition, and human-AI collaboration—thereby advancing embodied AI research from task-specific paradigms toward general-purpose foundations.
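
To make the "task-adaptive fidelity control" idea concrete, the sketch below shows what a Gym-style interaction loop with a per-task fidelity setting could look like. It is a minimal, hypothetical illustration: the names TongSimEnv, fidelity, and scene_id are assumptions made for this example, not TongSIM's published API.

import random


class TongSimEnv:
    """Placeholder environment with a Gym-style reset/step interface (illustrative only)."""

    def __init__(self, scene_id: str, fidelity: str = "high"):
        # 'fidelity' stands in for task-adaptive fidelity control: e.g. full
        # rigid-body physics for manipulation, simplified collision checks
        # for long-horizon navigation.
        self.scene_id = scene_id
        self.fidelity = fidelity

    def reset(self):
        # Return a multimodal observation stub (RGB, depth, agent pose).
        return {"rgb": None, "depth": None, "pose": (0.0, 0.0, 0.0)}

    def step(self, action):
        # A real simulator would advance physics here; this stub fakes the outputs.
        obs = {"rgb": None, "depth": None, "pose": (random.random(),) * 3}
        reward = 0.0
        done = random.random() < 0.05
        return obs, reward, done, {}


def run_episode(env, policy, max_steps=200):
    """Roll out one episode and return the accumulated reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    # Navigation might tolerate reduced fidelity; manipulation would use "high".
    nav_env = TongSimEnv(scene_id="indoor_042", fidelity="low")
    print(run_episode(nav_env, policy=lambda obs: "move_forward"))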

📝 Abstract
As artificial intelligence (AI) rapidly advances, especially in multimodal large language models (MLLMs), the research focus is shifting from single-modality text processing to the more complex domains of multimodal and embodied AI. Embodied intelligence focuses on training agents within realistic simulated environments, leveraging physical interaction and action feedback rather than conventional labeled datasets. Yet most existing simulation platforms remain narrowly designed, each tailored to specific tasks. A versatile, general-purpose training environment that can support everything from low-level embodied navigation to high-level composite activities, such as multi-agent social simulation and human-AI collaboration, remains largely unavailable. To bridge this gap, we introduce TongSIM, a high-fidelity, general-purpose platform for training and evaluating embodied agents. TongSIM provides over 100 diverse, multi-room indoor scenarios as well as an open-ended, interaction-rich outdoor town simulation, ensuring broad applicability across research needs. Its comprehensive evaluation framework and benchmarks enable precise assessment of agent capabilities, such as perception, cognition, decision-making, human-robot cooperation, and spatial and social reasoning. With features such as customizable scenes, task-adaptive fidelity, diverse agent types, and dynamic environmental simulation, TongSIM delivers flexibility and scalability for researchers, serving as a unified platform that accelerates training, evaluation, and advancement toward general embodied intelligence.
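
For a sense of how a multidimensional capability benchmark could report results, the following hypothetical Python snippet aggregates per-task scores into capability-level averages. The capability names echo those listed above (perception, spatial and social reasoning, human-AI collaboration); the record format, task names, and numbers are illustrative assumptions, not TongSIM's actual benchmark output.

from collections import defaultdict
from statistics import mean

# Each record: (capability dimension, task id, success score in [0, 1]).
results = [
    ("perception", "object_search_01", 0.80),
    ("perception", "object_search_02", 0.65),
    ("spatial_reasoning", "multi_room_nav_01", 0.72),
    ("social_reasoning", "intent_inference_01", 0.54),
    ("human_ai_collaboration", "table_setting_01", 0.61),
]


def capability_scores(records):
    """Average task scores within each capability dimension."""
    buckets = defaultdict(list)
    for capability, _task, score in records:
        buckets[capability].append(score)
    return {capability: mean(scores) for capability, scores in buckets.items()}


if __name__ == "__main__":
    for capability, score in sorted(capability_scores(results).items()):
        print(f"{capability:25s} {score:.2f}")
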
Problem

Research questions and friction points this paper is trying to address.

Existing embodied AI simulation platforms are narrowly task-specific and generalize poorly to new settings
No general-purpose training environment spans low-level navigation through high-level activities such as multi-agent social simulation and human-AI collaboration
Agent capabilities such as perception, cognition, decision-making, and spatial and social reasoning lack a unified evaluation framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

General-purpose platform for embodied agent training
High-fidelity indoor and outdoor simulation environments
Comprehensive evaluation framework with diverse benchmarks
🔎 Similar Papers
No similar papers found.
Zhe Sun
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Kunlun Wu
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Chuanjian Fu
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Zeming Song
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Langyong Shi
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Zihe Xue
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Bohan Jing
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Ying Yang
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Xiaomeng Gao
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Aijia Li
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Tianyu Guo
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Huiying Li
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Xueyuan Yang
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Rongkai Liu
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Xinyi He
Xi'an Jiaotong University
Data analytics, Natural Language Processing
Yuxi Wang
Ocean University of China
Computer Vision
Yue Li
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Mingyuan Liu
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Yujie Lu
Research Scientist, Meta Superintelligence Lab
Vision and Language Model, Large Language Model, Language Grounding
Hongzhao Xie
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Shiyun Zhao
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Bo Dai
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Wei Wang
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
Tao Yuan
University of California, Los Angeles
Computer Vision, Artificial Intelligence
Song-Chun Zhu
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China