Spatial4D-Bench: A Versatile 4D Spatial Intelligence Benchmark

📅 2025-12-31
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work evaluates whether multimodal large language models (MLLMs) approach human-level performance in four-dimensional spatial intelligence—the ability to perceive and reason about objects’ motion and transformation over time. To this end, we introduce the first large-scale, structured benchmark for 4D spatial intelligence, encompassing six cognitive categories, 18 distinct tasks, and approximately 40,000 question-answer pairs, thereby overcoming the limitations of existing benchmarks in scale and diversity. Systematic evaluation across multiple open- and closed-source MLLMs reveals significant deficiencies in tasks such as path planning, action recognition, and reasoning about physical plausibility, highlighting a pronounced gap between current models and human spatial cognition capabilities.

📝 Abstract
4D spatial intelligence involves perceiving and processing how objects move or change over time. Humans naturally possess 4D spatial intelligence, supporting a broad spectrum of spatial reasoning abilities. To what extent can Multimodal Large Language Models (MLLMs) achieve human-level 4D spatial intelligence? In this work, we present Spatial4D-Bench, a versatile 4D spatial intelligence benchmark designed to comprehensively assess the 4D spatial reasoning abilities of MLLMs. Unlike existing spatial intelligence benchmarks that are often small-scale or limited in diversity, Spatial4D-Bench provides a large-scale, multi-task evaluation benchmark consisting of ~40,000 question-answer pairs covering 18 well-defined tasks. We systematically organize these tasks into six cognitive categories: object understanding, scene understanding, spatial relationship understanding, spatiotemporal relationship understanding, spatial reasoning, and spatiotemporal reasoning. Spatial4D-Bench thereby offers a structured and comprehensive benchmark for evaluating the spatial cognition abilities of MLLMs, covering a broad spectrum of tasks that parallel the versatility of human spatial intelligence. We benchmark various state-of-the-art open-source and proprietary MLLMs on Spatial4D-Bench and reveal their substantial limitations in a wide variety of 4D spatial reasoning aspects, such as route planning, action recognition, and physical plausibility reasoning. We hope that the findings provided in this work offer valuable insights to the community and that our benchmark can facilitate the development of more capable MLLMs toward human-level 4D spatial intelligence. More resources can be found on our project page.
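
As an illustration of how results on a benchmark organized into six cognitive categories might be summarized, here is a minimal Python sketch that tallies per-category accuracy. The abstract does not specify the scoring metric or the data format, so everything beyond the six category names is an assumption: the record keys (`category`, `answer`, `prediction`), the case-insensitive exact-match scoring, and the function name `accuracy_by_category` are all hypothetical and are not taken from the Spatial4D-Bench release.

```python
from collections import defaultdict

# The six cognitive categories named in the abstract.
CATEGORIES = [
    "object understanding",
    "scene understanding",
    "spatial relationship understanding",
    "spatiotemporal relationship understanding",
    "spatial reasoning",
    "spatiotemporal reasoning",
]


def accuracy_by_category(records):
    """Return {category: accuracy} for categories that have at least one record.

    `records` is an iterable of dicts with the hypothetical keys
    'category', 'answer', and 'prediction' (assumed format, not the
    benchmark's actual schema). Scoring is case-insensitive exact match,
    which is also an assumption.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        category = record["category"]
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        total[category] += 1
        predicted = record["prediction"].strip().upper()
        expected = record["answer"].strip().upper()
        correct[category] += int(predicted == expected)
    # Skip categories with no records to avoid dividing by zero.
    return {c: correct[c] / total[c] for c in CATEGORIES if total[c] > 0}
```

Reporting one score per category, rather than a single pooled number, keeps weaknesses in individual abilities (for example, route planning within a reasoning category) from being averaged away by stronger categories.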
Problem

Research questions and friction points this paper is trying to address.

4D spatial intelligence
Multimodal Large Language Models
spatial reasoning
benchmark
spatiotemporal reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

4D spatial intelligence
multimodal large language models
spatiotemporal reasoning
benchmark
spatial cognition
Pan Wang
Huawei Technologies
Yang Liu
Huawei Technologies
Guile Wu
Huawei Technologies Canada Co., Ltd.
Deep Learning, 3D Reconstruction, Generative AI, Autonomous Driving, Visual Recognition
Eduardo R. Corral-Soto
Sr. Research Scientist
Computer Vision, Machine/Deep Learning, Signal Processing, Applied Mathematics
Chengjie Huang
Huawei Technologies
Binbin Xu
HUAWEI Noah's Ark Lab
SLAM, Robotics, Computer Vision
Dongfeng Bai
Huawei Technologies Co., Ltd.
Computer Vision, Autonomous Driving, Neural Rendering
Xu Yan
Huawei Technologies
Yuan Ren
Huawei Noah's Ark Lab Canada
LiDAR perception, SLAM, astrodynamics
Xingxin Chen
Huawei Technologies
Yizhe Wu
Huawei Technologies
Tao Huang
Huawei Technologies
Wenjun Wan
Huawei Technologies
Xin Wu
Huawei Technologies
Pei Zhou
Huawei Technologies
Xuyang Dai
Huawei Technologies
Kangbo Lv
Huawei Technologies, Tsinghua University
Hongbo Zhang
Huawei Technologies
Yosef Fried
Huawei Technologies
Ai-ping Ye
Huawei Technologies
Bailan Feng
Huawei Technologies
Zhenyu Chen
CUHK-Shenzhen
Zhen Li
HKUST-GZ
Yingcong Chen
Zhejiang University
Yiyi Liao
Zhejiang University
computer vision, robotics
Bingbing Liu
Researcher, Huawei
Autonomous Driving, Robotics, Neural Rendering, Vision Foundation Model