Spatial Causal Prediction in Video

📅 2026-03-04
🤖 AI Summary
Existing video understanding models have limited capacity to reason about unobserved spatiotemporal states, in particular to predict past or future spatial causal dynamics. This work introduces a new task, Spatial Causal Prediction (SCP), and presents SCP-Bench, a benchmark of 2,500 question-answer pairs across 1,181 videos spanning diverse viewpoints, scenes, and causal directions. An evaluation of 23 state-of-the-art models shows substantial gaps between human and model performance, limited temporal extrapolation, and weak causal grounding. The authors further analyze key factors influencing performance and propose perception-enhancement and reasoning-guided strategies for improving spatial causal reasoning.

📝 Abstract
Spatial reasoning, the ability to understand spatial relations, causality, and dynamic evolution, is central to human intelligence and essential for real-world applications such as autonomous driving and robotics. Existing studies, however, primarily assess models on visible spatio-temporal understanding, overlooking their ability to infer unseen past or future spatial states. In this work, we introduce Spatial Causal Prediction (SCP), a new task paradigm that challenges models to reason beyond observation and predict spatial causal outcomes. We further construct SCP-Bench, a benchmark comprising 2,500 QA pairs across 1,181 videos spanning diverse viewpoints, scenes, and causal directions, to support systematic evaluation. Through comprehensive experiments on 23 state-of-the-art models, we reveal substantial gaps between human and model performance, limited temporal extrapolation, and weak causal grounding. We further analyze key factors influencing performance and propose perception-enhancement and reasoning-guided strategies toward advancing spatial causal intelligence. The project page is https://guangstrip.github.io/SCP-Bench.
Problem

Research questions and friction points this paper is trying to address.

Spatial Causal Prediction
spatial reasoning
causal inference
video understanding
temporal extrapolation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatial Causal Prediction
SCP-Bench
causal reasoning
temporal extrapolation
spatial intelligence
Yanguang Zhao
National University of Singapore
Jie Yang
National University of Singapore
Shengqiong Wu
National University of Singapore
Multimodal Learning, Visual Modeling, Large Language Model, Natural Language Processing
Shutong Hu
National University of Singapore
Hongbo Qiu
Shenzhen University
Yu Wang
Sichuan University
Guijia Zhang
Shenzhen University
Tan Kai Ze
National University of Singapore
Hao Fei
National University of Singapore
Vision and Language, Large Language Model, Natural Language Processing, World Modeling
Chia-Wen Lin
National Tsing Hua University
Mong-Li Lee
National University of Singapore
Wynne Hsu
National University of Singapore