HOCA-Bench: Beyond Semantic Perception to Predictive World Modeling via Hegelian Ontological-Causal Anomalies

📅 2026-02-23
🤖 AI Summary
Current video large language models (Video-LLMs) demonstrate strong semantic understanding but lack the capacity for predictive modeling of physical dynamics. This work introduces HOCA-Bench, a novel benchmark that, for the first time, incorporates a Hegelian philosophical framework to categorize physical anomalies into ontological and causal types. Leveraging generative video models as adversarial simulators, the authors construct a fine-grained evaluation dataset comprising 1,439 videos and 3,470 question-answer pairs. Experiments across 17 state-of-the-art Video-LLMs reveal that while models perform reasonably well in detecting ontological anomalies, their accuracy drops by over 20% on questions involving causal mechanisms such as gravity and friction. This performance gap persists even when System-2 reasoning is explicitly invoked, underscoring a fundamental limitation in current models’ ability to reason about physical causality.

📝 Abstract
Video-LLMs have improved steadily on semantic perception, but they still fall short on predictive world modeling, which is central to physically grounded intelligence. We introduce HOCA-Bench, a benchmark that frames physical anomalies through a Hegelian lens. HOCA-Bench separates anomalies into two types: ontological anomalies, where an entity violates its own definition or persistence, and causal anomalies, where interactions violate physical relations. Using state-of-the-art generative video models as adversarial simulators, we build a testbed of 1,439 videos (3,470 QA pairs). Evaluations on 17 Video-LLMs show a clear cognitive lag: models often identify static ontological violations (e.g., shape mutations) but struggle with causal mechanisms (e.g., gravity or friction), with performance dropping by more than 20% on causal tasks. System-2 "Thinking" modes improve reasoning, but they do not close the gap, suggesting that current architectures recognize visual patterns more readily than they apply basic physical laws.
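The ontological-versus-causal comparison described above can be illustrated with a minimal sketch that scores QA pairs per anomaly type and reports the difference. This is not the authors' evaluation code: the `QARecord` fields, the function names, and the toy records are hypothetical and chosen only for illustration, and the numbers they produce are not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QARecord:
    """One scored QA pair (hypothetical schema, not the benchmark's real format)."""
    anomaly_type: str  # "ontological" or "causal"
    predicted: str     # option chosen by the model
    answer: str        # ground-truth option


def accuracy_by_type(records):
    """Return exact-match accuracy for each anomaly type."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record.anomaly_type] += 1
        if record.predicted == record.answer:
            correct[record.anomaly_type] += 1
    return {kind: correct[kind] / total[kind] for kind in total}


def ontological_causal_gap(records):
    """Accuracy on ontological questions minus accuracy on causal questions."""
    acc = accuracy_by_type(records)
    return acc["ontological"] - acc["causal"]


# Made-up records for illustration only; they are not taken from HOCA-Bench.
toy_records = [
    QARecord("ontological", "A", "A"),
    QARecord("ontological", "B", "B"),
    QARecord("ontological", "C", "A"),
    QARecord("ontological", "D", "D"),
    QARecord("causal", "A", "A"),
    QARecord("causal", "B", "C"),
    QARecord("causal", "C", "D"),
    QARecord("causal", "D", "D"),
]

if __name__ == "__main__":
    print(accuracy_by_type(toy_records))
    print(ontological_causal_gap(toy_records))
```

A positive gap means the model answers ontological questions more accurately than causal ones. The sketch reports the gap as an absolute difference in accuracy, which is an assumption; the abstract does not say whether the reported drop is absolute or relative.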
Problem

Research questions and friction points this paper is trying to address.

predictive world modeling
physical anomalies
causal reasoning
Video-LLMs
ontological anomalies
Innovation

Methods, ideas, or system contributions that make the work stand out.

predictive world modeling
ontological-causal anomalies
Video-LLMs
physical reasoning
Hegelian benchmark
Authors
Chang Liu
National University of Defense Technology, Changsha, China
Yunfan Ye
National University of Defense Technology
Low-level Vision, Computer Graphics, Edge Detection
Qingyang Zhou
National University of Defense Technology, Changsha, China
Xichen Tan
National University of Defense Technology, Changsha, China
Mengxuan Luo
National University of Defense Technology, Changsha, China
Zhenyu Qiu
National University of Defense Technology, Changsha, China
Wei Peng
National University of Defense Technology, Changsha, China
Zhiping Cai
National University of Defense Technology, Changsha, China