HOCA-Bench: Beyond Semantic Perception to Predictive World Modeling via Hegelian Ontological-Causal Anomalies

📅 2026-02-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current video large language models (Video-LLMs) demonstrate strong semantic understanding but lack the capacity for predictive modeling of physical dynamics. This work introduces HOCA-Bench, a benchmark that uses a Hegelian philosophical framework to categorize physical anomalies into ontological and causal types, which the summary describes as a first. Using generative video models as adversarial simulators, the authors construct a fine-grained evaluation dataset comprising 1,439 videos and 3,470 question-answer pairs. Experiments across 17 state-of-the-art Video-LLMs show that models perform reasonably well at detecting ontological anomalies, but their performance drops by more than 20% on questions involving causal mechanisms such as gravity and friction. The gap persists even when System-2 reasoning is explicitly invoked, which points to a fundamental limitation in current models' ability to reason about physical causality.

📝 Abstract

Video-LLMs have improved steadily on semantic perception, but they still fall short on predictive world modeling, which is central to physically grounded intelligence. We introduce HOCA-Bench, a benchmark that frames physical anomalies through a Hegelian lens. HOCA-Bench separates anomalies into two types: ontological anomalies, where an entity violates its own definition or persistence, and causal anomalies, where interactions violate physical relations. Using state-of-the-art generative video models as adversarial simulators, we build a testbed of 1,439 videos (3,470 QA pairs). Evaluations on 17 Video-LLMs show a clear cognitive lag: models often identify static ontological violations (e.g., shape mutations) but struggle with causal mechanisms (e.g., gravity or friction), with performance dropping by more than 20% on causal tasks. System-2 "Thinking" modes improve reasoning, but they do not close the gap, suggesting that current architectures recognize visual patterns more readily than they apply basic physical laws.
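The ontological-versus-causal comparison described in the abstract amounts to a per-category accuracy over QA pairs and the difference between the two. The following is a minimal sketch of that arithmetic, not the authors' evaluation code: the record fields (`anomaly_type`, `answer`, `prediction`) and the toy records are hypothetical, and the gap is computed as an absolute accuracy difference, which is an assumption because the abstract does not say whether "more than 20%" is absolute or relative.

```python
from collections import defaultdict


def accuracy_by_type(records):
    """Return {anomaly_type: accuracy} for a list of QA records.

    Each record is a dict with hypothetical keys:
    'anomaly_type', 'answer' (ground truth), and 'prediction'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["anomaly_type"]] += 1
        if rec["prediction"] == rec["answer"]:
            correct[rec["anomaly_type"]] += 1
    return {t: correct[t] / total[t] for t in total}


def ontological_causal_gap(records):
    """Absolute accuracy difference: ontological minus causal (assumed definition)."""
    acc = accuracy_by_type(records)
    return acc["ontological"] - acc["causal"]


# Toy, made-up records for illustration only; not HOCA-Bench data.
toy_records = [
    {"anomaly_type": "ontological", "answer": "A", "prediction": "A"},
    {"anomaly_type": "ontological", "answer": "B", "prediction": "B"},
    {"anomaly_type": "ontological", "answer": "C", "prediction": "A"},
    {"anomaly_type": "ontological", "answer": "D", "prediction": "D"},
    {"anomaly_type": "causal", "answer": "A", "prediction": "A"},
    {"anomaly_type": "causal", "answer": "B", "prediction": "C"},
    {"anomaly_type": "causal", "answer": "C", "prediction": "C"},
    {"anomaly_type": "causal", "answer": "D", "prediction": "A"},
]

if __name__ == "__main__":
    print(accuracy_by_type(toy_records))
    print(ontological_causal_gap(toy_records))
```

On the toy records, three of four ontological questions and two of four causal questions are answered correctly, so the sketch reports accuracies of 0.75 and 0.5 and a gap of 0.25.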

Problem

Research questions and friction points this paper is trying to address.

predictive world modeling

physical anomalies

causal reasoning

Video-LLMs

ontological anomalies

Innovation

Methods, ideas, or system contributions that make the work stand out.

predictive world modeling

ontological-causal anomalies

Video-LLMs