VACT: A Video Automatic Causal Testing System and a Benchmark

📅 2025-03-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Text-to-video generation models (VGMs) exhibit significant deficiencies in factual accuracy and physical causal understanding, a problem compounded by the absence of a universal, automated framework for evaluating causal reasoning.
Method: We propose the first fully automated video causal testing framework, which requires no human annotation and enables cross-scenario, large-scale assessment of causal plausibility. Our approach integrates video semantic parsing, multi-granularity causal consistency checking, causal graph modeling, counterfactual reasoning, and large language model-assisted verification.
Contribution/Results: We design a hierarchical set of causal evaluation metrics assessing force dynamics, motion physics, and spatiotemporal continuity, enabling systematic benchmarking of leading VGMs. Our framework quantitatively reveals pervasive physical causal failures across these models, constituting the first such empirical characterization. It establishes a novel, interpretable paradigm for evaluating and improving causal alignment in VGMs, accompanied by a reproducible benchmark.

📝 Abstract
With the rapid advancement of text-conditioned Video Generation Models (VGMs), the quality of generated videos has significantly improved, bringing these models closer to functioning as "*world simulators*" and making real-world-level video generation more accessible and cost-effective. However, the generated videos often contain factual inaccuracies and lack understanding of fundamental physical laws. While some previous studies have highlighted this issue in limited domains through manual analysis, a comprehensive solution has not yet been established, primarily due to the absence of a generalized, automated approach for modeling and assessing the causal reasoning of these models across diverse scenarios. To address this gap, we propose VACT: an **automated** framework for modeling, evaluating, and measuring the causal understanding of VGMs in real-world scenarios. By combining causal analysis techniques with a carefully designed large language model assistant, our system can assess the causal behavior of models in various contexts without human annotation, which offers strong generalization and scalability. Additionally, we introduce multi-level causal evaluation metrics to provide a detailed analysis of the causal performance of VGMs. As a demonstration, we use our framework to benchmark several prevailing VGMs, offering insight into their causal reasoning capabilities. Our work lays the foundation for systematically addressing the causal understanding deficiencies in VGMs and contributes to advancing their reliability and real-world applicability.
Problem

Research questions and friction points this paper is trying to address.

Automated assessment of causal reasoning in video generation models.
Addressing factual inaccuracies and physical law understanding in generated videos.
Developing scalable metrics for evaluating causal performance in diverse scenarios.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated framework for causal understanding assessment
Combines causal analysis with large language models
Introduces multi-level causal evaluation metrics
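The combination above (a causal graph plus counterfactual checks over parsed video events) can be sketched in miniature. This is an illustrative toy, not the paper's implementation: `CausalGraph`, `causal_consistency`, and the event names are all hypothetical, and a real system would obtain the observed events from video semantic parsing and LLM-assisted verification rather than from hand-written lists.

```python
# Hypothetical sketch of a causal consistency score over parsed video events.
# All names and event labels are illustrative assumptions, not from the paper.
from dataclasses import dataclass, field

@dataclass
class CausalGraph:
    # edges: cause event -> set of effect events expected to follow it
    edges: dict = field(default_factory=dict)

def causal_consistency(graph: CausalGraph, observed_events) -> float:
    """Score one parsed video against the causal graph.

    For each cause->effect edge: if the cause is observed, the effect
    should also be observed (factual check); if the cause is absent,
    the effect should be absent too (a crude counterfactual check).
    Returns the fraction of edges that pass.
    """
    observed = set(observed_events)
    hits, total = 0, 0
    for cause, effects in graph.edges.items():
        for effect in effects:
            total += 1
            if cause in observed:
                hits += effect in observed       # effect should follow its cause
            else:
                hits += effect not in observed   # no cause, so no effect expected
    return hits / total if total else 1.0

# Toy scenario: dropping a glass should cause it to shatter.
graph = CausalGraph(edges={"glass_dropped": {"glass_shatters"}})
print(causal_consistency(graph, ["glass_dropped", "glass_shatters"]))  # 1.0
print(causal_consistency(graph, ["glass_dropped"]))                    # 0.0
```

In this framing, a generated video that shows the cause but omits the physically required effect scores poorly, which is the kind of physical causal failure the benchmark is designed to surface; the paper's multi-level metrics refine this idea across force dynamics, motion physics, and spatiotemporal continuity.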