Impossible Videos

📅 2025-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key deficiency of video generation and understanding models: handling counterfactual (impossible) scenarios. We introduce IPV-Bench, the first benchmark for impossible-video evaluation, covering violations of physical, biological, geographical, and social laws. It is built on a systematic taxonomy of impossibility types and a dual-task (generation + understanding) evaluation framework that integrates prompt adherence, creative generation, and spatiotemporal causal reasoning. From the taxonomy, we design a prompt suite for generation models and curate a video set for understanding models; the evaluation protocol probes world knowledge, temporal dynamics, and counterfactual inference. Experiments expose fundamental limitations: weak prompt adherence in current video generation models and severe deficits in spatiotemporal causal reasoning among Video-LLMs. These findings provide concrete, actionable directions for advancing embodied, cognition-aware video models.

📝 Abstract
Synthetic videos are now widely used to compensate for the scarcity and limited diversity of real-world videos. Current synthetic datasets primarily replicate real-world scenarios, leaving impossible, counterfactual, and anti-reality video concepts underexplored. This work aims to answer two questions: 1) Can today's video generation models effectively follow prompts to create impossible video content? 2) Are today's video understanding models good enough for understanding impossible videos? To this end, we introduce IPV-Bench, a novel benchmark designed to evaluate and foster progress in video understanding and generation. IPV-Bench is underpinned by a comprehensive taxonomy encompassing 4 domains and 14 categories. It features diverse scenes that defy physical, biological, geographical, or social laws. Based on the taxonomy, a prompt suite is constructed to evaluate video generation models, challenging their prompt-following and creativity capabilities. In addition, a video benchmark is curated to assess Video-LLMs on their ability to understand impossible videos, which particularly requires reasoning about temporal dynamics and world knowledge. Comprehensive evaluations reveal limitations and insights for future directions, paving the way for next-generation video models.
Problem

Research questions and friction points this paper is trying to address.

Evaluate video generation models for impossible content creation
Assess video understanding models on impossible video comprehension
Develop IPV-Bench to benchmark video model capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

IPV-Bench evaluates impossible video generation.
Taxonomy-based prompt suite tests model creativity.
Video-LLMs assessed on understanding impossible videos.
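The taxonomy-driven evaluation described above could be aggregated roughly as follows. This is a minimal sketch: the domain and category names, scoring scale, and helper function are invented for illustration and are not from the IPV-Bench release.

```python
# Hypothetical sketch: averaging per-video prompt-adherence scores
# (assumed to lie in [0, 1]) into per-domain scores, following the
# paper's four-domain taxonomy. Category names below are invented.
from collections import defaultdict

TAXONOMY = {
    "physical": ["reversed_gravity", "object_permanence_violation"],
    "biological": ["impossible_anatomy"],
    "geographical": ["impossible_landscape"],
    "social": ["impossible_social_convention"],
}

def aggregate_scores(per_video_scores):
    """Average (domain, category, score) tuples into per-domain means."""
    totals, counts = defaultdict(float), defaultdict(int)
    for domain, _category, score in per_video_scores:
        totals[domain] += score
        counts[domain] += 1
    return {d: totals[d] / counts[d] for d in totals}

scores = aggregate_scores([
    ("physical", "reversed_gravity", 0.6),
    ("physical", "object_permanence_violation", 0.2),
    ("social", "impossible_social_convention", 0.5),
])
print(scores)
```

A real protocol would replace the hand-entered scores with judgments from the paper's prompt-adherence and reasoning evaluations, but the aggregation shape would be similar.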