IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates deep learning models' capacity to reason about intuitive physical principles (object permanence, immutability, spatio-temporal continuity, and solidity) as observed in human infants. Method: Building on the original IntPhys benchmark, IntPhys 2 is a synthetic video benchmark grounded in developmental psychology that systematically violates these four principles across diverse virtual scenes to evaluate impossible-event detection. Using the violation-of-expectation paradigm, the authors conduct zero-shot evaluations of multiple models and compare their performance against human behavioral data. Contribution/Results: State-of-the-art models perform at chance (~50% accuracy) on most tasks, far below near-perfect human accuracy, quantitatively exposing a deficit in intuitive physical reasoning in complex synthetic environments. IntPhys 2 provides a cognitively inspired evaluation framework and benchmark for grounding physical commonsense in AI systems.

📝 Abstract
We present IntPhys 2, a video benchmark designed to evaluate the intuitive physics understanding of deep learning models. Building on the original IntPhys benchmark, IntPhys 2 focuses on four core principles related to macroscopic objects: Permanence, Immutability, Spatio-Temporal Continuity, and Solidity. These conditions are inspired by research into intuitive physical understanding emerging during early childhood. IntPhys 2 offers a comprehensive suite of tests, based on the violation of expectation framework, that challenge models to differentiate between possible and impossible events within controlled and diverse virtual environments. Alongside the benchmark, we provide performance evaluations of several state-of-the-art models. Our findings indicate that while these models demonstrate basic visual understanding, they face significant challenges in grasping intuitive physics across the four principles in complex scenes, with most models performing at chance levels (50%), in stark contrast to human performance, which achieves near-perfect accuracy. This underscores the gap between current models and human-like intuitive physics understanding, highlighting the need for advancements in model architectures and training methodologies.
Problem

Research questions and friction points this paper is trying to address.

Evaluating deep learning models' intuitive physics understanding in complex synthetic environments
Testing models on four core principles: Permanence, Immutability, Spatio-Temporal Continuity, and Solidity
Highlighting performance gap between models and human-like intuitive physics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Video benchmark for intuitive physics evaluation
Tests based on violation of expectation framework
Evaluates models in diverse virtual environments
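Under the violation-of-expectation framework described above, each impossible clip is paired with a matched possible counterpart, and a model passes a pair when it rates the impossible clip as more surprising; guessing yields 50% accuracy, which is why chance-level performance is the paper's key negative result. The sketch below illustrates this pairwise scoring scheme. It is a minimal illustration, not the paper's exact protocol: the function name and the example surprise scores (e.g., per-clip prediction error) are hypothetical.

```python
import numpy as np

def voe_pairwise_accuracy(surprise_possible, surprise_impossible):
    """Fraction of matched pairs where the impossible clip is rated
    more surprising than its possible counterpart (chance = 0.5).

    Hypothetical helper, not the paper's exact metric.
    """
    sp = np.asarray(surprise_possible, dtype=float)
    si = np.asarray(surprise_impossible, dtype=float)
    return float(np.mean(si > sp))

# Hypothetical surprise scores for four matched clip pairs:
possible   = [0.21, 0.35, 0.18, 0.40]
impossible = [0.55, 0.30, 0.62, 0.48]
print(voe_pairwise_accuracy(possible, impossible))  # 0.75
```

A model with no grasp of the violated principle assigns unrelated surprise scores to the two clips in a pair, so the comparison succeeds about half the time, matching the ~50% accuracy the benchmark reports for most models.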