Grounding Social Perception in Intuitive Physics

📅 2026-03-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
How do humans infer social intentions from behavior while accounting for constraints imposed by the physical environment? This work proposes that social perception arises from a joint inference process that integrates intuitive psychology with intuitive physics. To test this hypothesis, the authors introduce PHASE, a dataset of procedurally generated, physically simulated two-agent animations, and SIMPLE, a computational model that combines physical simulation with Bayesian inverse planning. SIMPLE closely replicates human judgments of physically constrained social interactions across diverse scenarios, significantly outperforming baselines that ignore physical constraints as well as current vision-language models. These results suggest that integrating physical and social reasoning is essential for understanding human behavior.
📝 Abstract
People infer rich social information from others' actions. These inferences are often constrained by the physical world: what agents can do, what obstacles permit, and how agents' physical actions causally change an environment and other agents' mental states and behavior. We propose that such rich social perception is not mere visual pattern matching but rather a reasoning process grounded in an integration of intuitive psychology with intuitive physics. To test this hypothesis, we introduce PHASE (PHysically grounded Abstract Social Events), a large dataset of procedurally generated animations depicting physically simulated two-agent interactions on a 2D surface. Each animation follows the style of the Heider and Simmel movie, with systematic variation in environment geometry, object dynamics, agent capacities, goals, and relationships (friendly/adversarial/neutral). We then present SIMPLE, a physics-grounded Bayesian inverse planning model that integrates probabilistic planning with physics simulation to infer agents' goals and relations from their trajectories. Our experiments show that SIMPLE achieves high accuracy and agreement with human judgments across diverse scenarios, while feedforward baseline models -- including strong vision-language models -- and physics-agnostic inverse planning fail to reach human-level performance and do not align with human judgments. These results suggest that our model provides a computational account of how people understand physically grounded social scenes by inverting a generative model of physics and agents.
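The core idea behind Bayesian inverse planning can be illustrated with a toy sketch. This is not the paper's SIMPLE model (which couples a planner with a physics simulator over PHASE animations); the 1D world, the Boltzmann-rational action model, the inverse temperature `beta`, and all function names below are illustrative assumptions chosen only to show how a posterior over goals is scored from an observed trajectory.

```python
import math

def step_likelihood(pos, action, goal, beta=2.0):
    """Boltzmann-rational action model (an assumption, not the paper's
    planner): actions that reduce distance to the hypothesized goal
    are exponentially more probable."""
    actions = [-1, 0, +1]                      # move left, stay, move right
    utility = lambda a: -abs((pos + a) - goal)  # negative distance after acting
    weights = {a: math.exp(beta * utility(a)) for a in actions}
    return weights[action] / sum(weights.values())

def goal_posterior(trajectory, goals, beta=2.0):
    """Inverse planning: P(goal | trajectory) proportional to
    P(goal) * prod_t P(action_t | state_t, goal), with a uniform prior."""
    log_post = {g: 0.0 for g in goals}
    for pos, action in trajectory:
        for g in goals:
            log_post[g] += math.log(step_likelihood(pos, action, g, beta))
    m = max(log_post.values())                  # stabilize before exponentiating
    unnorm = {g: math.exp(lp - m) for g, lp in log_post.items()}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

# An agent at 0 repeatedly moves right: strong evidence for the goal at +5.
traj = [(0, +1), (1, +1), (2, +1)]
post = goal_posterior(traj, goals=[-5, +5])
```

In SIMPLE, the analogue of `step_likelihood` comes from a planner operating in a simulated physical environment, so inferred goals automatically respect what obstacles and agent capacities permit.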
Problem

Research questions and friction points this paper is trying to address.

social perception
intuitive physics
goal inference
agent interaction
physical grounding
Innovation

Methods, ideas, or system contributions that make the work stand out.

intuitive physics
social perception
inverse planning
physics simulation
Bayesian modeling