🤖 AI Summary
This work investigates whether autonomous agents can acquire human-like physical and causal reasoning solely through environmental interaction. Method: We propose an interactive physical reasoning framework built on PhysCode, a physics-aware action encoding space that unifies semantic intent with dynamical behavior, and combine a vision-language model (VLM) policy, world-model-based forward rollout prediction, and policy-gradient reinforcement learning. The framework is pretrained at scale on 1,000+ heterogeneous games. Contribution/Results: Evaluated on survival, curiosity-driven, and utility-oriented benchmarks, the model performs robustly across diverse human-like physical reasoning tasks, matching GPT-5's overall capability while significantly outperforming it on curiosity-driven tasks. Performance consistently improves with more interaction steps and more training games, and the model transfers zero-shot to unseen environments.
📝 Abstract
Humans learn by observing, interacting with environments, and internalizing physics and causality. We ask whether an agent can similarly acquire human-like reasoning from interaction and keep improving with experience. We study this in a Game-to-Unseen (G2U) setting, curating 1,000+ heterogeneous games with diverse physical and causal mechanisms, and evaluate agents at three human-like levels (Survival, Curiosity, Utility), ranging from primitive intuition to goal-driven reasoning. Our analysis reveals complementary failures: VLM/VLA agents reason but lack look-ahead in interactive settings, while world models imagine but imitate visual patterns rather than analyzing physics and causality. We therefore propose IPR (Interactive Physical Reasoner), which uses world-model rollouts to score and reinforce a VLM's policy, and introduce PhysCode, a physics-centric action code that aligns semantic intent with dynamics, providing a shared action space for prediction and reasoning. Pretrained on 1,000+ games, IPR performs robustly at all three levels, matches GPT-5 overall, and surpasses it on Curiosity. Performance improves with more training games and interaction steps, and the model transfers zero-shot to unseen games. These results support physics-centric interaction as a path to steadily improving physical reasoning.
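The training loop the abstract describes (world-model rollouts scoring actions, which then reinforce the policy via policy gradients) can be illustrated with a minimal sketch. Everything below is hypothetical: the toy action vocabulary standing in for PhysCode, the stub `world_model_rollout` reward model, and the tabular softmax policy standing in for the VLM are all illustrative assumptions, not the paper's implementation.

```python
import math
import random

random.seed(0)

# Toy PhysCode-like discrete action codes (illustrative, not the real vocabulary).
ACTIONS = ["push", "lift", "wait"]
logits = {a: 0.0 for a in ACTIONS}  # tabular softmax policy, stand-in for the VLM


def policy_probs():
    """Softmax over per-action logits."""
    z = {a: math.exp(l) for a, l in logits.items()}
    s = sum(z.values())
    return {a: v / s for a, v in z.items()}


def world_model_rollout(state, action, horizon=3):
    """Stub world model: returns a scalar return for an imagined rollout.
    Here 'push' happens to be best -- purely for illustration."""
    reward = {"push": 1.0, "lift": 0.3, "wait": 0.0}[action]
    return reward * horizon


def reinforce_step(state, lr=0.5):
    """Sample an action from the policy, score it with a world-model rollout,
    and apply a REINFORCE update: grad log pi(a) * (return - baseline)."""
    probs = policy_probs()
    action = random.choices(ACTIONS, weights=[probs[a] for a in ACTIONS])[0]
    ret = world_model_rollout(state, action)
    # Expected rollout return under the current policy as a variance-reducing baseline.
    baseline = sum(probs[a] * world_model_rollout(state, a) for a in ACTIONS)
    advantage = ret - baseline
    for a in ACTIONS:
        grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
        logits[a] += lr * advantage * grad_log_pi


for _ in range(200):
    reinforce_step(state=0)

print(max(policy_probs(), key=policy_probs().get))  # the policy concentrates on "push"
```

The key design point mirrored here is that the world model never updates the policy directly: it only supplies rollout returns, and the policy-gradient step does the reinforcement, so prediction and reasoning interact solely through the shared action space.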