Zero-shot World Models Are Developmentally Efficient Learners

📅 2026-04-11
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the poor data efficiency and limited generalization of current AI systems in physical scene understanding, drawing inspiration from children's ability to acquire intuitive physics from minimal experience. The authors propose the Zero-shot Visual World Model (ZWM), a computational hypothesis for these developmental abilities. ZWM rests on three components: a sparse temporally-factored predictor that decouples appearance from dynamics, zero-shot estimation through approximate causal inference, and composition of inferences into more complex abilities. Trained on the first-person experience of a single child, ZWM rapidly reaches competence across multiple physical understanding benchmarks, broadly recapitulates behavioral signatures of child development, and builds brain-like internal representations.

📝 Abstract
Young children demonstrate early abilities to understand their physical world, estimating depth, motion, object coherence, interactions, and many other aspects of physical scene understanding. Children are both data-efficient and flexible cognitive systems, creating competence despite extremely limited training data, while generalizing to myriad untrained tasks -- a major challenge even for today's best AI systems. Here we introduce a novel computational hypothesis for these abilities, the Zero-shot Visual World Model (ZWM). ZWM is based on three principles: a sparse temporally-factored predictor that decouples appearance from dynamics; zero-shot estimation through approximate causal inference; and composition of inferences to build more complex abilities. We show that ZWM can be learned from the first-person experience of a single child, rapidly generating competence across multiple physical understanding benchmarks. It also broadly recapitulates behavioral signatures of child development and builds brain-like internal representations. Our work presents a blueprint for efficient and flexible learning from human-scale data, advancing both a computational account for children's early physical understanding and a path toward data-efficient AI systems.
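
The abstract does not spell out how "zero-shot estimation through approximate causal inference" is computed. One common reading of the idea is perturb-and-compare: nudge the input to a frozen predictor, then read off a quantity such as motion from how the prediction changes, with no task-specific training. The toy sketch below illustrates only that reading. Everything in it is an assumption for illustration: `toy_predictor` is a hand-written stand-in (a fixed translation) rather than a learned model, and `estimate_motion` is a hypothetical helper, not the authors' method.

```python
from typing import Callable, List, Tuple

Frame = List[List[float]]
Predictor = Callable[[Frame], Frame]


def toy_predictor(frame: Frame, shift: Tuple[int, int] = (1, 2)) -> Frame:
    """Hypothetical stand-in for a learned next-frame predictor.

    It translates the frame by `shift` (rows, cols) with wrap-around.
    """
    h, w = len(frame), len(frame[0])
    dy, dx = shift
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[(y + dy) % h][(x + dx) % w] = frame[y][x]
    return out


def estimate_motion(
    predict: Predictor, frame: Frame, point: Tuple[int, int], eps: float = 1.0
) -> Tuple[int, int]:
    """Estimate the displacement at `point` by perturb-and-compare.

    The predictor is queried twice, once on the original frame and once on a
    copy with a small bump added at `point`. The location where the two
    predictions differ most is taken as where that point moved to.
    """
    y0, x0 = point
    perturbed = [row[:] for row in frame]
    perturbed[y0][x0] += eps

    base = predict(frame)
    counterfactual = predict(perturbed)

    best, best_pos = -1.0, (y0, x0)
    for y in range(len(frame)):
        for x in range(len(frame[0])):
            diff = abs(counterfactual[y][x] - base[y][x])
            if diff > best:
                best, best_pos = diff, (y, x)
    return (best_pos[0] - y0, best_pos[1] - x0)


# A small synthetic frame; the values are arbitrary.
frame = [[float((3 * y + 5 * x) % 7) for x in range(8)] for y in range(6)]
flow = estimate_motion(toy_predictor, frame, point=(2, 3))
print(flow)  # (1, 2): the translation built into the toy predictor
```

The point of the sketch is that the predictor is never trained to output motion; the estimate comes entirely from comparing two of its predictions. Under the same assumption, other quantities could in principle be read out by choosing different perturbations and comparisons.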
Problem

Research questions and friction points this paper is trying to address.

physical scene understanding
data-efficient learning
cognitive development
zero-shot generalization
artificial intelligence
Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot learning
visual world model
causal inference
compositional reasoning
developmental AI