🤖 AI Summary
Current AI vision systems exhibit weak shape perception, strong texture bias, poor robustness, and limited abstract shape recognition. To address these limitations, this paper proposes the Developmental Visual Diet (DVD), a novel training paradigm inspired by human visual development. DVD formalizes the progression from infant to adult vision into a quantifiable, staged curriculum: early stages emphasize shape priors, while complexity, such as textured surfaces and cluttered backgrounds, is introduced incrementally. Crucially, DVD requires no model scaling; instead, it leverages curriculum-based data scheduling and training strategies informed by psychophysics and neurophysiology. Empirical results demonstrate substantial improvements in shape reliance, abstract pattern recognition accuracy, and robustness against image corruptions and adversarial attacks, achieving state-of-the-art performance even with reduced training data and surpassing larger models. The core contribution lies in rigorously translating developmental cognitive principles into a computationally executable visual learning curriculum, advancing AI vision toward human-like perceptual cognition.
📝 Abstract
Despite years of research and the dramatic scaling of artificial intelligence (AI) systems, a striking misalignment between artificial and human vision persists. Contrary to humans, AI relies heavily on texture features rather than shape information, lacks robustness to image distortions, remains highly vulnerable to adversarial attacks, and struggles to recognise simple abstract shapes within complex backgrounds. To close this gap, here we introduce a solution that arises from a previously underexplored direction: rather than scaling up, we take inspiration from how human vision develops from early infancy into adulthood. We quantified this visual maturation by synthesising decades of psychophysical and neurophysiological research into a novel developmental visual diet (DVD) for AI vision. We show that guiding AI systems through this human-inspired curriculum produces models that closely align with human behaviour on every hallmark of robust vision tested, yielding the strongest reported reliance on shape information to date, abstract shape recognition beyond the state of the art, higher robustness to image corruptions, and stronger resilience to adversarial attacks. By outperforming high-parameter AI foundation models trained on orders of magnitude more data, we provide evidence that robust AI vision can be achieved by guiding how a model learns, not merely how much it learns, offering a resource-efficient route toward safer and more human-like artificial visual systems.
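The staged curriculum described above can be pictured as a schedule that gates visual detail over training. The following is a minimal illustrative sketch, not the paper's actual schedule: the function name, the exponential decay, and the specific constants are all assumptions chosen only to show how early "low-acuity" (heavily blurred, shape-dominated) input could be annealed toward full-detail input as training progresses:

```python
import math

def acuity_sigma(epoch: int, total_epochs: int,
                 start_sigma: float = 4.0, end_sigma: float = 0.0) -> float:
    """Hypothetical blur strength for the current training stage.

    Early epochs use heavy Gaussian blur (coarse, shape-dominated input,
    loosely mimicking the low acuity of infant vision); the blur decays
    toward zero so fine texture detail only appears late in training.
    The decay shape and constants are illustrative assumptions.
    """
    progress = min(epoch / max(total_epochs - 1, 1), 1.0)
    # Exponential decay from start_sigma toward end_sigma over training.
    return end_sigma + (start_sigma - end_sigma) * math.exp(-5.0 * progress)

# Per-epoch usage: feed the returned sigma to an image-blurring transform
# (e.g. a Gaussian filter) applied to the training images for that epoch.
for epoch in [0, 25, 50, 99]:
    print(f"epoch {epoch:3d}: sigma = {acuity_sigma(epoch, 100):.3f}")
```

In such a setup, the schedule (rather than any architectural change) controls what the model sees first, which is the sense in which the paper argues robustness comes from guiding *how* a model learns.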