🤖 AI Summary
This work asks how biological visual systems learn efficiently from limited experience, without the millions of training images that deep learning models rely on. It proposes a hierarchical, unsupervised efficient coding framework in which each layer of a deep network compresses its inputs by exploiting the local statistical regularities of natural images, with no labels, tasks, or backpropagation. Built bottom-up, the model yields human-interpretable features that progress from edges and colors to textures and shapes. The principle of efficient coding is applied consistently at every layer of the network rather than only at the earliest stages. The resulting features are predictive of image-evoked fMRI responses in human visual cortex, and a hybrid procedure that combines efficient coding with supervised fine-tuning yields better brain alignment in low-data settings and more rapid category learning.
📝 Abstract
Biological visual systems learn from limited experience, unlike deep learning models that rely on millions of training images. What learning principles make this possible? We tested whether efficient coding, the idea that neural representations capture the statistical structure of natural inputs, can build a hierarchy of human-aligned visual features from limited data. We developed an unsupervised learning procedure in which each layer of a deep network compresses its inputs onto the dominant modes of variation in natural images, using only local statistics and no labels, tasks, or backpropagation. This unsupervised procedure yields features that progress from edges and colors to textures and shapes. The features of this deep efficient coding model are readily recognized by human observers and are predictive of image-evoked fMRI responses in human visual cortex. Furthermore, a hybrid learning procedure that combines efficient coding with supervised fine-tuning yields better brain alignment in low-data settings and more rapid category learning. These findings suggest that efficient coding may shape representations across the entire visual hierarchy and help explain the data efficiency of biological vision.
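To make the layer-wise idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "compressing inputs onto the dominant modes of variation" can be approximated by principal component analysis (PCA) on local patches, fitted one layer at a time. The function names (`extract_patches`, `fit_layer`, `apply_layer`, `fit_hierarchy`), the patch sizes, and the component counts are all hypothetical choices made for illustration.

```python
import numpy as np

def extract_patches(x, k):
    """Return every k-by-k patch of x, shape (N, H, W, C), as a flat vector.

    Output shape: (N, H-k+1, W-k+1, C*k*k).
    """
    # Window axes are appended last: (N, H-k+1, W-k+1, C, k, k).
    windows = np.lib.stride_tricks.sliding_window_view(x, (k, k), axis=(1, 2))
    n, h, w, c = windows.shape[:4]
    return windows.reshape(n, h, w, c * k * k)

def fit_layer(x, k, n_components):
    """Fit one layer from local patch statistics only (assumed here: PCA).

    No labels and no backpropagation are involved.
    """
    patches = extract_patches(x, k)
    flat = patches.reshape(-1, patches.shape[-1])
    mean = flat.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_layer(x, k, mean, components):
    """Project each local patch onto the retained components."""
    patches = extract_patches(x, k)
    return (patches - mean) @ components.T

def fit_hierarchy(images, layer_specs):
    """Fit layers greedily: each layer sees only the previous layer's output.

    layer_specs is a list of (patch_size, n_components) pairs (hypothetical).
    """
    layers, x = [], images
    for k, n_components in layer_specs:
        mean, components = fit_layer(x, k, n_components)
        layers.append((k, mean, components))
        x = apply_layer(x, k, mean, components)
    return layers, x
```

For example, `fit_hierarchy(images, [(3, 8), (3, 16)])` on an array of shape `(N, H, W, 3)` fits two layers and returns features of shape `(N, H-4, W-4, 16)`. The sketch is purely linear: any nonlinearity, pooling, or normalization used in the actual model is not described in the abstract and is omitted here.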