🤖 AI Summary
This study investigates the perceptual representations underlying human face recognition and tests which computational models of face perception best account for them. The authors compared six deep neural networks that share an architecture but differ in training task (including inverse rendering, face identification, and object classification), with training on natural or synthetic images. They generated "controversial" face pairs, optimized so that the models make contrasting predictions, and tested model predictions on these and on randomly sampled pairs against face-dissimilarity judgments from 864 human participants. Models that prioritize high-level, invariant structure matched human judgments most robustly, and models trained on natural images typically outperformed their synthetic-trained counterparts. The findings suggest that human face perception is shaped by mechanisms that infer the latent causes of facial appearance, discount nuisance variation, and are tuned by natural image statistics.
📝 Abstract
The perceptual representations supporting our ability to recognize faces remain a computational mystery. Deep neural networks offer mechanistic hypotheses for human face perception, but theoretically distinct models often make indistinguishable representational predictions for randomly sampled faces. To expose diagnostic differences among these hypotheses, we compared six neural network models sharing an architecture but trained on distinct tasks, using face pairs optimized to elicit contrasting model predictions ("controversial" pairs) alongside randomly sampled pairs. We tested model predictions against face-dissimilarity judgments from 864 human participants across stimulus sets differing in realism and pose variation. Models prioritizing high-level, invariant structures (trained via inverse rendering, face identification, or object classification) most robustly matched human judgments. Furthermore, models trained on natural images typically outperformed synthetic-trained counterparts. Together, these findings suggest that human face perception is shaped by mechanisms that infer latent causes of facial appearance, discount nuisance variation, and are tuned by natural image statistics.
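To make the idea of a "controversial" pair concrete, here is a minimal sketch of how disagreement between two models could be scored. It rests on assumptions that are not from the paper: the abstract says the pairs were optimized, whereas this sketch only selects the most contested pairs from a fixed pool of candidate faces, uses Euclidean distance between embeddings as the predicted dissimilarity, and rank-normalizes distances so the two models are comparable. The names `most_controversial_pairs`, `emb_a`, and `emb_b` and the toy numbers are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_normalize(values):
    """Map values to ranks in [0, 1] so two models' distances are comparable."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank / (len(values) - 1)
    return ranks


def most_controversial_pairs(emb_a, emb_b, k=1):
    """Return the k face-index pairs on which models A and B disagree most.

    emb_a, emb_b: hypothetical per-face embeddings from two models
    (lists of vectors, with faces in the same order in both lists).
    """
    pairs = list(combinations(range(len(emb_a)), 2))
    # Predicted dissimilarity of each pair under each model (assumed metric).
    d_a = rank_normalize([euclidean(emb_a[i], emb_a[j]) for i, j in pairs])
    d_b = rank_normalize([euclidean(emb_b[i], emb_b[j]) for i, j in pairs])
    # A pair is contested when one model calls it similar and the other does not.
    disagreement = [abs(x - y) for x, y in zip(d_a, d_b)]
    ranked = sorted(range(len(pairs)), key=lambda p: disagreement[p], reverse=True)
    return [pairs[p] for p in ranked[:k]]


# Toy data: four faces in two 2-D embedding spaces (made-up numbers).
model_a = [[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.0, 5.0]]
model_b = [[0.0, 0.0], [9.0, 0.0], [0.2, 0.0], [5.0, 5.0]]
top_pairs = most_controversial_pairs(model_a, model_b, k=2)
```

In the toy data, faces 0 and 1 are nearly identical for model A but far apart for model B, so that pair ranks first. Human dissimilarity judgments on such pairs can then favor one model over the other, which randomly sampled pairs often cannot do when the models' predictions are nearly indistinguishable.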