🤖 AI Summary
This study investigates the alignment gap between human and artificial visual perception by systematically comparing their responses to classical visual illusions, aiming to uncover AI-specific perceptual vulnerabilities. Method: We evaluate mainstream CNNs and vision Transformers on a standardized illusion image dataset, using behavioral response analysis, feature visualization, and adversarial sensitivity assessment. Contribution/Results: (1) While AI models reproduce certain illusion effects observed in humans, they do so via low-level pixel statistics rather than high-level semantic inference; (2) AI exhibits pixel-level hallucinations and hypersensitivity absent from human perception; (3) these discrepancies stem fundamentally from AI's lack of human-like contextual modeling and structured prior constraints. Our findings expose critical misalignments in the perceptual mechanisms of current visual systems and point to actionable directions, such as integrating hierarchical priors and context-aware inference, for developing more robust, interpretable, and human-aligned vision models.
📝 Abstract
By comparing biological and artificial perception through the lens of illusions, we highlight critical differences in how each system constructs visual reality. Understanding these divergences can inform the development of more robust, interpretable, and human-aligned artificial intelligence (AI) vision systems. In particular, visual illusions expose how human perception relies on contextual assumptions rather than raw sensory data alone. As artificial vision systems increasingly perform human-like tasks, it is natural to ask: does AI experience illusions too, and does it have illusions of its own? This article examines how AI responds to classic visual illusions involving color, size, shape, and motion. We find that some illusion-like effects can emerge in these models, either through targeted training or as by-products of pattern recognition. We also identify illusions unique to AI, such as pixel-level hypersensitivity and hallucinations, that have no human counterparts. By systematically comparing human and AI responses to visual illusions, we uncover alignment gaps and AI-specific perceptual vulnerabilities invisible to human perception. These findings provide direction for future research on vision systems that preserve human-beneficial perceptual biases while avoiding distortions that undermine trust and safety.