Foveated Retinotopy Improves Classification and Localization in CNNs

📅 2024-02-23

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

To address the limited robustness of image recognition and localization under scale, rotation, and spatial transformations, this paper proposes a biologically inspired foveated retinotopic input mechanism, motivated by the primate fovea, that places a fixation-dependent retinal map (high resolution at the fovea, low resolution in the periphery) at the input layer of a CNN. The backbone architecture (e.g., ResNet) is left unchanged and the network is re-trained on the transformed input, which implicitly encodes object geometry and enables inference over multiple fixation points. Experiments show that classification accuracy remains comparable while robustness to scale and rotation perturbations improves. The transformation also makes the network more sensitive to shifts of the fixation point, and the paper turns this into an advantage: variations in classification probabilities across gaze positions indicate where the object is, so localization is obtained from the classifier's responses alone, without dedicated detection heads or bounding-box annotations. The authors present this as an efficient approach to the visual search problem.
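
As a rough illustration of what a fixation-dependent retinal map can look like, the sketch below resamples an image onto a log-polar grid centred on a fixation point. The log-polar form is an assumption here (the summary does not name the exact transformation), and the function name `log_polar_sample` and its parameters are hypothetical. Radii grow geometrically, so pixels near the fixation are sampled densely and the periphery coarsely.

```python
import numpy as np

def log_polar_sample(image, fixation, n_rho=32, n_theta=64, r_min=1.0, r_max=None):
    """Resample a 2-D grayscale image onto a log-polar grid around `fixation`.

    Hypothetical helper for illustration only: nearest-neighbour sampling,
    samples falling outside the image are set to zero.
    """
    h, w = image.shape
    fy, fx = fixation
    if r_max is None:
        r_max = np.hypot(h, w) / 2.0

    # Geometrically spaced radii: dense near the fixation, sparse far away.
    rho = np.geomspace(r_min, r_max, n_rho)
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(rho, theta, indexing="ij")

    # Cartesian coordinates of every (radius, angle) sample.
    ys = np.rint(fy + rr * np.sin(tt)).astype(int)
    xs = np.rint(fx + rr * np.cos(tt)).astype(int)

    inside = (ys >= 0) & (ys < h) & (xs >= 0) & (xs < w)
    out = np.zeros((n_rho, n_theta), dtype=image.dtype)
    out[inside] = image[ys[inside], xs[inside]]
    return out
```

In such a grid, rescaling an object about the fixation point roughly corresponds to a shift along the radius axis and rotating it to a shift along the angle axis, which is one intuition for why a convolutional backbone might tolerate these perturbations better. Moving the fixation point, by contrast, changes the whole map.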

📝 Abstract

From a falcon detecting prey to humans recognizing faces, many species exhibit extraordinary abilities in rapid visual localization and classification. These are made possible by a specialized retinal region called the fovea, which provides high acuity at the center of vision while maintaining lower resolution in the periphery. This distinctive spatial organization, preserved along the early visual pathway through retinotopic mapping, is fundamental to biological vision, yet remains largely unexplored in machine learning. Our study investigates how incorporating foveated retinotopy may benefit deep convolutional neural networks (CNNs) in image classification tasks. By implementing a foveated retinotopic transformation in the input layer of standard ResNet models and re-training them, we maintain comparable classification accuracy while enhancing the network's robustness to scale and rotational perturbations. Although this architectural modification introduces increased sensitivity to fixation point shifts, we demonstrate how this apparent limitation becomes advantageous: variations in classification probabilities across different gaze positions serve as effective indicators for object localization. Our findings suggest that foveated retinotopic mapping encodes implicit knowledge about visual object geometry, offering an efficient solution to the visual search problem - a capability crucial for many living species.
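
The localization idea in the abstract, reading object position off the way classification probabilities vary with gaze position, can be sketched as a scan over candidate fixation points. This is a minimal sketch under stated assumptions, not the authors' procedure: `classify` is a hypothetical callable standing in for a re-trained foveated network, taking an image and a fixation and returning a vector of class probabilities, and `toy_classify` is a made-up stand-in used only to make the example runnable.

```python
import numpy as np

def fixation_probability_map(image, classify, target_class, stride=8):
    """Evaluate `classify` on a grid of fixation points.

    Returns the map of target-class probabilities and the (row, col)
    fixation where that probability is highest.
    """
    h, w = image.shape[:2]
    ys = np.arange(0, h, stride)
    xs = np.arange(0, w, stride)
    heat = np.zeros((len(ys), len(xs)))
    for i, fy in enumerate(ys):
        for j, fx in enumerate(xs):
            probs = classify(image, (fy, fx))
            heat[i, j] = probs[target_class]
    i_best, j_best = np.unravel_index(np.argmax(heat), heat.shape)
    return heat, (int(ys[i_best]), int(xs[j_best]))

def toy_classify(image, fixation, object_pos=(24, 40), sigma=10.0):
    """Made-up classifier: confidence decays with distance from the object."""
    fy, fx = fixation
    d2 = (fy - object_pos[0]) ** 2 + (fx - object_pos[1]) ** 2
    p = np.exp(-d2 / (2.0 * sigma ** 2))
    return np.array([1.0 - p, p])

heat, best = fixation_probability_map(np.zeros((64, 64)), toy_classify, target_class=1)
print(best)
```

An exhaustive grid is the simplest choice for illustration; a smaller stride gives a finer map at the cost of more forward passes.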

Problem

Research questions and friction points this paper is trying to address.

Biological Foveation

Computer Vision

Robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Focal Retinotopy

Robust Object Recognition

Position-sensitive Feature Extraction

🔎 Similar Papers

Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers