Toddlers' Active Gaze Behavior Supports Self-Supervised Object Learning

📅 2024-11-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
How do toddlers develop view-invariant object representations through active eye movements, and what role does their distinctive gaze behavior play in unsupervised visual learning? Method: Naturalistic toddler gaze data were collected with head-mounted eye tracking during play, and the constrained, high-acuity central visual field was simulated by cropping image regions centered on the gaze location. This gaze-contingent visual stream trains a time-based self-supervised learning model (a SimCLR variant) that maps close-in-time views to similar representations. Results: Toddlers' gaze strategy yields more robust, viewpoint-invariant object representations than adult-like sampling, largely because toddlers fixate objects they hold themselves for longer bouts; the limited size of the simulated central visual field is crucial, and the representations are acquired entirely without labels. Overall, the results indicate that toddlers' eye movement patterns are naturally suited to self-supervised learning, pointing to a developmentally grounded paradigm for bio-inspired visual representation modeling.

📝 Abstract
Toddlers learn to recognize objects from different viewpoints with almost no supervision. Recent works argue that toddlers develop this ability by mapping close-in-time visual inputs to similar representations while interacting with objects. High acuity vision is only available in the central visual field, which may explain why toddlers (much like adults) constantly move their gaze around during such interactions. It is unclear whether, and how much, toddlers curate their visual experience through these eye movements to support their learning of object representations. In this work, we explore whether a bio-inspired visual learning model can harness toddlers' gaze behavior during a play session to develop view-invariant object recognition. Exploiting head-mounted eye tracking during dyadic play, we simulate toddlers' central visual field experience by cropping image regions centered on the gaze location. This visual stream feeds time-based self-supervised learning algorithms. Our experiments demonstrate that toddlers' gaze strategy supports the learning of invariant object representations. Our analysis also reveals that the limited size of the central visual field where acuity is high is crucial for this. We further find that toddlers' visual experience elicits more robust representations compared to adults', mostly because toddlers look at objects they hold themselves for longer bouts. Overall, our work reveals how toddlers' gaze behavior supports self-supervised learning of view-invariant object recognition.
Problem

Research questions and friction points this paper is trying to address.

Explores toddlers' gaze behavior in object learning
Simulates central visual field for view-invariant recognition
Compares toddlers' and adults' visual experience robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bio-inspired visual learning model
Gaze-centered crops simulated from head-mounted eye tracking
Time-based self-supervised learning algorithms
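The two ingredients above can be sketched in a few lines: a gaze-centered crop that simulates the limited central visual field, and a SimCLR-style contrastive loss whose positive pairs are temporally adjacent crops rather than two augmentations of one image. This is a minimal NumPy illustration under stated assumptions, not the authors' code; the function names, the even/odd frame pairing, and the temperature value are all illustrative.

```python
import numpy as np

def gaze_centered_crop(frame, gaze_xy, crop_size):
    """Simulate the high-acuity central visual field: cut a square
    region centered on the gaze location, clamped to the frame bounds."""
    h, w = frame.shape[:2]
    half = crop_size // 2
    cx = int(np.clip(gaze_xy[0], half, w - half))
    cy = int(np.clip(gaze_xy[1], half, h - half))
    return frame[cy - half:cy + half, cx - half:cx + half]

def time_contrastive_loss(z, temperature=0.1):
    """SimCLR-style NT-Xent loss where the positive for embedding z[t]
    is its temporal neighbor (here: consecutive frames paired 0-1, 2-3, ...)
    instead of a second augmented view of the same image."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize embeddings
    sim = z @ z.T / temperature                       # cosine similarity matrix
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = len(z)
    pos = np.array([t + 1 if t % 2 == 0 else t - 1 for t in range(n)])
    # cross-entropy of each frame's positive against all other frames
    log_prob = sim[np.arange(n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()
```

In the paper's setup, the crops produced by the first function (driven by recorded toddler or adult gaze traces) would be encoded by a network, and the resulting embeddings fed to a loss of this form, so that close-in-time views of an object are pulled toward a shared, view-invariant representation.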
Zhengyang Yu
Frankfurt Institute for Advanced Studies (FIAS)
Image Understanding · Cognitive Science
A. Aubret
Frankfurt Institute for Advanced Studies, Xidian-FIAS International Joint Research Center
Marcel C. Raabe
Frankfurt Institute for Advanced Studies
Jane Yang
Department of Psychology, University of Texas at Austin
Chen Yu
Department of Psychology, University of Texas at Austin
Jochen Triesch
Frankfurt Institute for Advanced Studies
Artificial Intelligence · Computational Neuroscience · Vision · Developmental Robotics/AI