🤖 AI Summary
Problem: Existing computational auditory models lack the topographic organization of the human auditory cortex, such as smooth spatial maps of frequency tuning and speech- and music-selective functional modules, and current model evaluations do not test for it.
Method: We propose TopoAudio, an end-to-end topographically organized auditory model that adapts a cortical wiring-constraint loss, originally designed for visual perception, to induce brain-like topographic self-organization in deep neural networks. The model takes cochleagram input and is trained to classify speech and environmental sounds, with an added constraint that nearby units on a 2D cortical sheet develop similar tuning.
Contribution/Results: Despite the added constraint, the model reaches accuracy comparable to unconstrained, non-topographic baselines on speech and environmental sound classification benchmarks. It predicts fMRI responses as well as standard models do, and unlike them it develops smooth topographic maps for tonotopy and amplitude modulation, as well as clustered speech- and music-selective modules. The results suggest that a wiring-length constraint can serve as a general-purpose regularizer for biologically aligned representations.
📝 Abstract
The human auditory cortex is topographically organized. Neurons with similar response properties are spatially clustered, forming smooth maps for acoustic features such as frequency in early auditory areas, and modular regions selective for music and speech in higher-order cortex. Yet evaluations of current computational models of auditory perception do not measure whether such topographic structure is present in a candidate model. Here, we show that cortical topography is absent from the models that previously performed best at predicting human auditory fMRI responses. To encourage the emergence of topographic organization, we adapt a cortical wiring-constraint loss originally designed for visual perception. The new class of topographic auditory models, TopoAudio, is trained to classify speech and environmental sounds from cochleagram inputs, with an added constraint that nearby units on a 2D cortical sheet develop similar tuning. Despite this additional constraint, TopoAudio achieves high accuracy on benchmark tasks, comparable to the unconstrained non-topographic baseline models. Further, TopoAudio predicts fMRI responses in the brain as accurately as standard models, but unlike standard models, TopoAudio develops smooth topographic maps for tonotopy and amplitude modulation (common properties of early auditory representation), as well as clustered response modules for music and speech (higher-order selectivity observed in the human auditory cortex). TopoAudio is the first end-to-end biologically grounded auditory model to exhibit emergent topography, and our results emphasize that a wiring-length constraint can serve as a general-purpose regularization tool to achieve biologically aligned representations.
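To make the idea of "nearby units on a 2D cortical sheet develop similar tuning" concrete, here is a minimal sketch of one way such a constraint could be scored. It is not the loss used in TopoAudio, whose exact form the abstract does not give. It is a generic neighbor-smoothness penalty, assumed here for illustration: the mean of one minus cosine similarity over adjacent units on a grid. The names `neighbor_smoothness_penalty`, `total_loss`, `weight`, and the toy sheets are all hypothetical.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length tuning vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def neighbor_smoothness_penalty(sheet):
    """Mean (1 - cosine similarity) over horizontally and vertically adjacent units.

    sheet[r][c] is the tuning vector of the unit at row r, column c.
    This is an illustrative stand-in for a wiring-style constraint,
    not the loss from the paper.
    """
    rows, cols = len(sheet), len(sheet[0])
    total, count = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            # Visit the right and lower neighbor so each adjacent pair is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    total += 1.0 - cosine_similarity(sheet[r][c], sheet[rr][cc])
                    count += 1
    return total / count if count else 0.0


def total_loss(task_loss, sheet, weight=0.1):
    """Task loss plus a weighted topographic penalty (the weight value is an assumption)."""
    return task_loss + weight * neighbor_smoothness_penalty(sheet)


def make_smooth_sheet(rows, cols):
    """Toy sheet whose tuning changes gradually along columns, like a tonotopic gradient."""
    return [[[math.cos(0.2 * c), math.sin(0.2 * c)] for c in range(cols)]
            for _ in range(rows)]


def make_scrambled_sheet(rows, cols, seed=0):
    """Same units as the smooth sheet, placed at shuffled positions."""
    rng = random.Random(seed)
    units = [unit for row in make_smooth_sheet(rows, cols) for unit in row]
    rng.shuffle(units)
    return [units[r * cols:(r + 1) * cols] for r in range(rows)]


if __name__ == "__main__":
    smooth = neighbor_smoothness_penalty(make_smooth_sheet(4, 8))
    scrambled = neighbor_smoothness_penalty(make_scrambled_sheet(4, 8))
    print(f"smooth sheet penalty:    {smooth:.4f}")
    print(f"scrambled sheet penalty: {scrambled:.4f}")
```

Both toy sheets contain exactly the same units, so they would support the same task performance; only the spatial arrangement differs. The penalty is lower for the smooth arrangement, which is the sense in which a term like this, added to a classification loss, can favor topographic maps without changing what the units compute.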